← Back to context

Comment by oulipo2

17 hours ago

How do you use it for visualization? Do you use ClickStack? or something else?

There is HyperDX - search is not fastest, but it could be something that we do too - haven't checked deeply if high-cardinality is big issue with ClickHouse, but seeing some high cardinality data with what we post.

Still via Grafana. I ran it side-by-side with Loki and despite trying to optimise Loki and using ClickHouse out of the box - it really was shocking how much faster ClickHouse was for every single query (e.g. in the last 12 hours give my the frequency of logs with a particular JSON event or even "find this log entry, then join back and find the number of times a different entry appears within the same correlation_id)

  • What does the layout in click house look like? Do the input logs need to have a very defined structure?

    • Not really, ClickHouse is super forgiving so you can do something like:

          CREATE TABLE default.events (
            `timestamp` DateTime
            `event` String -- e.g. 'product.updated' or empty/null
            `message` -- human readable message
            `raw` -- the raw message - this is very useful when pushing logs that aren't JSON - you just let the `event` be null and dump the entire message here
          )
          ENGINE = MergeTree
          PARTITION BY toDate(timestamp)
          ORDER BY (timestamp, event)
          TTL timestamp + toIntervalMonth(6)
      

      ClickHouse is extremely performant even in the cases of e.g.: SELECT count(*) FROM `events` WHERE `raw` LIKE '%hello world%'

      Of course, the more columns you splat out (e.g. like correlation_id, user_id, order_id, etc) the better you can index and expect those queries to perform but in general I don't bother outside the obvious core domain ones (exampled above), the performance is so good that unindexed queries are significantly faster than indexed queries in Loki. I have reached the point where I JSON extract on-the-fly for the WHERE clause with very large queries with no meaningful performance issues.