Comment by twotwotwo

4 months ago

There is a largish category of tools now where, unlike in OLTP systems, a big focus is scanning data, just quickly (still O(n), but with a good constant): Redshift, Trino/Athena, ClickHouse, and DuckDB, among others.

Bloom filter indexing seems like a great fit if you ever need to do substring searches in a context like that, and for log searching in general: a small per-block filter lets the scan skip blocks that definitely don't contain the search term. I haven't dug into which packages have it, but it looks like at least ClickHouse does: https://clickhouse.com/docs/optimize/skipping-indexes#bloom-...
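
To make the idea concrete, here is a rough Python sketch of an n-gram Bloom filter used as a skip index over blocks of log lines. It is only an illustration under my own assumptions, not how ClickHouse or any other engine actually implements it: the names (`build_filter`, `might_contain`, `search`) and the parameters (`NGRAM`, `BITS`, `HASHES`) are all made up for the example.

```python
import hashlib

NGRAM = 3     # n-gram length (assumed for this sketch)
BITS = 1024   # bits per block filter (assumed)
HASHES = 3    # bit positions set per n-gram (assumed)


def ngrams(text, n=NGRAM):
    """Return the set of all overlapping substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def bit_positions(gram):
    """Derive HASHES bit positions from a single SHA-256 digest."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * k:4 * k + 4], "big") % BITS
            for k in range(HASHES)]


def build_filter(lines):
    """Build one Bloom filter (stored as an int bitmask) for a block of lines."""
    bits = 0
    for line in lines:
        for gram in ngrams(line):
            for pos in bit_positions(gram):
                bits |= 1 << pos
    return bits


def might_contain(bits, needle):
    """False means the block definitely lacks needle; True means maybe."""
    grams = ngrams(needle)
    if not grams:
        # Needle is shorter than NGRAM, so the filter cannot rule anything out.
        return True
    return all((bits >> pos) & 1
               for gram in grams
               for pos in bit_positions(gram))


def search(blocks, filters, needle):
    """Scan only the blocks whose filter cannot rule out the needle."""
    hits, scanned = [], 0
    for lines, bits in zip(blocks, filters):
        if not might_contain(bits, needle):
            continue  # skipped without touching the block's lines
        scanned += 1
        hits.extend(line for line in lines if needle in line)
    return hits, scanned


if __name__ == "__main__":
    blocks = [
        ["GET /index.html 200", "GET /about 200"],
        ["POST /login 500 timeout", "GET /health 200"],
    ]
    filters = [build_filter(block) for block in blocks]
    hits, scanned = search(blocks, filters, "timeout")
    print(hits, "blocks scanned:", scanned, "of", len(blocks))
```

The filter can give false positives (a block gets scanned for nothing) but never false negatives, because every n-gram of a matching substring was inserted when the block's filter was built. The scan is still O(n) in the worst case; the filter just improves the constant when most blocks can be skipped.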
