Comment by adamzwasserman

3 months ago

Exactly. A 1.2% false positive rate means you do an unnecessary read on 1.2% of lookups for keys that aren't there, vs 100% without the filter. Even at a 10% FP rate, you skip 90% of that wasted I/O.
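To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The function name and parameters are mine, not from TFA, and it assumes every chunk read costs the same and that the filter never produces false negatives (which holds for a standard Bloom filter).

```python
def expected_chunk_reads(num_chunks: int, chunks_with_key: int, fp_rate: float) -> float:
    """Expected chunk reads for one lookup when each chunk has its own filter.

    Chunks that really contain the key are always read. Chunks that do not
    contain it are read only when their filter reports a false positive.
    """
    if not 0.0 <= fp_rate <= 1.0:
        raise ValueError("fp_rate must be between 0 and 1")
    if not 0 <= chunks_with_key <= num_chunks:
        raise ValueError("chunks_with_key must be between 0 and num_chunks")
    chunks_without_key = num_chunks - chunks_with_key
    return chunks_with_key + fp_rate * chunks_without_key


if __name__ == "__main__":
    # Key absent from all 1000 chunks: without a filter you read all 1000.
    print(expected_chunk_reads(1000, 0, 0.012))  # roughly 12 reads
    print(expected_chunk_reads(1000, 0, 0.10))   # roughly 100 reads
```

Even the sloppy 10% filter cuts the reads for an absent key by about an order of magnitude, which is the asymmetry in question.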

This asymmetry works great for I/O-bound workloads (skip-indexes) but fails for TFA's approach where every document needs its own filter.

In practice, you combine both: inverted index for the dictionary (amortizes across documents), then bloom filters per chunk of the index (amortizes across chunks). This two-level approach handles scale much better than TFA's one-filter-per-document design. It's bloom filters as an optimization layer, not a replacement.
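A rough sketch of that layering in Python, under my own assumptions: `BloomFilter`, `ChunkedIndex`, `chunk_size`, and the bit/hash counts are all hypothetical names and values chosen for illustration, and an in-memory list stands in for a chunk you would otherwise have to read from disk. It is not TFA's code or any particular engine's design.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bitset."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "maybe present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class ChunkedIndex:
    """Inverted index (term -> posting chunks) with one Bloom filter per chunk."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.postings = defaultdict(list)  # term -> [(BloomFilter, [doc_id, ...])]
        self.chunk_reads = 0               # counts simulated chunk reads

    def add(self, term: str, doc_id: str) -> None:
        chunks = self.postings[term]
        if not chunks or len(chunks[-1][1]) >= self.chunk_size:
            chunks.append((BloomFilter(), []))  # start a new chunk
        bloom, doc_ids = chunks[-1]
        bloom.add(doc_id)
        doc_ids.append(doc_id)

    def contains(self, term: str, doc_id: str) -> bool:
        # Level 1: dictionary lookup. Level 2: filters decide which chunks to read.
        for bloom, doc_ids in self.postings.get(term, []):
            if not bloom.might_contain(doc_id):
                continue  # definite miss, skip the chunk entirely
            self.chunk_reads += 1  # "read" the chunk (true hit or false positive)
            if doc_id in doc_ids:
                return True
        return False


if __name__ == "__main__":
    index = ChunkedIndex(chunk_size=4)
    for i in range(20):
        index.add("bloom", f"doc{i}")
    print(index.contains("bloom", "doc7"), index.contains("bloom", "doc999"))
    print("chunks read:", index.chunk_reads, "of", 2 * len(index.postings["bloom"]))
```

The dictionary answers "which terms exist" exactly, once for all documents, and the filters only ever save you chunk reads. A false positive costs one extra read; it never changes the answer, because the chunk itself is still checked.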
