Comment by adamzwasserman
4 days ago
Exactly. A 1.2% false positive rate means unnecessary reads 1.2% of the time vs 100% without the filter. Even at 10% FP rate, you skip 90% of I/O.
This asymmetry works great for I/O-bound workloads (skip-indexes) but fails for TFA's approach where every document needs its own filter.
In practice, you combine both: inverted index for the dictionary (amortizes across documents), then bloom filters per chunk of the index (amortizes across chunks). This two-level approach handles scale much better than TFA's one-filter-per-document design. It's bloom filters as an optimization layer, not a replacement.
No comments yet
Contribute on Hacker News ↗