Comment by HenriNext

6 years ago

Neither the blog nor the repo contain benchmarks, which is bit weird when the title of a technical article has word "Faster".

Then after digging through the actual paper it turns out that the construction performance of Xor Filter is actually massively slower than Bloom.. May not matter for many uses, but critical for others (like mine).

> Blocked Bloom = 10 ns/key

> Bloom 8 = 40 ns/key

> Xor 8 = 110 ns/key

Co-author of the paper here. The repo https://github.com/FastFilter/fastfilter_cpp contains all the source code and benchmarks. Yes, construction of the xor filter is slower, but I wouldn't say massively. You should probably compare Bloom 12 against Xor 8, as those have similar false positive rates. Bloom 12 is 60 ns/key, versus Xor 8 at 110 ns/key. Cuckoo filter is somewhere between. Those number don't include calculating the hash code of a key, which might be another 30 ns/key or so.

What is your use case? Maybe Blocked Bloom might be better (if you have the memory).

  • Thanks, found the benchmarks (your blog links primarily to different repo, "/xorfilter").

    My use case is a new kind of debugger (https://BugJail.com), which works by capturing 20+ million events per second from running application, and then reconstructing model of program execution from the raw events. With 20+ million events per second that 110ns per key is just not going to work..

    • Very interesting! If both construction and lookup need to be very fast, and if you do have enough memory, then I would say a blocked Bloom filter would be best (without knowing your use case in detail). With a regular Bloom filter, you will have lots of CPU cache misses. Cuckoo and xor filter might make sense if you can do (batch) construction in background threads, multiple filters concurrently. That way you can save memory.

      1 reply →

    • That is a very interesting project that you have on the go there, it doesn't happen often that I see tools that I think might be game changers coming along and this just might be one of those.

      3 replies →

  • Yes but grandparent is correct. The first word in the title is still “faster”.

    • I know it's frowned upon to say this at HN, but I'm beginning to wonder who's reading the actual page. At least to me the page seems pretty clear about it:

      >The xor filters takes a bit longer to build, but once built, it uses less memory and is about 25% faster in some demonstration test.

      When they say "faster," they mean for querying. Which is actually pretty useful for a lot of use cases where bloom filters are used today.

    • To be clear, I wrote specifically "construction performance".

      The blog already stated that "build is bit slower", so i was mainly commenting that instead of "bit slower" it is actually order of magnitude slower (in some cases).

      According to the paper, the query side performance is indeed faster, so they are not misleading in any way.