Comment by Sesse__

1 month ago

FWIW, I found https://github.com/FastFilter/fastfilter_java/issues/28 a pretty good explanation of what's going on in the succinct counting blocked Bloom filters. (I'm not sure if the blocking is needed for the normal Bloom filter part, though, i.e., all bits don't necessarily need to fall within the same block, no? But the counts are stored separately for each block.)

Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored spearately, yes.

  • > This would need a bit less space, and would be a bit slower.

    I guess that is because the count storage update is really slow, right, so it's better to have one than two (or whatever number of set bits) operations? At least the linked code seems to process it one by one bit when updating, and without some sort of “rank of n-th set bit” operation (which would accelerate the “select” version fairly well), I'm not sure it could be made much faster than that either.

    Edit: I see https://github.com/FastFilter/fastfilter_java/blob/master/fa... tries to make that operation in O(1), but to be honest, with this many operations, the cure almost looks worse than the disease. :-) Haven't benchmarked, though.

    • Yes the count storage update is a bit slow. It could be implemented as a background thread, so that it is not blocking. It depends on the use case wheter this is a problem. The 'blocked' variant might be faster to optimize I assume, if this is needed.

      4 replies →