Comment by thomasmg

1 month ago

Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored separately, yes.

> This would need a bit less space, and would be a bit slower.

I guess that is because the count storage update is really slow, right, so it's better to have one operation than two (or however many bits are set)? At least the linked code seems to process it one bit at a time when updating, and without some sort of “rank of n-th set bit” operation (which would accelerate the “select” version fairly well), I'm not sure it could be made much faster than that either.
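
For reference, here is a minimal sketch of the two primitives being discussed, on a single 64-bit word. It is in Java only because the linked project is; the class and method names are made up for illustration and are not taken from fastfilter_java. `rank` is a single popcount, while this naive `select` clears one set bit per loop iteration, which is the "one bit at a time" cost mentioned above.

```java
final class BitSelectSketch {
    // rank: number of set bits in positions [0, pos) of word.
    public static int rank(long word, int pos) {
        if (pos <= 0) {
            return 0;
        }
        if (pos >= 64) {
            return Long.bitCount(word);
        }
        return Long.bitCount(word & ((1L << pos) - 1));
    }

    // select: position of the n-th set bit (n = 0 is the lowest one),
    // or -1 if the word has fewer than n + 1 set bits.
    // Naive version: one loop iteration per skipped bit.
    public static int select(long word, int n) {
        for (int i = 0; i < n; i++) {
            word &= word - 1; // clear the lowest set bit
        }
        return word == 0 ? -1 : Long.numberOfTrailingZeros(word);
    }
}
```

On hardware with a bit-deposit instruction, `select` can in principle be done without the loop, but whether that helps here is exactly the kind of thing that would need benchmarking.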

Edit: I see https://github.com/FastFilter/fastfilter_java/blob/master/fa... tries to make that operation in O(1), but to be honest, with this many operations, the cure almost looks worse than the disease. :-) Haven't benchmarked, though.

  • Yes, the count storage update is a bit slow. It could be implemented as a background thread, so that it is not blocking. It depends on the use case whether this is a problem. The 'blocked' variant might be faster to optimize, I assume, if this is needed.

    • > It could be implemented as a background thread, so that it is not blocking.

      How would you realistically do this without creating more overhead in the thread communication than what you're saving? I've never heard of offloading a 30–40 cycle operation to a thread before. Typically sending an atomic across CPUs is what, 200 cycles? (That's assuming you have the thread just sitting there in some kind of pool; firing up a new one is much, much more expensive. It also assumes you never need to hear back from it, so wouldn't work well for a decrease where you need to know the count.)
