Comment by senderista
6 years ago
If I were coding a static approximate membership query structure, I wouldn't use either bloom filters or GCS, I'd use an Elias-Fano-coded sequence of fingerprints. That has nearly the same compression efficiency as Golomb coding but can be efficiently queried with no decompression.
Interesting idea! Elias-Fano monotone sequences unfortunately do need a bit more space than just a list of Rice codes. What is possible, without having to increase the space usage, is to re-arrange the Rice codes in each bucket, so that all variable parts come first, and the fixed parts at the end. That way, lookup speed would be much, much faster for large buckets. We used this for RecSplit: https://arxiv.org/abs/1910.06416 I will try that out for GCS as well!