Comment by nullc
6 years ago
The overhead you're seeing is because you've mismatched your FP rate with the optimal size for your GCS code. If they're exactly matched, the overhead is very low (though not as low as an arithmetic code, simply because each entry is still coded into a whole number of bits, so you get something like a half-bit overhead per entry on average from that).
I would love to see improvements to the GCS code I have, but I tested with many different sizes and Golomb-Rice parameters... I was not able to reduce the size overhead in any way. Maybe you can help? There are two implementations, one is Java: https://github.com/FastFilter/fastfilter_java/blob/master/sr... and the other is C++: https://github.com/FastFilter/fastfilter_cpp/blob/master/src...
Good point about Q-1. I never looked at optimizing parameters when writing up the GCS.
A quick empirical check shows a small improvement for Q-1, but the claim that it comes within 1% of the minimum seems to be based on an unusual definition of the minimum. nullc gives it as log2(eM), but the real minimum is simply log2(M), in which case the unary terminator alone imposes a 5% penalty at M=2^20.
It may still be possible to optimize the parameters further, but the gain would be perhaps a tenth of a bit per item.
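The Q vs Q-1 comparison is easy to check empirically. A rough simulation (parameters and names are my own, not from the thread): hash N items uniformly into [0, N*M) for a 1/M false-positive rate, sort, delta-encode, and Rice-code the gaps, comparing divisor 2^logM ("Q") against 2^(logM-1) ("Q-1") and against the entropy minimum log2(C(NM, N))/N.

```python
import math
import random

random.seed(1)
N, logM = 200_000, 20
M = 1 << logM

# N distinct uniform values in [0, N*M); gaps are roughly geometric, mean M
vals = sorted(random.sample(range(N * M), N))
gaps = [v - p for p, v in zip([0] + vals[:-1], vals)]

def rice_bits_per_item(k):
    # unary quotient (g >> k zeros plus a terminator bit) plus k remainder bits
    return sum((g >> k) + 1 + k for g in gaps) / N

# entropy minimum log2(C(N*M, N))/N, via lgamma since C(N*M, N) is astronomically large
entropy = (math.lgamma(N * M + 1) - math.lgamma(N + 1)
           - math.lgamma(N * M - N + 1)) / (N * math.log(2))

bits_q = rice_bits_per_item(logM)       # Q:   roughly 0.14 bits/item over the minimum
bits_q1 = rice_bits_per_item(logM - 1)  # Q-1: roughly 0.10 bits/item over the minimum
print(f"min {entropy:.3f}  Q {bits_q:.3f}  Q-1 {bits_q1:.3f}")
```

On this model the Q-1 divisor shaves a few hundredths of a bit per item off the Rice code, consistent with the overall gain being on the order of a tenth of a bit.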
> but the real minimum is simply log2(M)
The minimum entropy to code such a thing is log2(MN choose N)/N bits per element.
log2(eM) is just an approximation when N is large.
You cannot code with N uniform choices out of MN options with less than log2(MN choose N) bits on average, by the pigeonhole principle.
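The bound and the approximation can be checked numerically. A small sketch (the function name is mine) evaluating log2(C(MN, N))/N via lgamma, since the binomial coefficient itself is astronomically large:

```python
import math

def min_bits_per_element(M, N):
    # log2(C(M*N, N)) / N, computed via lgamma because C(M*N, N) is huge
    return (math.lgamma(M * N + 1) - math.lgamma(N + 1)
            - math.lgamma(M * N - N + 1)) / (N * math.log(2))

M = 1 << 20
approx = math.log2(math.e * M)  # = log2(M) + log2(e), about 21.44 at M = 2^20
for N in (10, 1_000, 100_000):
    print(N, round(min_bits_per_element(M, N), 4), round(approx, 4))
```

The exact minimum approaches log2(eM) from below as N grows, so at large N the approximation is accurate to well under a hundredth of a bit per element.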
> the unary terminator alone
The terminator carries information. If M is chosen well, the terminator carries almost one bit of information.
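A quick way to see this (my own sketch, assuming the gaps are exponential with mean M and the Rice divisor is 2^k = M): by memorylessness, each unary digit independently terminates with probability p = 1 - e^-1, so its Shannon entropy is close to a full bit.

```python
import math

# Probability that a unary digit of the quotient is the terminator,
# for exponential gaps with mean M and Rice divisor 2^k = M
p = 1 - math.exp(-1)

# Shannon entropy of that digit, in bits
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(f"p = {p:.4f}, entropy = {h:.4f} bits")
```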