Comment by stebalien
1 day ago
I've used LEB128 (with canonicalisation) extensively and... this looks so much nicer for most use-cases (length prefixed, supports the full uint64 range without that extra 10th byte).
The downside is the encoding size. LEB128 quickly grows to 2 bytes, but stays at 2 bytes for every value below 2^14. This matters if you're using these numbers as tags/identifiers, as we were in the multicodec [1] project, or for network message lengths. bijou64 only gives you 500 numbers that fit in 2 bytes or fewer.
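For anyone who wants to check the LEB128 side of that size comparison, here is a minimal sketch of plain unsigned LEB128 (no canonicalisation checks, no uint64 cap). The function name `leb128_encode` is made up for illustration and is not taken from any particular library.

```python
def leb128_encode(n: int) -> bytes:
    """Plain unsigned LEB128: 7 payload bits per byte, MSB set on every byte except the last."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of what is left
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, MSB clear
            return bytes(out)


# Encoded sizes at the boundaries mentioned above.
sizes = {v: len(leb128_encode(v)) for v in (127, 128, 2**14 - 1, 2**14, 2**64 - 1)}
```

Under this sketch, everything below 2^14 (16384 values) fits in at most 2 bytes, and the largest uint64 takes the 10th byte mentioned earlier.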
> I’ve used LEB128 (with canonicalisation) extensively and... this looks so much nicer for most use-cases (length prefixed, supports the full uint64 range without that extra 10th byte)
If you only want to encode uint64 numbers, LEB128 could easily be tweaked to fit in 9 bytes in several ways:
- using the offset trick described in this article would remove non-unique encodings (0x80 0x00 would encode 128)
- never allowing encodings longer than 9 bytes would mean the MSB of any ninth byte is no longer needed as a continuation flag, so you could reuse it and store 8 bits in the ninth byte, for a total of 7 bits in each of the first eight bytes plus 8 in the ninth: 7 × 8 + 8 = 64
Both tweaks would lose LEB128’s property that you can find where each number starts from any byte in the stream, but the encoding discussed here doesn’t have that property either.
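To make the two tweaks concrete, here is a rough sketch of each as a separate pair of functions. All names (`encode_offset`, `decode_offset`, `encode_u64_9`, `decode_u64_9`) are hypothetical, and this is only my reading of the two bullet points, not code from the article or from any existing library. The decoders return `(value, bytes_consumed)`.

```python
def encode_offset(n: int) -> bytes:
    """LEB128 plus the offset trick: each extra byte implicitly adds 1 at its
    position, so every value has exactly one encoding."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)
        n -= 1  # the continuation itself accounts for one unit


def decode_offset(data: bytes):
    value = 0
    for i, byte in enumerate(data):
        # Every byte after the first carries an implicit +1.
        value += ((byte & 0x7F) + (1 if i else 0)) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated input")


def encode_u64_9(n: int) -> bytes:
    """LEB128 capped at 9 bytes: the ninth byte has no continuation bit
    and carries 8 payload bits (7 * 8 + 8 = 64)."""
    if not 0 <= n < 1 << 64:
        raise ValueError("value does not fit in uint64")
    out = bytearray()
    for _ in range(8):
        byte = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)
    out.append(n)  # 56 bits already emitted, so n < 2**8 here
    return bytes(out)


def decode_u64_9(data: bytes):
    value = 0
    for i in range(8):
        byte = data[i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    value |= data[8] << 56  # ninth byte is used whole
    return value, 9
```

With this sketch, `encode_offset(128)` gives `0x80 0x00` as in the first bullet, and `encode_u64_9(2**64 - 1)` is nine `0xFF` bytes. Note that the second variant on its own still accepts overlong encodings, so you would need a canonicalisation check or the offset trick on top of it.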
sup steb, this is expede's work!
Sup b5! I always look forward to new work by expede (and n0).