Comment by stebalien

1 day ago

I've used LEB128 (with canonicalisation) extensively and... this looks so much nicer for most use-cases (length prefixed, supports the full uint64 range without that extra 10th byte).

The downside is the encoding size. LEB128 grows to 2 bytes quickly (at 128), but then stays at 2 bytes all the way up to 2^14. This matters if you're using these numbers as tags/identifiers, as we were in the multicodec [1] project, or for network message lengths. bijou64 only gives you 500 numbers that fit in 2 bytes or fewer.

[1]: https://github.com/multiformats/multicodec
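
For anyone who wants to check the size thresholds, here is a minimal sketch of plain unsigned LEB128 (no canonicalisation checks). The function name `leb128_encode` is just my own label for illustration, not from any particular library:

```python
def leb128_encode(n: int) -> bytes:
    """Standard unsigned LEB128: 7 payload bits per byte, MSB set means 'more bytes follow'."""
    if n < 0:
        raise ValueError("unsigned only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits go out first (little-endian groups)
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation flag: more bytes follow
        else:
            out.append(byte)         # last byte has MSB clear
            return bytes(out)


# Encoded length at the interesting boundaries.
for value in (0, 127, 128, 2**14 - 1, 2**14, 2**64 - 1):
    print(value, len(leb128_encode(value)))
```

The loop prints the encoded length at each boundary: 1 byte up to 127, 2 bytes up to 2^14 - 1, and 10 bytes for the top of the uint64 range.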

> I’ve used LEB128 (with canonicalisation) extensively and... this looks so much nicer for most use-cases (length prefixed, supports the full uint64 range without that extra 10th byte)

If you only want to encode uint64 numbers, LEB128 could easily be tweaked to fit in 9 bytes in several ways:

- using the offset trick described in this article would remove non-unique encodings (0x80 0x00 would encode 128)

- never allowing encodings longer than 9 bytes would mean the MSB of any ninth byte would always be zero, so you could reuse that bit and store 8 bits in the ninth byte: 7 bits in each of the first eight bytes plus 8 in the ninth = 64

Both tweaks would lose LEB128’s property that you can find where each number starts from any byte in the stream, but the encoding discussed here doesn’t have that property either.
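
To make the two tweaks concrete, here is a rough sketch of each one on its own. This is my reading of the ideas above, not a spec: the function names (`encode_offset`, `encode_capped`, and their decoders) are made up for illustration, and the capped variant as written does not reject padded encodings.

```python
def encode_offset(n: int) -> bytes:
    """Tweak 1: k-byte encodings start where the (k-1)-byte ones end."""
    length = 1
    while n >= 1 << (7 * length):
        n -= 1 << (7 * length)   # skip the values already covered by shorter forms
        length += 1
    out = bytearray()
    for i in range(length):
        byte = (n >> (7 * i)) & 0x7F
        out.append(byte | (0x80 if i < length - 1 else 0))
    return bytes(out)


def decode_offset(data: bytes) -> tuple[int, int]:
    """Returns (value, bytes consumed)."""
    n = 0
    for i, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            # add back the offsets skipped by all shorter encodings
            return n + sum(1 << (7 * k) for k in range(1, i + 1)), i + 1
    raise ValueError("truncated input")


def encode_capped(n: int) -> bytes:
    """Tweak 2: at most 9 bytes; the ninth byte carries 8 payload bits."""
    if not 0 <= n < 2**64:
        raise ValueError("not a uint64")
    out = bytearray()
    for _ in range(8):
        byte = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)
    out.append(n)                # remaining 8 bits, no continuation flag needed
    return bytes(out)


def decode_capped(data: bytes) -> tuple[int, int]:
    """Returns (value, bytes consumed)."""
    n = 0
    for i, byte in enumerate(data[:9]):
        if i == 8:
            return n | (byte << 56), 9   # ninth byte: all 8 bits are payload
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return n, i + 1
    raise ValueError("truncated input")
```

With these, `encode_offset(128)` gives `0x80 0x00` as described in the first bullet, and `encode_capped(2**64 - 1)` is nine `0xFF` bytes.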