Comment by ubitaco
2 days ago
It's slightly buried in the README on GitHub:
> how can we store a 24 byte long string, inline? Don't we also need to store the length somewhere?
> To do this, we utilize the fact that the last byte of our string could only ever have a value in the range [0, 192). We know this because all strings in Rust are valid UTF-8, and the only valid byte pattern for the last byte of a UTF-8 character (and thus the possible last byte of a string) is 0b0XXXXXXX aka [0, 128) or 0b10XXXXXX aka [128, 192)
Any Unicode encoding would allow that.
UTF-32 has an entire spare byte to put flags into. 24- or 21-bit encodings have spare bits that could act as flags. UTF-16 has plenty of invalid code units, or you could use a lone high surrogate in the last 2 bytes as your flag, since valid UTF-16 can never end on one.
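For anyone who wants to see the UTF-8 version of the trick from the quoted README spelled out, here is a rough sketch. It is not the crate's actual code: `MAX_INLINE`, `LEN_TAG`, `encode_inline` and `decode_inline` are names I made up, and the layout (length stored as `192 + len` in the last byte) is just one assumption that fits the description.

```rust
// Hypothetical sketch, not the library's real layout.
const MAX_INLINE: usize = 24;
// 192 (0b1100_0000) is the first byte value that can never END valid UTF-8:
// the last byte of a character is either ASCII (< 128) or a continuation
// byte (128..192).
const LEN_TAG: u8 = 192;

/// Pack a string of at most 24 bytes into a 24-byte buffer.
/// Returns None if the string is too long to be stored inline.
fn encode_inline(s: &str) -> Option<[u8; MAX_INLINE]> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE {
        return None;
    }
    let mut buf = [0u8; MAX_INLINE];
    buf[..bytes.len()].copy_from_slice(bytes);
    if bytes.len() < MAX_INLINE {
        // The last byte is unused by the text, so it carries the length
        // as a value in 192..=215.
        buf[MAX_INLINE - 1] = LEN_TAG + bytes.len() as u8;
    }
    // With exactly 24 bytes, the last byte is real text and is < 192.
    Some(buf)
}

/// Recover the string slice from a buffer produced by `encode_inline`.
fn decode_inline(buf: &[u8; MAX_INLINE]) -> &str {
    let last = buf[MAX_INLINE - 1];
    let len = if last >= LEN_TAG {
        (last - LEN_TAG) as usize
    } else {
        MAX_INLINE
    };
    std::str::from_utf8(&buf[..len]).expect("buffer was built from a &str")
}
```

So a 5-byte string ends up with 197 in the last slot, while a full 24-byte string keeps its own final byte there, and the two cases can never collide. The same idea carries over to the other encodings, only with a different "impossible" value in the last unit.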