Comment by gpvos

9 hours ago

You could even extend UTF-8 to make 0xFE and 0xFF valid starting bytes, followed by 6 and 7 continuation bytes respectively, which gets you 36 and 42 bits of space. I seem to remember Perl allowed that for a while in its v-string notation.

Edit: just tested this. Perl still allows it, but with an extra twist: v-notation goes up to 2^63-1. Values from 2^31 to 2^36-1 are encoded as FE + 6 bytes, and everything above that is encoded as FF + 12 bytes; the largest value it allows is v9223372036854775807, which is encoded as FF 80 87 BF BF BF BF BF BF BF BF BF BF. It probably doesn't allow that one extra bit because v-notation doesn't work with negative integers.
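
For anyone who wants to play with the byte layout, here is a rough Python sketch of the scheme as described above. It is not Perl's actual implementation: the function name `encode_extended` and the layout table are my own, and I'm assuming the ranges below 2^31 follow the original 1- to 6-byte UTF-8 pattern, with FE + 6 bytes up to 2^36-1 and FF + 12 bytes above that, capped at 2^63-1.

```python
# Hypothetical helper, not part of Perl or any library: encodes an integer
# using the extended UTF-8 layout described in the comment above.
MAX_VALUE = 2**63 - 1  # upper limit reported for Perl's v-notation

# (payload bits, lead-byte marker, number of continuation bytes)
LAYOUTS = [
    (11, 0xC0, 1),
    (16, 0xE0, 2),
    (21, 0xF0, 3),
    (26, 0xF8, 4),
    (31, 0xFC, 5),
    (36, 0xFE, 6),   # extension: FE + 6 continuation bytes
]


def encode_extended(value: int) -> bytes:
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value out of range")
    if value < 0x80:
        return bytes([value])  # plain ASCII, single byte

    for bits, lead, count in LAYOUTS:
        if value < (1 << bits):
            break
    else:
        lead, count = 0xFF, 12  # extension: FF + 12 continuation bytes

    # Each continuation byte is 10xxxxxx and carries 6 payload bits.
    tail = [0x80 | ((value >> (6 * i)) & 0x3F) for i in reversed(range(count))]
    # Whatever bits remain above the continuation payload go in the lead byte
    # (always zero for the FE and FF forms).
    first = lead | (value >> (6 * count))
    return bytes([first] + tail)


if __name__ == "__main__":
    print(encode_extended(2**63 - 1).hex(" ").upper())
```

Running it on 2^63-1 gives FF 80 87 followed by ten BF bytes, which matches the Perl output quoted above.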