Comment by Retr0id

1 year ago

It's both.


Retr0id

UTF-8 is simply an encoding; "UTF-8 characters" is just not a correct use of language. It's like saying "binary number": a number has the same value regardless of the base you write it in, and the base is a scheme for representing that value, not a definition of what a number is. This is a common imprecision in language, and I have seen it cause serious difficulty in learning the concepts properly.
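A small Python sketch of the encoding-versus-content distinction. The same three code points take a different number of bytes under UTF-8, UTF-16 and UTF-32, yet every representation decodes back to the identical string, much as "101010" in base 2 and "2A" in base 16 both denote 42. The sample string is an arbitrary choice for illustration.

```python
# The abstract content: a sequence of three Unicode code points.
text = "\u00e9\u20ac\U0001F600"  # e-acute, euro sign, grinning face emoji

# Three different encodings (byte representations) of that same sequence.
# The "-le" variants are used so no byte order mark is prepended.
encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "utf-32-le": text.encode("utf-32-le"),
}

print("code points:", [f"U+{ord(c):04X}" for c in text])
for name, data in encoded.items():
    print(f"{name}: {len(data)} bytes")
    # Decoding any representation recovers the identical code point sequence.
    assert data.decode(name) == text

# The number analogy: one value, several written forms.
assert int("101010", 2) == int("2A", 16) == 42
```

The byte counts differ (9, 8 and 12), while the code point count stays at 3.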

Retr0id 1 year ago
"unicode codepoint sequences whose codepoint lengths and/or utf8-code-unit-lengths behave oddly when you change their case" would not fit in a HN title, however
- Rendello 1 year ago
  
  I (OP) said above that "Unicode codepoints that expand or contract when case is changed in UTF-8" would have worked fine; I've changed the Gist title to that in any case. I'm curious whether it would've affected the attention it received on HN.
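A minimal Python sketch of the behaviour the retitled Gist describes: case mapping can change a string's code point count, its UTF-8 byte count, or both. It relies on Python's `str.upper()`/`str.lower()`, which apply Unicode's full case mappings. Other languages or libraries that use only simple one-to-one mappings may give different results. The sample characters are my own picks for illustration, not a list taken from the Gist.

```python
from collections.abc import Callable


def utf8_len(s: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode s."""
    return len(s.encode("utf-8"))


def case_change(s: str, mapping: Callable[[str], str]) -> tuple[int, int, int, int]:
    """Return (code points before, UTF-8 bytes before, code points after, UTF-8 bytes after)."""
    t = mapping(s)
    return (len(s), utf8_len(s), len(t), utf8_len(t))


examples = [
    ("\u00df", str.upper),  # sharp s -> "SS": more code points, same byte count
    ("\u0131", str.upper),  # dotless i -> "I": same code point count, fewer bytes
    ("\u0130", str.lower),  # I with dot above -> "i" + U+0307: more code points and bytes
    ("\u212a", str.lower),  # Kelvin sign -> "k": 3 bytes shrink to 1
    ("\ufb01", str.upper),  # "fi" ligature -> "FI": more code points, fewer bytes
]

for s, mapping in examples:
    cp_before, b_before, cp_after, b_after = case_change(s, mapping)
    print(
        f"U+{ord(s):04X} {mapping.__name__}: "
        f"{cp_before} cp / {b_before} B -> {cp_after} cp / {b_after} B"
    )
```

Only ASCII is printed, so the output is safe on terminals with a limited encoding.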