UTF-8 is simply an encoding; "UTF-8 characters" is not a correct use of language. It is like saying "binary number": a number has the same value regardless of the base you write it in, and the base is a scheme for representing the number, not a system for defining what a number is. This is a common imprecision in language, and I have seen it cause serious difficulties in learning the concepts properly.
"Unicode codepoint sequences whose codepoint lengths and/or UTF-8 code unit lengths behave oddly when you change their case" would not fit in an HN title, however.
I (OP) said above that "Unicode codepoints that expand or contract when case is changed in UTF-8" would have worked fine. I've changed the Gist title to that in any case. I'm curious whether it would have affected the attention it received on HN.
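For anyone who wants to see the distinction concretely, here is a rough Python sketch, not the Gist's actual method. It measures two separate things for a case conversion: the change in the number of code points and the change in the number of UTF-8 code units (bytes). The helper names `utf8_len`, `case_deltas` and `find_changes` are made up for illustration, and the results depend on the Unicode data bundled with your Python build.

```python
def utf8_len(text: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode text."""
    return len(text.encode("utf-8"))


def case_deltas(ch: str, convert) -> tuple:
    """Return (code point delta, UTF-8 byte delta) after applying convert.

    convert is a case-mapping function such as str.upper or str.lower.
    """
    converted = convert(ch)
    return len(converted) - len(ch), utf8_len(converted) - utf8_len(ch)


def find_changes(convert, limit: int = 0x110000) -> list:
    """List the code points below limit whose UTF-8 length changes under convert."""
    found = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        if case_deltas(ch, convert)[1] != 0:
            found.append(ch)
    return found


# A few well-known cases: KELVIN SIGN, dotless i, sharp s.
for ch, convert in [("\u212a", str.lower), ("\u0131", str.upper), ("\u00df", str.upper)]:
    print(f"U+{ord(ch):04X}", case_deltas(ch, convert))
```

The Kelvin sign shrinks by two bytes when lowercased, dotless i shrinks by one byte when uppercased, and "ß" becomes two code points ("SS") while its UTF-8 length stays the same. That last case is why the code point count and the byte count have to be reported separately.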