Comment by kens

13 years ago

There's an invisible Unicode U+200E left-to-right mark at the end of the URL, probably picked up when the parent cut-and-pasted the URL into HN. In UTF-8 this is E2 80 8E, which the server misinterprets as three Windows-1252 characters: E2 = â, 80 = €, 8E = Ž. (It could be ISO-8859-1, except that doesn't include €.) Interestingly, Chrome's DOM inspector shows this character as the HTML entity &lrm; while view-source has it as the actual invisible character.
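
For anyone who wants to reproduce the byte-level mix-up, here's a minimal Python sketch. The variable names are mine, and it only mimics the decoding step; I'm assuming the server reads the bytes as Windows-1252 (Python's "cp1252" codec), which is what the € suggests.

```python
# Sketch: encode U+200E as UTF-8, then misread those bytes as Windows-1252.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible when rendered

utf8_bytes = LRM.encode("utf-8")         # the three bytes actually sent
mojibake = utf8_bytes.decode("cp1252")   # assumed server-side misreading

print(utf8_bytes.hex(" ").upper())       # the UTF-8 bytes in hex
print(mojibake)                          # the three visible junk characters

# ISO-8859-1 maps 0x80 and 0x8E to C1 control characters instead,
# so it would not produce a euro sign.
latin1 = utf8_bytes.decode("latin-1")
```

Running it shows the same three junk characters that end up appended to the URL.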

I think the poster of the URL originally mangled it, but it would be nice if the HN software filtered out invisible characters from URLs. There's not much the destination server can do about it.
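
As a rough idea of what such filtering could look like (this is not how the HN software works, just a sketch in Python; strip_invisible is a hypothetical helper and the example.com URL is made up), one option is to drop every character in the Unicode "Cf" (format) category, which includes U+200E:

```python
import unicodedata


def strip_invisible(url: str) -> str:
    """Hypothetical helper: drop Unicode format characters (category Cf).

    Cf covers U+200E LEFT-TO-RIGHT MARK along with other invisible
    formatting characters such as zero-width joiners and the BOM.
    """
    return "".join(ch for ch in url if unicodedata.category(ch) != "Cf")


# Made-up URL with a trailing left-to-right mark, as in the submission.
submitted = "http://example.com/page\u200e"
cleaned = strip_invisible(submitted)
print(len(submitted) - len(cleaned))  # number of characters removed
```

Dropping the whole Cf category is a blunt choice that seems reasonable for URLs; it would be too aggressive for ordinary text, where some of these characters are meaningful.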

(Yes, I've dealt with too many character set issues in the past.)