Comment by layer8
4 hours ago
Unicode is the code points. Of course you have to normalize on one encoding for password hashing (and UTF-8 is the canonical choice for that, because interfaces to hash implementations are byte-based), but that’s not an issue of end-user input. The goal of Unicode was to be able to roundtrip the existing encodings through it, and it achieved that goal.
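A minimal Python sketch of the point, with a hypothetical password containing a yen sign: every encoding hands the hash function plain bytes, but different encodings hand it different bytes, so you have to fix one encoding (conventionally UTF-8) before hashing.

```python
import hashlib

pw = "pass¥word"                      # contains U+00A5 YEN SIGN
utf8_bytes = pw.encode("utf-8")       # b"pass\xc2\xa5word"
latin1_bytes = pw.encode("latin-1")   # b"pass\xa5word"

# Both encodings produce bytes, but not the same bytes,
# so the resulting hashes differ for the same password:
h_utf8 = hashlib.sha256(utf8_bytes).hexdigest()
h_latin1 = hashlib.sha256(latin1_bytes).hexdigest()
assert h_utf8 != h_latin1
```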
> That would have had no effect other than momentary user confusion. In that world, someone with a yen sign in their password would, after the keyboard update, have a backslash in their password, because their password never changed. Only the label changed.
No. The analogon to TFA would be that the old keyboard would have a Yen key and no backslash key, and the new keyboard would have no Yen key and still no backslash key. The point is that the Yen key would be removed because its character code is not part of the shared common subset of ASCII. ASCII doesn’t imply that you have a keyboard capable of entering all 128 codes. Just like Unicode doesn’t imply that your keyboard allows you to input arbitrary code points.
> No. The analogon to TFA would be that the old keyboard would have a Yen key and no backslash key, and the new keyboard would have no Yen key and still no backslash key. The point is that the Yen key would be removed because its character code is not part of the shared common subset of ASCII. ASCII doesn’t imply that you have a keyboard capable of entering all 128 codes.
Are you sure you read the article? The key is still there.
> Of course you have to normalize on one encoding for password hashing (and UTF-8 is the canonical choice for that, because interfaces to hash implementations are byte-based)
This is pure gibberish. All encodings produce bytes. UTF-8 has no relationship to byte-based hash interfaces that isn't shared by every other encoding.