Comment by layer8
6 hours ago
Unicode was introduced to solve that very problem, and it largely does.
In the olden times, even ASCII wasn’t necessarily a safe bet, as many countries used their own slight variation of ASCII. For example, Japan had the Yen sign in place of the backslash. In a fictional ASCII world, Apple could have decided to remove the Yen key from the Japanese lockscreen keyboard.
> Unicode was introduced to solve that very problem, and it largely does.
What? Unicode doesn't address the problem at all. Your emoji password will look completely different depending on the encoding you use. We have multiple popular encodings right now... but instead of software that lets us specify which encoding we want to use to interpret a document, we have software that intentionally prohibits us from doing that because it's supposed to be a security risk.
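To make that concrete, here's a quick Python sketch (my own illustration, not anything from the article): the same string of code points turns into entirely different byte sequences depending on the encoding you pick.

```python
# The same "password" string produces different bytes under each encoding.
pw = "caf\u00e9\U0001F512"  # "café" plus a lock emoji

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # .hex() shows the raw byte sequence each encoding produces
    print(enc, "->", pw.encode(enc).hex())
```

The code points are identical in every case; only the byte-level representation differs, so any two systems comparing raw bytes have to agree on the encoding out of band.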
UTF-8 wasn't introduced to solve the problem of there being multiple encodings of any given text, either. It was introduced to be another encoding.
> In a fictional ASCII world, Apple could have decided to remove the Yen key from the Japanese lockscreen keyboard.
That would have had no effect other than momentary user confusion. In that world, someone with a yen sign in their password would, after the keyboard update, have a backslash in their password, because their password never changed. Only the label changed.
In this world, though, it's still true that the password never changed. But what did change was that Apple implemented specific logic to prevent people from entering that password. The label didn't matter.
(And the article is ambiguous about whether the appearance of the keyboard changed. It's not ambiguous about whether the behavior of the keyboard changed -- it didn't:
>> Post-update, when entering the passcode, the keyboard now displays an identical accent mark in the háček's place, a feature Byrne described as "pointless; they're encoded the same."
There may or may not have been a cosmetic change to the keyboard, but there certainly was a change to the behavior of the password field.)
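(For anyone unfamiliar with what "encoded the same" could mean here, a hedged Python illustration with a caron/háček -- my example, not the article's: two visually identical strings can differ at the code-point level unless they're normalized.)

```python
import unicodedata

# 'š' as a single precomposed code point vs. 's' plus a combining caron.
precomposed = "\u0161"   # U+0161 LATIN SMALL LETTER S WITH CARON
decomposed = "s\u030C"   # 's' + U+030C COMBINING CARON

assert precomposed != decomposed  # raw code points differ...
# ...but NFC normalization maps the decomposed form onto the precomposed one.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```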
Unicode is the code points. Of course you have to normalize on one encoding for password hashing (and UTF-8 is the canonical choice for that, because interfaces to hash implementations are byte-based), but that’s not an issue of end-user input. The goal of Unicode was to be able to roundtrip the existing encodings through it, and it achieved that goal.
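A minimal sketch of what "normalize on one encoding for password hashing" looks like in practice (using SHA-256 as a stand-in for a real password hash like Argon2 or bcrypt -- the function name is my own, assumed for illustration):

```python
import hashlib
import unicodedata

def hash_password(pw: str) -> str:
    # Normalize to NFC so canonically equivalent inputs hash identically,
    # then encode as UTF-8 because the hash interface consumes bytes.
    canonical = unicodedata.normalize("NFC", pw)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# 'é' entered as one code point or as 'e' + combining acute hashes the same.
assert hash_password("\u00e9") == hash_password("e\u0301")
```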
> That would have had no effect other than momentary user confusion. In that world, someone with a yen sign in their password would, after the keyboard update, have a backslash in their password, because their password never changed. Only the label changed.
No. The analogue to TFA would be that the old keyboard had a Yen key and no backslash key, and the new keyboard had neither a Yen key nor a backslash key. The point is that the Yen key would be removed because its character code is not part of the shared common subset of ASCII. ASCII doesn’t imply that you have a keyboard capable of entering all 128 codes, just as Unicode doesn’t imply that your keyboard allows you to input arbitrary code points.
> No. The analogue to TFA would be that the old keyboard had a Yen key and no backslash key, and the new keyboard had neither a Yen key nor a backslash key. The point is that the Yen key would be removed because its character code is not part of the shared common subset of ASCII. ASCII doesn’t imply that you have a keyboard capable of entering all 128 codes.
Are you sure you read the article? The key is still there.
> Of course you have to normalize on one encoding for password hashing (and UTF-8 is the canonical choice for that, because interfaces to hash implementations are byte-based)
This is pure gibberish. All encodings produce bytes. UTF-8 has no relationship to the concept that isn't shared by every other encoding.