← Back to context

Comment by kccqzy

2 months ago

"Visually identical" is never good enough. Have you heard of attacks confusing Latin letters and Cyrillic letters? For example C versus С. (The latter is known as CYRILLIC CAPITAL LETTER ES.) Have you heard of NFC forms versus NFD forms? For example é versus é (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT versus LATIN SMALL LETTER E WITH ACUTE.)

Nothing that's important when it comes to security and privacy should rely on a "visually identical" check. Fortunately browsers these days are already good at this; their address bars use puny code for the domain and percent encoding for the rest of the URL.

As the sibling comment has mentioned Unicode in DNS uses a punycode encoding but even further then that the standard specifies that the Unicode data must be normalized to NFC[0] before being converted to punycode. This means that your second example (decomposed e with combining acute accent vs the composed variant) is not a valid concern. The Cyrillic one is however.

[0] https://www.rfc-editor.org/rfc/rfc5891 § 4.1 "By the time a string enters the IDNA registration process as described in this specification, it MUST be in Unicode and in Normalization Form C"

  • The OP said link. The NFC/NFD issue remains if these are part of a path name or query parameter.

    • Sure, but the security concerns of that I feel are much less concerning than having multiple domain names with the same visual appearance that point to different servers. That has immediate impact for things like phishing whereas lookalike path or query portions would at least ensure you are still connecting to the server that you think you are.

Erm, DNS uses Punycode because it comes from a time when Unicode didn't exist, and bind assumes a grapheme has no more than one byte.

  • Yes but I guess that the message was meaning that browsers now detect homographs and display the punycode instead. See also https://news.ycombinator.com/item?id=14130241; at that time Firefox wasn't fixed, but in the meantime it fixed the issue too (there's a network.idn.punycode_cyrillic_confusables preference, which is enabled by default).