HN Discussion about the same topic from 2 days ago (126 comments to date): https://news.ycombinator.com/item?id=14119713
High level recap:
Chrome - fixed in 59 (current stable is 57)
Firefox - no plans to change; you can adjust network.IDN_show_punycode in about:config
IE - immune
Safari - immune
Could you explain what fixed/immune means? Is it only the confusable characters, i.e. characters that are visually identical or near-identical to Latin characters, that get the punycode treatment?
Could a browser track how many language/character sets are typically used by a browser profile, and warn the user when they are about to use a new, previously unused set, rather than waving the duty off as the "responsibility of domain owners"?
With now over 1000 top-level domains, and however many homographic matches among character sets, expecting people to register dozens of matching domains seems unrealistic.
Wouldn't it be even easier to just check if the domain contains something outside the currently used character set (perhaps always allowing ASCII)?
I think that, plus a "you have never visited this site before" kind of warning could go a long way towards combating these kinds of attacks.
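A minimal sketch of that check, for illustration only. Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name stands in for the script here; that heuristic, the function names, and the `familiar` set are all assumptions of this sketch, not how any browser does it.

```python
import unicodedata


def char_script(ch):
    """Rough script guess: the first word of the Unicode character name.

    Assumption: this name-prefix heuristic approximates the real Unicode
    Script property, which the standard library does not provide.
    """
    if ch.isascii():
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def unfamiliar_scripts(hostname, familiar):
    """Return the scripts in hostname that are neither ASCII nor in `familiar`."""
    found = {char_script(ch) for ch in hostname if ch not in ".-"}
    return found - set(familiar) - {"ASCII"}


# A profile that only uses Latin-script sites sees the Cyrillic label as new.
print(unfamiliar_scripts("\u0430\u0440\u0440\u04cf\u0435.com", {"LATIN"}))
# A profile that already uses Cyrillic gets no warning for the same host.
print(unfamiliar_scripts("\u0430\u0440\u0440\u04cf\u0435.com", {"CYRILLIC"}))
```

ASCII is always allowed, as suggested above. The second call shows the limit of the idea: a user whose profile already includes Cyrillic gets no warning for the look-alike host.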
I think the real devil is going to be in the UI. You don't want to make it overly scary (otherwise you penalize domains which use some Unicode characters correctly), but it can't be so unnoticeable that you won't be able to tell when it matters.
Well, if you're Russian (or one of the many other nationalities that use a Cyrillic character set), then that's still not going to help you. If you visit аррІе.com (all Cyrillic characters) you wouldn't get any warning that it wasn't apple.com (all Latin characters). It's a rather Eurocentric solution to the problem.
The thing is, why should an English-speaking person get a warning when they visit a Cyrillic URL, but a Russian-speaking person doesn't get a warning when visiting a URL with Latin characters? Why is apple.com assumed to be legitimate and аррІе.com considered the fraud?
In fact I'm almost sure that browsers originally disabled IDNs using some kind of scheme that relied on language preferences, back when IDNs first started being used. I suspect they eventually abandoned that approach for this very reason. It only seems like a good idea if you speak English (or at least some other language written in the Latin script).
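One way to catch the all-Cyrillic case regardless of the user's language is to ask whether an entire label can be read as Latin letters. The sketch below is only an illustration of that idea: the `LOOKALIKES` table is a small hand-picked sample and `latin_skeleton` is a hypothetical helper. A real check would draw on the confusables data from the Unicode security mechanisms report (UTS #39) instead.

```python
# Hand-picked Cyrillic letters that resemble Latin ones. Illustrative only:
# this is an assumption of the sketch, not the official confusables list.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}


def latin_skeleton(label):
    """Return the Latin look-alike of label, or None if it has none.

    Pure-ASCII labels return None (nothing to flag), as do labels that
    contain any non-ASCII character outside the look-alike table.
    """
    if not label or label.isascii():
        return None
    out = []
    for ch in label:
        if ch.isascii():
            out.append(ch)
        elif ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])
        else:
            return None
    return "".join(out)


print(latin_skeleton("\u0430\u0440\u0440\u04cf\u0435"))  # the spoofed label
print(latin_skeleton("\u044f\u043d\u0434\u0435\u043a\u0441"))  # an ordinary Cyrillic word
```

A label made up entirely of look-alikes gets flagged whoever is looking at it, while an ordinary Cyrillic word containing letters with no Latin twin returns `None` and is left alone. It does not settle which of two look-alike domains is the legitimate one.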
For a multilingual (really multi-char-set, "multi-graphic"?) user who often visits sites in several different character sets, and might have a 60/30/5/5 percentage distribution, getting an "are you sure?" check before visiting a site with mixed character sets or a new-to-that-profile unmixed set seems like a useful confirmation. It would not be invoked often, but would be likely to avoid a trip to the phishy sites. The same approach would work for the 99/1 or 100/0 distribution. The UI should be more of:
"You have never visited a site in this language/character set before"
More_Info. Cancel? Proceed?
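A rough sketch of how such a per-profile check could be organized. `ScriptProfile` and its methods are hypothetical names, and the script of a character is guessed from the first word of its Unicode name, which is an assumption of this sketch rather than a proper use of the Unicode Script property.

```python
from collections import Counter
import unicodedata


def label_scripts(label):
    """Rough set of scripts in one hostname label (letters only)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


class ScriptProfile:
    """Hypothetical per-profile record of scripts seen in visited hostnames."""

    def __init__(self):
        self.visits = Counter()

    def check(self, hostname):
        """Return 'mixed', 'new', or 'ok' for a hostname about to be visited."""
        per_label = [label_scripts(label) for label in hostname.split(".")]
        # A single label that mixes scripts is the most suspicious case.
        # Limitation: legitimate mixes (e.g. Japanese) would need an allowlist.
        if any(len(scripts) > 1 for scripts in per_label):
            return "mixed"
        seen = set(self.visits)
        if any(not scripts <= seen for scripts in per_label):
            return "new"
        return "ok"

    def record(self, hostname):
        """Count the scripts of a hostname the user chose to visit."""
        for label in hostname.split("."):
            self.visits.update(label_scripts(label))


profile = ScriptProfile()
profile.record("example.com")
print(profile.check("\u0430\u0440\u0440\u04cf\u0435.com"))  # first Cyrillic label
print(profile.check("\u0430pple.com"))  # Cyrillic "a" followed by Latin "pple"
```

The counter keeps per-script visit counts, so a 60/30/5/5 distribution could later be used to tune how loud the prompt is; this sketch only distinguishes seen from unseen.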
I wonder how the domain displays in email clients like Gmail and Outlook. This is the scariest part: most people will just look at the domain, think it's a valid email, and follow its instructions. It could be catastrophic for companies; the Ubiquiti $40 million fiasco comes to mind.
Considering how easy email is to spoof, why bother using a Unicode domain which is only similar to the target domain? Why not just use the real domain instead?
Spoofing isn't so easy for Gmail and Yahoo inboxes. Some web clients warn about the return path too. For sophisticated spoofing and phishing, Unicode domains are helpful. Plus, spoofed emails are just one small attack vector.
What an odd coincidence: I just published a Go package yesterday to detect such attacks in source code. Is there a homography bug going around?
https://github.com/NebulousLabs/glyphcheck
(btw, Wikipedia notes that "The term homograph is sometimes used synonymously with homoglyph, but in the usual linguistic sense, homographs are words that are spelled the same but have different meanings, a property of words, not characters.")
Interesting, but, going by the repo description, why is this limited to Go source code files?
Mostly because it has an "ignore comments" mode. A lot of non-English speaking programmers write code using English keywords and identifiers but use their native language in the comments.
With some work, it could be made language-agnostic, but that's more than I have time for right now. If comments aren't an issue, you can just grep through all your source files for the offending characters, which shouldn't take more than a simple bash script.
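For the "just grep" route, a short sketch in Python rather than bash. It is not how glyphcheck works. The character ranges (the Greek and Cyrillic blocks) and the `//` comment handling are assumptions chosen for illustration.

```python
import re

# Assumption: "offending" means any character in the Greek or Cyrillic
# blocks. Adjust the ranges for code that legitimately uses those scripts.
SUSPECT = re.compile(r"[\u0370-\u03ff\u0400-\u04ff]")


def find_suspects(source, skip_line_comments=False):
    """Yield (line_number, column, character) for each suspect character."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        if skip_line_comments:
            # Naive: cuts at the first "//", even inside a string literal.
            line = line.split("//", 1)[0]
        for match in SUSPECT.finditer(line):
            yield lineno, match.start() + 1, match.group()


sample = "x := 1 // \u043a\u043e\u043c\u043c\u0435\u043d\u0442\n" "if \u0430 == 1 {}\n"
for hit in find_suspects(sample, skip_line_comments=True):
    print(hit)
```

With `skip_line_comments=True` the native-language comment on the first line is ignored and only the Cyrillic identifier on the second line is reported. A language-aware tool would parse comments and strings properly instead of splitting on `//`.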
This is the scariest one: http://www.xn--80a6aa.com/ & http://www.app.com/
Why is this the scariest one? I've never heard of app.com, so any news from that site, real or fake (in the literal sense), wouldn't register as legitimate one way or the other.
However, apple.com with a CC reset form could be a mighty easy way to scam a lot of people into giving up personal details, which could easily lead to full-blown identity theft.
Thankfully FF/Chrome are patching this
http://blog.unicode.org/2014/09/updated-unicode-security-spe...
Interesting. The apple.com one (https://www.xn--80ak6aa92e.com/) shows literally that text in Pale Moon (27.2), but shows "аррӏе.com" (Cyrillic text) in Chrome 57 and Firefox 51.
Someone else's example that looks like "app.com" (http://www.xn--80a6aa.com/) translates to the Cyrillic text, even in Pale Moon. I wonder if Apple's site is on a hard-coded blacklist in the browser, or if every update includes the top-1000 list, or something?
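To see which characters sit behind the two `xn--` hosts mentioned above, the labels can be decoded with Python's built-in `punycode` codec. This is a sketch that decodes the raw Punycode only; it skips the normalization and validity checks that full IDNA processing in a browser would apply, and `decode_hostname` is a hypothetical helper name.

```python
def decode_hostname(hostname):
    """Decode each xn-- label of hostname with the built-in punycode codec.

    Raw Punycode only: no IDNA normalization or validation is applied.
    """
    labels = []
    for label in hostname.split("."):
        if label.lower().startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


for host in ("www.xn--80ak6aa92e.com", "www.xn--80a6aa.com"):
    decoded = decode_hostname(host)
    # Show the code points of the second label so look-alikes are explicit.
    points = [f"U+{ord(ch):04X}" for ch in decoded.split(".")[1]]
    print(host, "->", decoded, points)
```

The first label decodes to U+0430 U+0440 U+0440 U+04CF U+0435 and the second to U+0430 U+0440 U+0440, all in the Cyrillic block, so neither label mixes scripts.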
I remember reading about issues with Unicode domains years ago, though. It surprises me that something hasn't been figured out by this point. One mitigation that I remember being discussed was coloring characters from different scripts in different colors, to make variant characters more obvious.
Even if you could train that, it doesn't help color-blind people...
Depends on the palette used. On the other hand, if that's the only indicator, then it doesn't help blind people either.
Thankfully I got this: https://imgur.com/a/3XyIe