Comment by Klathmon
8 years ago
Wouldn't it be even easier to just check whether the domain contains anything outside the character set the user currently uses (perhaps always allowing ASCII)?
I think that, plus a "you have never visited this site before" kind of warning could go a long way towards combating these kinds of attacks.
I think the real devil is going to be in the UI. You don't want to make it overly scary (otherwise you penalize domains which use some Unicode characters correctly), but it can't be so unnoticeable that you won't be able to tell when it matters.
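To make the suggestion concrete, here is a minimal sketch of such a check in Python. It is not how any browser does it. The function names and the `allowed` parameter are made up for illustration, and since Python's standard library has no script property, the script is guessed from the first word of the Unicode character name, which is only a rough heuristic.

```python
import unicodedata


def char_script(ch: str) -> str:
    """Guess a character's script from its Unicode name (heuristic only)."""
    if ch.isascii():
        return "ASCII"
    name = unicodedata.name(ch, "")
    # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
    return name.split(" ")[0] if name else "UNKNOWN"


def unexpected_scripts(domain: str, allowed: set[str]) -> set[str]:
    """Return scripts in the domain that are neither ASCII nor in `allowed`."""
    scripts = {char_script(ch) for ch in domain if ch not in ".-"}
    return scripts - allowed - {"ASCII"}


# The lookalike domain from this thread, spelled with escapes so the
# Cyrillic code points are explicit.
lookalike = "\u0430\u0440\u0440\u0406\u0435.com"

print(unexpected_scripts("apple.com", {"LATIN"}))   # set()
print(unexpected_scripts(lookalike, {"LATIN"}))     # {'CYRILLIC'}
print(unexpected_scripts(lookalike, {"CYRILLIC"}))  # set()
```

An empty result means no warning would be shown. Note the last call: a profile that already allows Cyrillic gets no warning for the lookalike.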
Well, if you're Russian (or one of the many other nationalities that use a Cyrillic character set), then that's still not going to help you. If you visit аррІе.com (all Cyrillic characters) you wouldn't get any warning that it wasn't apple.com (all Latin characters). It's a rather euro-centric solution to the problem.
The thing is, why should an English-speaking person get a warning when they visit a Cyrillic URL, but a Russian-speaking person doesn't get a warning when visiting a URL with Latin characters? Why is apple.com assumed to be legitimate and аррІе.com considered the fraud?
In fact, I'm almost sure that when IDNs were first introduced, browsers disabled them using some kind of scheme that relied on language preferences. I suspect they eventually abandoned that approach for this very reason. It only seems like a good idea if you're English-speaking (or at least speak some other language written in Latin script).
For a multi-lingual (really multi-char-set, "multi-graphic"?) user who often visits sites in several different char-sets, and might have a 60/30/5/5 percentage distribution, getting an "are you sure?" check before visiting a site with mixed char-sets or a new-to-that-profile unmixed set seems like a useful confirmation that would not be invoked often, but would be likely to avoid a trip to the phishy sites. The same approach would work for the 99/1 or 100/0 distribution. The UI should be more of:
Interestingly enough, my Chrome sends "accept-language:en-US,en;q=0.8,ko;q=0.6". I don't even know how it infers that.
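For anyone wondering how to read that header value: each comma-separated entry is a language tag with an optional `q` weight, and an entry without `q` counts as 1.0. A rough Python sketch of a parser follows. The function name is invented for illustration, it ignores some corner cases of the HTTP grammar, and it says nothing about how Chrome picks the list in the first place.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, weight) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # weight defaults to 1.0 when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as lowest priority
        prefs.append((tag.strip(), q))
    # sorted() is stable, so entries with equal weight keep header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.8,ko;q=0.6"))
# [('en-US', 1.0), ('en', 0.8), ('ko', 0.6)]
```

So the header above says: prefer US English, then any English, then Korean.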