← Back to context

Comment by cscheid

2 months ago

My understanding is that "weird" unicode code points become https://en.wikipedia.org/wiki/Punycode. I used the 󠅘󠅕󠅜󠅜󠅟 (copy-pasted from the post, presumably with the payload in it) to type a fake domain into Chrome, and the Punycode I got appeared to not have any of the encoding bits.

However, I then pasted the emoji into the _query_ part of a URL. I pointed it to my own website, and sure enough, I can definitely see the payload in the nginx logs. Yikes.

Edit: I pasted the very same Emoji that 'paulgb used in their post before the parenthetical in the first paragraph, but it seems HN scrubs those from comments.

domains get "punycode" encoded, urls get "url encoded"[1], which should make unicode characters stand out. That being said, browsers do accept some non-ascii characters in urls and convert them automatially, so theoretically you could put "invalid" characters into a link and have the browser convert it only after clicking. That might be a viable strategy.

[1] https://www.w3schools.com/tags//ref_urlencode.asp