
Comment by valarauca1

10 years ago

The old bits-of-entropy count is based on extended ASCII. In reality we could count UTF-8 code points, with each code point having 1/#code_point entropy.

After all, a brute-force guesser can throw UTF-8 characters directly instead of attempting to rebuild emoji from their underlying ASCII strings.

"with each code point having 1/#code_point entropy."

That requires that users select Unicode characters uniformly at random. There are a number of problems with this idea, most notably that the resulting password would have an insanely high "difficulty to type"/"bit of entropy" ratio. By the time you're through your third keyboard mode switch or third character typed in via generic Unicode hex entry, a 4-word passphrase user has already logged in and opened their browser.

Mixing a single Unicode character into your password might be sorta clever, but you probably shouldn't rely on getting a lot more "bits" out of it.
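As a rough sketch of the arithmetic behind this objection: a symbol drawn uniformly at random from N possibilities carries log2(N) bits, not 1/N. The snippet below compares one uniformly random Unicode scalar value with one word from a 2048-entry list. The list size and the helper name `bits_per_symbol` are assumptions for illustration, and the Unicode figure is the full code space minus surrogates, so it is an upper bound on what anyone could realistically type.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy in bits of one symbol chosen uniformly at random."""
    return math.log2(alphabet_size)

# Full Unicode code space (0x110000) minus the 2048 surrogate code points.
# This is an upper bound: many of these values are unassigned or untypeable.
UNICODE_SCALAR_VALUES = 0x110000 - 2048

# Assumed size of a passphrase word list (hypothetical, 2**11 entries).
WORDLIST_SIZE = 2048

code_point_bits = bits_per_symbol(UNICODE_SCALAR_VALUES)
word_bits = bits_per_symbol(WORDLIST_SIZE)

print(f"one random code point: {code_point_bits:.2f} bits")
print(f"one random word:       {word_bits:.2f} bits")
print(f"four random words:     {4 * word_bits:.2f} bits")
```

Under these assumptions a code point is worth roughly 20 bits against 11 for a word, but only if it really is chosen uniformly, which is exactly the part that is hard to type.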

  • Users don't uniformly select ASCII characters, but generally we accept 1 char of password length === 8 bits of entropy.

    • No, we do not. Six bits is a much better estimate (26 × 2 + 10 = 62 characters, close to 64 = 2^6), and that still assumes a uniformly random selection, which many passwords are not even close to.

    • > 1 char of password length === 8 bits of entropy

      Oh hell no. https://xkcd.com/936/

      The "little obscure tricks" to increase the entropy of a password do NOT work well with human memory. If your template is "Uncommon Word + Emoji + 5 tweaks", the number of possible passwords is about 50,000 (the uncommon word) × (number of Emojis) × 5 × 8 (there are roughly 8 ways to "tweak" a word).

      There are no more than 500 Emojis that people use. You're not getting much entropy by choosing one. Now if you start choosing obscure Chinese words and Arabic symbols, maybe you'd be getting somewhere (though it requires mastery of multiple languages to really exercise that UTF-8 dataset).

      But honestly, an English speaker will get far more entropy by just adding two more common words (top 5000) to their password. A new common word is worth a hell of a lot more than an Emoji. A phrase of 8 words (i.e., a sentence) is also very easy to memorize and contains a ton of entropy as well.

      Even a simple sentence is impossible to brute force. The following sentence has probably never been said in the history of humanity:

      "My long password to gmail.com is a passphrase, the current sentence that I just typed, lulz!"

      That sentence is virtually unhackable and easy as heck to memorize. Sure, the entropy is only a few bits per character, but the length makes it better. And since it uses common letters, it is extremely quick to type.

      So unless you plan on learning a new language to hit those obscure Unicode symbols, I think it's best to just stick with what your brain is already wired to memorize: Words. Common English Words.
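The figures quoted in the replies above can be turned into bits with a few lines of Python. This is only a sketch: it plugs in the thread's own numbers (62 characters, 50,000 uncommon words, 500 emoji, 5 × 8 tweaks, a top-5000 word list) and assumes every component is chosen uniformly and independently, which real passwords and natural sentences are not. The function name `entropy_bits` is made up for this example.

```python
import math

def entropy_bits(*choice_counts: int) -> float:
    """Bits of entropy when each component is picked uniformly and
    independently from the given number of choices."""
    return sum(math.log2(n) for n in choice_counts)

# One character from a-z, A-Z, 0-9.
per_char = entropy_bits(62)

# "Uncommon Word + Emoji + 5 tweaks", using the thread's 5 x 8 tweak count.
template = entropy_bits(50_000, 500, 5, 8)

# What one emoji adds versus what two extra common words add.
emoji_only = entropy_bits(500)
two_words = entropy_bits(5_000, 5_000)

# Eight words from a top-5000 list, if they were picked at random.
eight_words = entropy_bits(*[5_000] * 8)

print(f"per character (62 symbols): {per_char:.2f} bits")
print(f"word + emoji + tweaks:      {template:.2f} bits")
print(f"one emoji:                  {emoji_only:.2f} bits")
print(f"two extra common words:     {two_words:.2f} bits")
print(f"eight random common words:  {eight_words:.2f} bits")
```

With these inputs the whole template comes to about 30 bits, the emoji contributes about 9 of them, and two extra common words add about 24.6. The eight-word figure is an upper bound, since a grammatical sentence is far from a uniform random pick.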
