Comment by gunapologist99

8 days ago

md5 is trivial to brute force.

No, it is not. You would need an MD5 preimage attack to go from md5sum to email (which is what I assume you mean by 'brute force').

To prove my point, c5633e6781ede1aea59db6f76f82a365 is the md5sum of an email address. What's the email address?

If the attacker already knows a given input email ('foo@gmail.com'), then any hash algorithm lets them identify it just as easily.
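To make the point above concrete, here is a minimal Python sketch of the equality test available to someone who already holds a candidate address. The function name `matches` and the sample addresses are hypothetical; only the standard `hashlib` module is used, and the "published" digest is computed locally for illustration.

```python
import hashlib

def matches(candidate_email: str, published_digest: str, algorithm: str = "md5") -> bool:
    """Return True if hashing the candidate reproduces the published digest."""
    digest = hashlib.new(algorithm, candidate_email.encode("utf-8")).hexdigest()
    return digest == published_digest.lower()

# Hypothetical published value: the digest of an address the checker already suspects.
for algorithm in ("md5", "sha256"):
    published = hashlib.new(algorithm, b"foo@gmail.com").hexdigest()
    print(algorithm, "known address:", matches("foo@gmail.com", published, algorithm))
    print(algorithm, "other address:", matches("bar@gmail.com", published, algorithm))
```

The check behaves the same way for both algorithms, which is the sense in which the choice of hash does not matter once the input is already known.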

The problem with the above proposal isn't related to hashing; it's that the email address is being used as a password to see sent contents, which seems wrong since email addresses are effectively public.

  • You’re of course technically correct about preimage resistance in the abstract, but that’s not the relevant threat model:

    MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.

    Attackers don’t search 2^128. They search realistic candidates.

    Emails are lowercase ASCII, structured as local@domain, domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.

    Given a large dataset of MD5(email), an attacker can precompute common emails, generate likely patterns, restrict by known domains, use leaked datasets, and distribute the work across GPUs. I.e., relatively cheap.

    If the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.

    So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.

    Academic preimage resistance can still hold while real-world privacy absolutely does not.

    It's not about breaking MD5’s math, but more about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. Salt slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
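The candidate-search argument in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name lists, domains, local-part patterns, and helper names (`md5_hex`, `candidate_emails`, `recover`) are all hypothetical, and a real attack would draw on much larger wordlists and GPU tooling rather than a pure-Python loop.

```python
import hashlib
from itertools import product

def md5_hex(text: str) -> str:
    """Hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def candidate_emails(first_names, last_names, domains):
    """Yield addresses built from a few common local-part patterns (assumed patterns)."""
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}", f"{first}_{last}"):
            yield f"{local}@{domain}"

def recover(target_digests, candidates):
    """Map each target digest to the candidate address that produces it, if any."""
    targets = set(target_digests)
    found = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:
            found[digest] = email
    return found

# Hypothetical "published" dataset: two guessable addresses and one high-entropy one.
published = [
    md5_hex("jane.doe@gmail.com"),
    md5_hex("asmith@example.com"),
    md5_hex("zq9-unguessable-7x@example.org"),
]

hits = recover(
    published,
    candidate_emails(["jane", "alice"], ["doe", "smith"], ["gmail.com", "example.com"]),
)
for digest, email in hits.items():
    print(digest, "->", email)
```

In this toy run the two patterned addresses are recovered and the high-entropy one is not, without touching MD5's preimage resistance. If a single dataset-wide salt were published alongside the digests, the same loop would work with the salt mixed into `md5_hex`; the salt would only rule out tables computed in advance.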

  • It adds provability without leaking emails, were someone to share a hash for validation's sake. Plus, anyone can hash their own email for a quick access key.

    It also makes it possible to publish the dataset later without leaking emails.
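As a rough sketch of the "hash your email for an access key" idea in the reply above, the following Python derives a key from an address. The scheme shown (MD5 of the trimmed, lowercased address, hex-encoded) and the function name `access_key` are assumptions for illustration; the thread does not specify how the proposal normalizes input, and lowercasing the local part is a simplification.

```python
import hashlib

def access_key(email: str) -> str:
    """Derive a lookup key from an email address (assumed scheme, see lead-in)."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Differently typed forms of the same address yield the same key under this scheme.
key = access_key("  Foo@Gmail.com ")
print(key == access_key("foo@gmail.com"))
print(len(key))
```

Because the derivation needs nothing but the address, anyone who knows or guesses the address can compute the same key.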