Comment by TheDong
8 days ago
No, it is not. You would need an MD5 preimage attack to go from md5sum to email (which is what I assume you mean by 'brute force').
To prove my point, c5633e6781ede1aea59db6f76f82a365 is the md5sum of an email address. What's the email address?
If the attacker already knows a given input email ('foo@gmail.com'), then any hash algorithm lets them see the emails just the same.
The problem with the above proposal isn't related to hashing; it's that the email address is being used as a password to see sent contents, which seems wrong since email addresses are effectively public.
You’re of course technically correct about preimage resistance in the abstract, but that’s not the relevant threat model.
MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.
Attackers don’t search 2^128. They search realistic candidates.
Emails are mostly lowercase ASCII and structured as local@domain; domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.
Given a large dataset of MD5(email), an attacker can precompute hashes of common emails, generate likely patterns, restrict candidates to known domains, reuse leaked datasets, and spread the work across GPUs. In other words, it's relatively cheap.
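To make the "search realistic candidates" point concrete, here is a minimal sketch of that kind of dictionary attack. The name lists, domains, and the `leaked` dataset are all made up for illustration, and real tooling (Hashcat and friends) is far faster than this loop; the only point is that the search space is candidates, not 2^128.

```python
import hashlib
from itertools import product


def md5_hex(email: str) -> str:
    # Assumption: the dataset hashed the trimmed, lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_emails(local_parts, domains):
    # Every local part combined with every domain: a tiny stand-in for
    # pattern generators and breach corpora.
    for local, domain in product(local_parts, domains):
        yield f"{local}@{domain}"


def recover(target_hashes, candidates):
    # Hash each candidate and keep the ones that appear in the dataset.
    targets = set(target_hashes)
    found = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:
            found[digest] = email
    return found


# Hypothetical published dataset of MD5(email) values.
leaked = [
    md5_hex("alice.smith@gmail.com"),
    md5_hex("bob1987@yahoo.com"),
    md5_hex("zq7-unguessable@example.org"),
]

# Hypothetical guess lists an attacker might start from.
local_parts = ["alice.smith", "bob1987", "john.doe"]
domains = ["gmail.com", "yahoo.com", "outlook.com"]

recovered = recover(leaked, candidate_emails(local_parts, domains))
print(sorted(recovered.values()))
# → ['alice.smith@gmail.com', 'bob1987@yahoo.com']
```

The third address survives only because nothing in the guess lists produces it, which is exactly the property most real addresses don't have.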
If the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.
So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.
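A rough sketch of what "stable public identifier" means in practice, with two invented datasets keyed by MD5(email). None of these records are real; the point is only that the hash works as a join key and as a membership test.

```python
import hashlib


def md5_hex(email: str) -> str:
    # Assumption: both datasets hash the trimmed, lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical "anonymized" dataset published by one service.
forum_activity = {
    md5_hex("carol@example.com"): "posts in a health support forum",
    md5_hex("dave@example.com"): "posts in a gardening forum",
}

# Hypothetical breach dump from an unrelated service, same hashing scheme.
breach_dump = {
    md5_hex("carol@example.com"): {"name": "Carol Example", "city": "Springfield"},
}

# Cross-dataset linkage: the hash is the join key, no inversion needed.
linked = {
    h: (forum_activity[h], breach_dump[h])
    for h in forum_activity.keys() & breach_dump.keys()
}


def is_member(email: str, dataset) -> bool:
    # Membership test: anyone who suspects an address can confirm it.
    return md5_hex(email) in dataset


print(len(linked))
print(is_member("Carol@Example.com", forum_activity))
```

Note that the join never recovers an address from a hash; it only needs the same input to produce the same output everywhere.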
Academic preimage resistance can still hold while real-world privacy absolutely does not.
It’s less about breaking MD5’s math and more about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. A salt slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
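A small sketch of the salt point, assuming a per-record random salt stored next to the digest (the usual arrangement) and SHA-256 standing in for "any bare hash". The addresses and guess list are hypothetical.

```python
import hashlib
import os


def salted_hash(email: str, salt: bytes) -> str:
    # Per-record salt prepended to the normalized address.
    return hashlib.sha256(salt + email.strip().lower().encode("utf-8")).hexdigest()


# One stored record: the salt is kept alongside the digest.
salt = os.urandom(16)
record = (salt, salted_hash("alice.smith@gmail.com", salt))

# A table precomputed without the salt no longer matches anything...
guesses = ["alice.smith@gmail.com", "bob1987@yahoo.com"]
precomputed = {hashlib.sha256(e.encode("utf-8")).hexdigest(): e for e in guesses}
table_hit = record[1] in precomputed


def guess_email(record, candidates):
    # ...but a per-record search over likely candidates still succeeds,
    # because the salt is known and the input is still low-entropy.
    rec_salt, digest = record
    for email in candidates:
        if salted_hash(email, rec_salt) == digest:
            return email
    return None


print(table_hit)
print(guess_email(record, guesses))
```

The salt forces the attacker to redo the candidate search for each record instead of reusing one table, which raises cost at scale but does nothing for a targeted guess.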
It adds provability without leaking emails, were someone to share a hash for validation's sake. Plus, anyone can hash their own email for a quick access key.
It also makes it possible to publish the dataset later without leaking emails.