Comment by gunapologist99
7 days ago
You’re ofc technically correct about preimage resistance in the abstract, but that’s not the relevant threat model:
MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.
Attackers don’t search 2^128. They search realistic candidates.
Emails are lowercase ASCII, structured as local@domain, domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.
Given a large dataset of MD5(email), an attacker can precompute hashes of common emails, generate likely patterns, restrict candidates to known domains, seed the list from leaked datasets, and spread the work across GPUs. I.e., relatively cheap.
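To make the candidate-search idea concrete, here is a minimal Python sketch. The name lists, the three local-part patterns, and the example.com domain are made up for illustration, and a real attack would use much larger wordlists and GPU tooling such as Hashcat rather than a Python loop.

```python
import hashlib
from itertools import product


def md5_hex(email: str) -> str:
    """Unsalted MD5 of a lowercased email, hex-encoded."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


def candidate_emails(first_names, last_names, domains):
    """Yield plausible addresses from a few common local-part patterns.

    The patterns here are illustrative assumptions, not an exhaustive list.
    """
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"):
            yield f"{local}@{domain}"


def recover(target_hashes, candidates):
    """Return {hash: email} for every candidate whose MD5 is in the target set."""
    targets = set(target_hashes)
    found = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:
            found[digest] = email
    return found


# Hypothetical "anonymized" dataset: one guessable address, one random-looking one.
leaked = {md5_hex("jane.doe@example.com"), md5_hex("zq8x@example.org")}
guesses = candidate_emails(["jane", "john"], ["doe", "smith"], ["example.com"])
recovered = recover(leaked, guesses)
```

Only the address that fits a common pattern is recovered here, which is the point: the cost scales with the size of the realistic candidate list, not with 2^128.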
If the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.
So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.
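A rough sketch of the equality test and the cross-dataset linkage, assuming each dataset is a dict keyed by the hex MD5 of the lowercased email. The records and addresses are invented for illustration.

```python
import hashlib


def md5_hex(email: str) -> str:
    """Unsalted MD5 of a lowercased email, hex-encoded."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


def is_member(suspected_email: str, published_hashes) -> bool:
    """Membership test: one hash computation confirms or rules out a guess."""
    return md5_hex(suspected_email) in published_hashes


def link(dataset_a: dict, dataset_b: dict) -> dict:
    """Join two datasets on the shared hash, pairing up their records."""
    shared = dataset_a.keys() & dataset_b.keys()
    return {h: (dataset_a[h], dataset_b[h]) for h in shared}


# Two hypothetical datasets published by unrelated parties.
clinic = {md5_hex("alice@example.com"): "clinic visit"}
forum = {
    md5_hex("Alice@example.com"): "forum handle",
    md5_hex("bob@example.com"): "other handle",
}
joined = link(clinic, forum)
```

Because the hash is deterministic and unsalted, the same person gets the same key everywhere, so the join works without ever recovering the email itself.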
Academic preimage resistance can still hold while real-world privacy absolutely does not.
It's not about breaking MD5’s math, but more about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. Salt slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
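A small sketch of the salt point, assuming a random per-record salt that is stored next to the digest (the usual arrangement, but an assumption here). The function names are hypothetical.

```python
import hashlib
import os


def salted_hash(email: str, salt: bytes) -> str:
    """MD5 over salt || lowercased email, hex-encoded."""
    return hashlib.md5(salt + email.lower().encode("utf-8")).hexdigest()


def make_record(email: str):
    """Create a (salt, digest) pair with a fresh 16-byte random salt."""
    salt = os.urandom(16)
    return salt, salted_hash(email, salt)


def crack_record(salt: bytes, digest: str, candidates):
    """Try each candidate against one record; return the match or None."""
    for email in candidates:
        if salted_hash(email, salt) == digest:
            return email
    return None


# The same email hashed twice yields different digests, so one precomputed
# table no longer covers every record and cross-dataset joins stop working.
salt1, digest1 = make_record("jane.doe@example.com")
salt2, digest2 = make_record("jane.doe@example.com")

# But a predictable address still falls to a per-record guess list.
hit = crack_record(salt1, digest1, ["john@example.com", "jane.doe@example.com"])
miss = crack_record(salt1, digest1, ["someone.else@example.com"])
```

The candidate list has to be rehashed for every record, which raises the cost of bulk recovery, but a targeted guess against a single record is still one hash away.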