
Comment by gunapologist99

7 days ago

You’re ofc technically correct about preimage resistance in the abstract, but that’s not the relevant threat model:

MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.

Attackers don’t search 2^128. They search realistic candidates.

Email addresses are almost always lowercase ASCII, structured as local@domain. Domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.

Given a large dataset of MD5(email), the attack is straightforward: precompute hashes of common emails, generate likely patterns, restrict to known domains, seed from leaked datasets, and distribute the work across GPUs. I.e., relatively cheap.
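To make that concrete, here is a minimal sketch of the candidate-generation approach. The name lists, the local-part patterns, and the helper names (md5_hex, candidate_emails, reverse_hashes) are all made up for illustration; a real attacker would presumably feed in breach corpora and run Hashcat on GPUs rather than a Python loop.

```python
import hashlib
from itertools import product


def md5_hex(email: str) -> str:
    # Assumes the dataset hashed the trimmed, lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_emails(first_names, last_names, domains):
    # Hypothetical local-part patterns; real wordlists are far larger.
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}", first):
            yield f"{local}@{domain}"


def reverse_hashes(target_hashes, candidates):
    # Hash every candidate and keep the ones that appear in the dataset.
    targets = set(target_hashes)
    recovered = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:
            recovered[digest] = email
    return recovered


if __name__ == "__main__":
    leaked = [md5_hex("alice.smith@example.com"), md5_hex("zzz@unknown.invalid")]
    guesses = candidate_emails(["alice", "bob"], ["smith", "jones"], ["example.com", "example.org"])
    print(reverse_hashes(leaked, guesses))
```

The address that fits a common pattern falls out immediately; the one that doesn't stays unrecovered. The point is that the cost scales with the number of plausible candidates, not with 2^128.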

And if the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.

So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.
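A rough sketch of the membership test and the cross-dataset join, assuming both datasets key their records by the hex digest of the trimmed, lowercased address. The function names and the sample records are hypothetical.

```python
import hashlib


def md5_hex(email: str) -> str:
    # Assumed normalization: trim whitespace, lowercase, UTF-8 encode.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def is_member(suspected_email, published_hashes):
    # One hash computation answers "is this person in the dataset?"
    return md5_hex(suspected_email) in published_hashes


def link_datasets(dataset_a, dataset_b):
    # Each dataset maps MD5(email) -> record. Because the digest is stable,
    # a plain key intersection joins the two without recovering any email.
    shared = dataset_a.keys() & dataset_b.keys()
    return {digest: (dataset_a[digest], dataset_b[digest]) for digest in shared}


if __name__ == "__main__":
    newsletter = {md5_hex("alice@example.com"): "newsletter row 17"}
    forum = {md5_hex("alice@example.com"): "forum user nightowl"}
    print(is_member("alice@example.com", newsletter))
    print(link_datasets(newsletter, forum))
```

Note that the join never needs to invert the hash at all, which is why preimage resistance is beside the point here.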

Academic preimage resistance can still hold while real-world privacy absolutely does not.

It’s not about breaking MD5’s math; it’s about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. Salting slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
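To illustrate the salt point, a minimal sketch assuming a per-record salt stored next to each digest (so the attacker sees both). salted_md5 and crack_salted are hypothetical names, and prepending the salt is just one possible construction.

```python
import hashlib


def salted_md5(email: str, salt: bytes) -> str:
    # Assumed construction: MD5(salt || normalized email).
    return hashlib.md5(salt + email.strip().lower().encode("utf-8")).hexdigest()


def crack_salted(records, candidates):
    # records: list of (salt, digest) pairs. No shared lookup table is possible,
    # so each record gets its own pass over the candidate list.
    candidates = list(candidates)
    recovered = {}
    for index, (salt, digest) in enumerate(records):
        for email in candidates:
            if salted_md5(email, salt) == digest:
                recovered[index] = email
                break
    return recovered


if __name__ == "__main__":
    salt = bytes(range(16))
    records = [(salt, salted_md5("bob@example.com", salt))]
    print(crack_salted(records, ["alice@example.com", "bob@example.com"]))
```

The work goes from roughly one pass over the candidates to one pass per record, and the same email no longer hashes to the same value everywhere. A guessable address is still guessable, though.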