Comment by tgtweak

8 days ago

Can you code up a quick sqlite database of inbound emails received (md5 hashed sender email), subject, body + what your claw's response would have been, if any. A simple dashboard where you have to enter your hashed email to display the messages and responses.

I understand not sending the reply via actual email, but the reply should be visible if you want to make this fair + an actual iterative learning experiment.
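
Something like the following would cover the storage side. This is only a minimal sketch: the table name `inbound`, the column names, and the helpers `email_key`, `open_db`, `record`, and `lookup` are all made up for illustration, and lowercasing/trimming the address before hashing is an assumption, not something specified above.

```python
import hashlib
import sqlite3


def email_key(email: str) -> str:
    """MD5 hex digest of the address (trim + lowercase is an assumed normalization)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the hypothetical `inbound` table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inbound (
               id         INTEGER PRIMARY KEY,
               sender_md5 TEXT NOT NULL,
               subject    TEXT NOT NULL,
               body       TEXT NOT NULL,
               response   TEXT  -- NULL when no reply would have been sent
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sender ON inbound(sender_md5)")
    return conn


def record(conn, sender, subject, body, response=None):
    """Store one inbound email; only the hash of the sender is kept."""
    conn.execute(
        "INSERT INTO inbound (sender_md5, subject, body, response) VALUES (?, ?, ?, ?)",
        (email_key(sender), subject, body, response),
    )
    conn.commit()


def lookup(conn, sender_md5):
    """What the dashboard would run: all messages and responses for one hash."""
    rows = conn.execute(
        "SELECT subject, body, response FROM inbound WHERE sender_md5 = ? ORDER BY id",
        (sender_md5,),
    )
    return rows.fetchall()
```

The dashboard would then just call `lookup(conn, hash_the_user_typed)` and render the rows.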

md5 is trivial to brute force.

  • No it is not. You would need an md5 preimage attack to go from md5sum to email (what I assume you mean by 'brute force')

    To prove my point, c5633e6781ede1aea59db6f76f82a365 is the md5sum of an email address. What's the email address?

    If the attacker already knows a given input email ('foo@gmail.com'), then any hash algorithm will identically let them see the emails.

    The problem with the above proposal isn't related to hashing, it's that the email address is being used as a password to see sent contents, which seems wrong since email addresses are effectively public.

    • You’re ofc technically correct about preimage resistance in the abstract, but that’s not the relevant threat model:

      MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.

      Attackers don’t search 2^128. They search realistic candidates.

      Emails are lowercase ASCII, structured as local@domain, domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.

      Given a large dataset of MD5(email): precompute common emails, generate likely patterns, restrict by known domains, use leaked datasets, and spread the work across GPUs. I.e., relatively cheap.

      If the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.
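
      To make that concrete, here is a toy version of the candidate search in Python. It is a sketch, not how Hashcat works internally: the local parts, domains, and "leaked" hashes are invented for the example, and a real attacker would feed in breach corpora and far larger pattern lists.

```python
import hashlib
from itertools import product


def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def candidate_emails(local_parts, domains):
    """Enumerate local@domain guesses from small, predictable lists."""
    for local, domain in product(local_parts, domains):
        yield f"{local}@{domain}"


def crack(target_hashes, candidates):
    """Hash each guess once and check membership in the target set."""
    targets = set(target_hashes)
    found = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:  # the "perfect equality test"
            found[digest] = email
    return found


# Hypothetical published dataset: one guessable address, one unusual one.
leaked = [md5_hex("jsmith@gmail.com"), md5_hex("x9q-unlikely@rare.example")]

# Hypothetical guess lists.
local_parts = ["john", "jsmith", "j.smith", "admin"]
domains = ["gmail.com", "yahoo.com", "outlook.com"]

recovered = crack(leaked, candidate_emails(local_parts, domains))
```

      Here `recovered` ends up holding only the `jsmith@gmail.com` entry. The unusual address survives, which is the whole point: the cost depends on how guessable the input is, not on MD5.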

      So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.

      Academic preimage resistance can still hold while real-world privacy absolutely does not.

      It's not about breaking MD5’s math, but more about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. Salt slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
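
      A small Python sketch of the salt point, under the assumption that the salt is stored next to the hash (the usual arrangement) and so is visible to whoever has the dataset. The helper names and the candidate list are invented for illustration.

```python
import hashlib


def salted_md5(salt: bytes, email: str) -> str:
    return hashlib.md5(salt + email.encode("utf-8")).hexdigest()


def guess_salted(salt: bytes, digest: str, candidates):
    """Per-record guessing: redo the candidate loop with this record's salt."""
    for email in candidates:
        if salted_md5(salt, email) == digest:
            return email
    return None


# Hypothetical guess list.
candidates = ["john@gmail.com", "jsmith@gmail.com", "admin@yahoo.com"]

# A table of unsalted hashes, precomputed once and reusable across datasets.
precomputed = {hashlib.md5(e.encode("utf-8")).hexdigest(): e for e in candidates}

# Fixed salt so the example is reproducible; a real one would be random per record.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
stored = salted_md5(salt, "jsmith@gmail.com")

table_hit = precomputed.get(stored)                   # the precomputed table no longer helps
direct_hit = guess_salted(salt, stored, candidates)   # but guessing per record still works
```

      The precomputed table misses, so bulk lookups get more expensive, yet the per-record loop still recovers the address because the input was guessable to begin with.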

    • It adds provability without leaking emails were someone to share a hash for validation's sake. Plus anyone can hash their email for a quick access key.

      It also makes it possible to publish the dataset later without leaking emails.