Comment by tgtweak

8 days ago

Can you code up a quick sqlite database of inbound emails received (md5-hashed sender email, subject, body) plus what your claw's response would have been, if any? And a simple dashboard where you have to enter your hashed email to display the messages and responses.

I understand not sending the reply via actual email, but the reply should be visible if you want to make this fair + an actual iterative learning experiment.
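
A minimal sketch of what such a store could look like, using Python's standard `sqlite3` and `hashlib` modules. The table name `inbound`, its columns, and the helpers `email_key`, `open_db`, `record`, and `lookup` are all hypothetical names chosen for illustration; lowercasing and trimming the address before hashing is also an assumption, not something specified above.

```python
import hashlib
import sqlite3


def email_key(email: str) -> str:
    """MD5 hex digest of the address (assumed normalization: trim + lowercase)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) schema if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS inbound (
            id         INTEGER PRIMARY KEY,
            sender_md5 TEXT NOT NULL,  -- hash only, never the raw address
            subject    TEXT NOT NULL,
            body       TEXT NOT NULL,
            response   TEXT            -- NULL when no reply would have been sent
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_inbound_sender ON inbound (sender_md5)"
    )
    return conn


def record(conn, sender, subject, body, response=None):
    """Store one inbound message together with the would-be reply, if any."""
    conn.execute(
        "INSERT INTO inbound (sender_md5, subject, body, response) "
        "VALUES (?, ?, ?, ?)",
        (email_key(sender), subject, body, response),
    )
    conn.commit()


def lookup(conn, sender_md5):
    """What a dashboard would run after the visitor pastes in their hash."""
    return conn.execute(
        "SELECT subject, body, response FROM inbound "
        "WHERE sender_md5 = ? ORDER BY id",
        (sender_md5,),
    ).fetchall()
```

A dashboard would then only need to call something like `lookup(conn, pasted_hash)` and render the rows; the raw sender address is never stored.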

4 comments

gunapologist99 8 days ago

md5 is trivial to brute force.

TheDong 8 days ago
No, it is not. You would need an md5 preimage attack to go from md5sum to email (which is what I assume you mean by 'brute force').
To prove my point, c5633e6781ede1aea59db6f76f82a365 is the md5sum of an email address. What's the email address?
If the attacker already knows a given input email ('foo@gmail.com'), then any hash algorithm will identically let them see the emails.
The problem with the above proposal isn't related to hashing, it's that the email address is being used as a password to see sent contents, which seems wrong since email addresses are effectively public.
- gunapologist99 7 days ago
  
  You’re ofc technically correct about preimage resistance in the abstract, but that’s not the relevant threat model:
  MD5 preimage over a uniform 128-bit space is infeasible. Emails are not uniform 128-bit values. They’re low-entropy, structured identifiers drawn from a predictable distribution.
  Attackers don’t search 2^128. They search realistic candidates.
  Emails are lowercase ASCII, structured as local@domain, domains come from a small known set, usernames follow common patterns, and massive breach corpora already exist. If you’ve ever used John/Hashcat, you know the whole game is shrinking the search space.
  Given a large dataset of MD5(email), an attacker can precompute common emails, generate likely patterns, restrict by known domains, use leaked datasets, and distribute the work across GPUs. I.e., relatively cheap.
  If the attacker already suspects a specific email, MD5 gives them a perfect equality test. That alone kills privacy.
  So unsalted MD5(email) is not protection. It’s a stable public identifier that enables membership testing, cross-dataset linkage, re-ID, and doxxing.
  Academic preimage resistance can still hold while real-world privacy absolutely does not.
  It's not about breaking MD5’s math, but more about attack economics and low-entropy inputs. To your point, this problem exists with any bare hash. Salt slows large-scale precomputation, but it doesn’t magically add entropy to predictable identifiers.
- tgtweak 7 days ago
  
  It adds provability without leaking emails, were someone to share a hash for validation's sake. Plus, anyone can hash their email for a quick access key.
  It also makes it possible to publish the dataset later without leaking emails.
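
For what it's worth, the candidate-enumeration argument made upthread can be sketched in a few lines. This is a toy illustration under stated assumptions, not a real cracking tool: the helpers `md5_hex`, `candidates`, and `recover` are hypothetical, and the name list, domain list, and local-part patterns are invented for the example. Real tooling (John, Hashcat) does the same thing at vastly larger scale.

```python
import hashlib
from itertools import product


def md5_hex(email: str) -> str:
    """Unsalted MD5 of the address, as in the proposal above."""
    return hashlib.md5(email.encode("utf-8")).hexdigest()


def candidates(names, domains, suffixes=("", "1", "99")):
    """Yield guesses built from a few assumed local-part patterns."""
    for first, last in product(names, repeat=2):
        patterns = (
            first,
            f"{first}.{last}",
            f"{first}{last}",
            f"{first[0]}{last}",
        )
        for local in patterns:
            for suffix, domain in product(suffixes, domains):
                yield f"{local}{suffix}@{domain}"


def recover(published_hashes, guesses):
    """Map each published hash back to an address when a guess matches it."""
    targets = set(published_hashes)
    found = {}
    for guess in guesses:
        digest = md5_hex(guess)
        if digest in targets:  # the "perfect equality test"
            found[digest] = guess
    return found
```

With names `["jane", "doe", "john"]` and domains `["example.com", "example.org"]`, a published `md5_hex("jane.doe@example.com")` is recovered right away, while a hash of a long random local part is not. That is the whole point: the cost depends on how guessable the address is, not on MD5's 128-bit output.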