Comment by encryptluks2

3 years ago

This sort of reaffirms my belief that UIDs are not sufficient for syncing mail. Emails should be hashed and synced by the hash, which would also solve other issues, like being able to redownload specific messages that may have become corrupted locally.

Even so, isn't this a violation of the IMAP standard, which says that UIDs are, by design, not permanent identifiers, but UID + UIDVALIDITY is? (I don't know much about IMAP.)

  • No, RFC 3501 says

    > The unique identifier of a message MUST NOT change during the session, and SHOULD NOT change between sessions. Any change of unique identifiers between sessions MUST be detectable using the UIDVALIDITY mechanism discussed below. Persistent unique identifiers are required for a client to resynchronize its state from a previous session with the server (e.g., disconnected or offline access clients); this is discussed further in [[IMAP-DISC](https://www.rfc-editor.org/rfc/rfc3501#ref-IMAP-DISC)].

    So it is only a "SHOULD NOT". In practice it's really hard to keep {UID, UIDVALIDITY} assignments persistent and unique, so IMAP servers often don't, and as the quote shows, they are allowed not to.

    I.e., it's perfectly compliant to generate a new UIDVALIDITY for each session and then assign UIDs to the emails in a folder when you open it.

  • It is definitely a recommendation, but from what I understand UIDVALIDITY only covers the folder as a whole. As far as I can tell, hashing the entire message would be the best way to sync messages.
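To make the two ideas above concrete, here is a rough Python sketch of a client-side cache that (a) throws away its UID mappings when the server reports a different UIDVALIDITY and (b) keeps a content hash per message so local corruption can be detected. `FolderCache`, `reconcile`, `is_corrupted`, and `message_digest` are made-up names for illustration, not part of any IMAP library, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from dataclasses import dataclass, field


def message_digest(raw: bytes) -> str:
    """Return the SHA-256 hex digest of the raw message bytes."""
    return hashlib.sha256(raw).hexdigest()


@dataclass
class FolderCache:
    """Hypothetical local cache for a single mailbox (assumption, not a real API)."""
    uidvalidity: int
    # UID -> digest of the message as it was originally downloaded
    digests: dict[int, str] = field(default_factory=dict)

    def reconcile(self, server_uidvalidity: int) -> bool:
        """Drop all cached UIDs if the server's UIDVALIDITY changed.

        Returns True when the cached UIDs are still usable.
        """
        if server_uidvalidity != self.uidvalidity:
            self.uidvalidity = server_uidvalidity
            self.digests.clear()
            return False
        return True

    def is_corrupted(self, uid: int, local_raw: bytes) -> bool:
        """True if the local copy does not match its recorded digest.

        A UID with no recorded digest also counts as a mismatch.
        """
        return self.digests.get(uid) != message_digest(local_raw)
```

A client would call `reconcile()` right after selecting the mailbox, and could use `is_corrupted()` to decide which individual messages to fetch again instead of resyncing the whole folder.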

Can hashes not collide? Would that not cause problems?

  • In practice the odds can be astronomically low, as in lower than the odds that an asteroid collides with Earth right now and all of humanity goes extinct. But that only holds for hashes without known vulnerabilities.

    For a vulnerable hash like MD5, an attacker can find a collision in a few seconds.

    • I only say this in case anyone reads your message and gets the wrong idea. Currently, there is no feasible preimage attack for MD5. You can easily generate two colliding inputs, but you cannot, given a hash, find an input that produces that hash.

      And I don't believe that accidental MD5 collisions are something to worry about.

  • You only need the lookup key to be very selective; you can then use cached metadata to pick among any conflicting candidates.
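A minimal sketch of that last idea, assuming a truncated SHA-256 digest as the selective key and message size plus Message-ID as the cached metadata. `Meta`, `Index`, and `short_key` are hypothetical names chosen for this example; a real client might cache different fields.

```python
import hashlib
from collections import defaultdict
from typing import NamedTuple, Optional


class Meta(NamedTuple):
    """Hypothetical cached metadata for one message."""
    size: int
    message_id: str


def short_key(raw: bytes, nbytes: int = 8) -> str:
    """Truncated SHA-256 digest: selective, but not guaranteed unique."""
    return hashlib.sha256(raw).hexdigest()[: nbytes * 2]


class Index:
    """Maps a lookup key to every message that shares it."""

    def __init__(self) -> None:
        self._buckets: dict[str, list[Meta]] = defaultdict(list)

    def add(self, key: str, meta: Meta) -> None:
        """Record a message under its key; colliding keys share a bucket."""
        self._buckets[key].append(meta)

    def find(self, key: str, size: int, message_id: str) -> Optional[Meta]:
        """Return the candidate under `key` whose metadata also matches."""
        wanted = Meta(size, message_id)
        for candidate in self._buckets.get(key, []):
            if candidate == wanted:
                return candidate
        return None
```

Even if two messages end up under the same key, `find()` still returns the right one as long as their cached metadata differs, so the key only has to narrow the search rather than be collision-free.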