Comment by erikerikson

20 hours ago

Super simple to detect and try again.
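A minimal sketch of "detect and try again". It assumes a SQLite table whose primary key acts as the collision detector. The table name `records` and the helper `insert_with_retry` are hypothetical. The database rejects a duplicate key, and the caller draws a fresh UUIDv4 and retries.

```python
import sqlite3
import uuid
from typing import Callable


def insert_with_retry(
    conn: sqlite3.Connection,
    payload: str,
    new_id: Callable[[], str] = lambda: str(uuid.uuid4()),
    max_attempts: int = 5,
) -> str:
    """Insert payload under a fresh ID, retrying if that ID already exists.

    The PRIMARY KEY on records.id is the collision check: SQLite raises
    IntegrityError instead of silently overwriting an existing row.
    """
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # collision detected: draw a new ID and try again
    raise RuntimeError(f"no unique ID after {max_attempts} attempts")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")
print(insert_with_retry(conn, "hello"))
```

Every insert pays for the uniqueness check here, which is exactly the cost the replies argue about.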


erikerikson

A collision is simple to detect, but it requires you to actually check, which is expensive at scale. The entire point of UUIDv4 is that you don't have to check for collisions because one should never happen in practice. But if you don't check and it does happen, you are in undefined-behavior (UB) territory, which is generally very bad.

Detecting the risk of a collision before one happens is non-trivial, but that is really what you'd want.
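You can't detect that risk per insert, but you can estimate it from how many IDs have been issued. This sketch uses the standard birthday-bound approximation and assumes the IDs are uniformly random over UUIDv4's 122 random bits (a weak or misseeded RNG would invalidate it). The function names are hypothetical.

```python
import math

# UUIDv4: 128 bits minus 4 version bits and 2 variant bits are random.
UUID4_RANDOM_BITS = 122


def collision_probability(n: int, bits: int = UUID4_RANDOM_BITS) -> float:
    """Approximate P(at least one collision) among n random IDs.

    Birthday bound: p ≈ 1 - exp(-n(n-1) / (2N)) with N = 2**bits.
    math.expm1 keeps precision when p is tiny.
    """
    space = 2.0 ** bits
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def ids_for_risk(p: float, bits: int = UUID4_RANDOM_BITS) -> float:
    """Invert the bound: roughly how many IDs until the collision risk hits p."""
    space = 2.0 ** bits
    return math.sqrt(-2.0 * space * math.log1p(-p))


for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} IDs -> p ≈ {collision_probability(n):.2e}")
print(f"IDs for a 1-in-a-billion risk: {ids_for_risk(1e-9):.2e}")
```

A system could track its issued-ID count against a chosen risk budget instead of checking every key.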

erikerikson 17 hours ago
It's only expensive if you have unsorted keys or lack an index, and fixing either of those scales fine.
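A toy illustration of the sorted-keys argument, assuming a single in-memory index (the class `SortedKeyIndex` is hypothetical). With keys kept sorted, the duplicate check is a binary search rather than a full scan.

```python
import bisect
import uuid


class SortedKeyIndex:
    """Toy in-memory index that keeps keys sorted, so checking for a
    duplicate is a binary search (O(log n)) rather than a full scan."""

    def __init__(self) -> None:
        self._keys: list[str] = []

    def contains(self, key: str) -> bool:
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def insert_if_absent(self, key: str) -> bool:
        """Insert key and return True, or return False on a collision."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return False  # collision detected; caller can retry with a new ID
        # The lookup above is O(log n); list.insert still shifts elements
        # (O(n)), which a B-tree-style index avoids.
        self._keys.insert(i, key)
        return True


index = SortedKeyIndex()
key = str(uuid.uuid4())
print(index.insert_if_absent(key))  # True
print(index.insert_if_absent(key))  # False
```

This only shows the asymptotic cost of the lookup on one machine; it says nothing about network hops, cache misses, or page faults in a distributed index.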
- jandrewrogers 17 hours ago
  
  You must have missed the “at scale” part. There is nothing inexpensive about the extra network hops, cache misses, and page faults your solution implies. Indexing at scale is almost always lossy for performance reasons, and the location where you insert a new record is frequently not the location where you have to search for an existing one.
  It is resource amplification all the way down. In many systems that index these keys, the cost of that check is several times that of a blind insert.
  
- orf 17 hours ago
  
  AKA centralising a decentralised identifier generator?
  

squirrellous 13 hours ago

That works in this specific case. But for trace IDs (an example of which is [1]), where UUID-like identifiers are used explicitly to avoid coordination, it’s hard to imagine how you’d reliably detect a collision and retry.

[1] https://news.ycombinator.com/item?id=48033853

erikerikson 12 hours ago

A lot of databases have a uniqueness constraint that is basically a register-level compare-and-replace. Others have an if_not_exists, which is nearly the same. If you're not targeting a serious throughput use case, that's enough. If you are, there are lots of solutions and alternatives that avoid coordination completely. On the other hand, maybe tracing protocols are robust to out-of-order delivery. If that won't do, then sequence numbers tied to monotonic sequence IDs should be plenty. If not, I'd need some very serious conversations to be convinced you're not wasting everyone's time.
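One possible reading of "sequence numbers tied to monotonic sequence IDs" is a pair of (producer ID, local counter): each producer draws a random stream ID once, then numbers its own events with a monotonic counter, so no per-event coordination or uniqueness check is needed. This is an interpretation, not necessarily the scheme the commenter had in mind; `SeqId` and `StreamIdGenerator` are hypothetical names.

```python
import itertools
import threading
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class SeqId:
    stream: str  # unique per producer, chosen once at startup
    seq: int     # monotonically increasing within that producer


class StreamIdGenerator:
    """Hypothetical coordination-free generator: a random stream ID drawn
    once per producer plus a local monotonic counter. IDs from one stream
    never repeat, so the only collision risk is between stream IDs."""

    def __init__(self, stream: Optional[str] = None) -> None:
        self.stream = stream if stream is not None else uuid.uuid4().hex
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_id(self) -> SeqId:
        with self._lock:  # keep the counter safe across threads
            seq = next(self._counter)
        return SeqId(self.stream, seq)


gen = StreamIdGenerator(stream="worker-a")
ids = [gen.next_id() for _ in range(3)]
print(ids[0])  # SeqId(stream='worker-a', seq=0)
# A receiver can restore per-stream order after out-of-order delivery:
print(sorted(reversed(ids)) == ids)  # True
```

Random draws happen once per producer rather than once per event, which shrinks the birthday-bound exposure to the number of producers.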