Comment by jandrewrogers
19 hours ago
This is surprisingly common.
The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.
It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.
UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
This is why CloudFlare has done what they did with the lava lamp wall. Not that the wall is such a great source of entropy on its own - I'm sure it's not their only source, though you can never have too many sources of entropy - but it makes entropy visible in a way that can reach those who don't fully understand the concepts of RNGs and how entropy plays into them.
The more sources of entropy, the more closely you approach "perfect" randomization, and a large chunk of those entropy sources need to be non-deterministic. Even at the small scale - local applications running on local systems, like games - you can use things like mouse coordinates, the timings between button presses, or the exact frame count since game start before the player presses Start to greatly enhance randomness while still using PRNGs under the hood.
Yes, for the latter, that's technically deterministic (and the older the game, the more deterministic it is; see TAS runs of old games obliterating the "RNG"). But when you have fifty different parameters feeding into the initial seed, that's fifty things an attacker would have to perfectly predict or replay (and there are other ways to avoid replay attacks that can be layered on top).
If CloudFlare had fewer than 100 different sources of entropy, I'd be disappointed. And that's assuming their algorithm for blending those entropy sources into a single seed value is good.
> you can never have too many sources of entropy
This is so true. And the beauty is that with algorithms, we don't even need to know much about the entropy to be able to extract it.
There is the Von Neumann method of generating an unbiased coin from a biased coin: throw it twice and check whether you got HT or TH, completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.
There are more modern algorithms that do even better (in that they need fewer coin tosses if you have a very unbalanced coin).
And then there is modern cryptographic hashing. Feed it all the bits you can. Collisions end up only happening in the real world if every single one of those bits is identical. So if you have actual entropy being fed, that cannot be controlled, predicted, or replicated, modern cryptography tells you that the end result is unique.
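As a sketch of that mixing step (the sources here are placeholders; any number of inputs can be folded in):

```python
import hashlib
import os
import time

def mix_entropy(*sources: bytes) -> bytes:
    """Condense any number of entropy sources into one 256-bit seed.
    If even one input is unpredictable, the digest is unpredictable."""
    h = hashlib.sha256()
    for s in sources:
        h.update(len(s).to_bytes(8, "big"))  # length-prefix so inputs can't blur together
        h.update(s)
    return h.digest()

seed = mix_entropy(
    os.urandom(32),                     # OS entropy pool
    time.time_ns().to_bytes(8, "big"),  # timing jitter
    b"<camera frame bytes>",            # e.g. lava-lamp pixels (placeholder)
)
```

The length prefix matters: without it, `mix_entropy(b"ab")` and `mix_entropy(b"a", b"b")` would hash the same byte stream.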
> There is the Von Neumann method of generating an unbiased coin from a biased coin: throw it twice and check whether you got HT or TH, completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.
This blew my mind. Thank you!
I had to think about it a bit, so for anyone scratching their head right now trying to figure it out, consider it this way:
what matters is the ordering, of heads-then-tails, or tails-then-heads.
It doesn't matter that it's biased one way or the other, if you keep flipping pairs until you get a result with two different values, it's a 50/50 chance whether the less-likely result comes first, or second.
You might only have a 20% chance of any particular pair having a tails (for example), but in the cases where you do have a tails, it's a 50/50 chance that it comes first or second.
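A quick simulation of the trick (a sketch; the 80%-heads coin is arbitrary):

```python
import random

def von_neumann_bit(coin) -> int:
    """Flip the biased coin in pairs; keep only HT/TH, which are
    equally likely regardless of the bias, and discard HH/TT."""
    while True:
        a, b = coin(), coin()
        if a != b:
            return a

# A coin that comes up heads (1) 80% of the time.
biased = lambda: 1 if random.random() < 0.8 else 0
bits = [von_neumann_bit(biased) for _ in range(10_000)]
# The mean of `bits` hovers around 0.5 despite the 80/20 coin.
```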
(Note that this still assumes that each biased-coin toss is i.i.d.)
If I understand it correctly, the lava lamps are 90% PR/fun. They have a lot of other entropy sources that scale better.
Yes, they also have wave machines, pendulums, and mobiles :)
https://blog.cloudflare.com/harnessing-office-chaos/
https://blog.cloudflare.com/chaos-in-cloudflare-lisbon-offic...
The original from SGI back in the mid '90s, before CPUs had RDRAND instructions and the like, was an actually practical solution.
At the time I was at the Internet company that originally got online gaming banned in the US; we were looking at CCDs and cesium emitters that required a license, etc.
While I am not sure, it seems CloudFlare basically implemented one after SGI's patent[0] expired.
The patent, the licensing cost, and dealing with SGI were a major blocker for us doing it, and the startup closed before we found a real solution. The best PRNGs like Blum Blum Shub were way too slow at the time, but things did improve quickly.
[0] https://patents.google.com/patent/US5732138A/en
Ant farm ? Hamster wheels ? Anything critter-driven should provide some entropy.
The lava lamps are just for show.
You can get entropy just by plugging an oscilloscope into a pile of dirt and cranking the gain up.
Any high-gain amplifier can be used, with its input connected to a resistor or a diode.
For instance, you can use the microphone input of a PC together with an additional external amplifier, made with an audio-amplifier IC or an op-amp IC and with a diode or a resistor at its input. The microphone input of a PC provides a 5 V voltage that can be sufficient as a power supply for a noise source plugged into it.
Such a true RNG can be made on a small PCB with an audio jack, so you can plug it into any PC with a microphone input and have a true RNG that you can trust better than the RNG included in modern Intel and AMD CPUs. In the past, many AMD CPUs had defective internal RNGs. Moreover, for both Intel and AMD it is impossible to verify whether the internal RNG does what it claims to do or whether it generates predictable pseudo-random numbers.
Meh. The problem is that it might start receiving your local radio station and end up deterministic enough to screw you. So you need to shield the dirt properly.
> This is why CloudFlare has done what they did with the lava lamp wall.
Interesting. I wonder how true it actually is that they use it like they claim here: https://www.cloudflare.com/learning/ssl/lava-lamp-encryption.... It's in one of their lobbies, so doesn't that make it susceptible to an attack in some way? I'm not knowledgeable enough to know, but I figured if they actually used that method, they'd have a more controlled environment.
I also don't fully understand it. A large part of that wall is static. And the camera isn't going to pick up on the stochastic properties of the lava as much as exists in the real world. So it feels like their images will be very statistically similar.
Old games are RTA viable to RNG manip: https://m.youtube.com/watch?v=Bgh30BiWG58
Yep - I've seen legitimate-looking dupes on bad hardware, and "there are a ton of trailing zeros" is also an incredibly common duplicate mode for some UUID libraries (like earlier Go ones that didn't validate the "requested N bytes, got 3, must re-request to get the remaining N-3" return values). It doesn't happen on most hardware or OSes, so people never check it, and it just comes up in production some day with tens of thousands of collisions.
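The fix for that failure mode is simply to loop until the requested byte count actually arrives. A sketch in Python (the 3-bytes-per-call source below is a stand-in for the short reads described above):

```python
import io
import os

def read_exact(read, n: int) -> bytes:
    """Loop until exactly n bytes are collected; `read(k)` may legally
    return fewer than k bytes, which must not be silently ignored."""
    buf = bytearray()
    while len(buf) < n:
        chunk = read(n - len(buf))
        if not chunk:
            raise EOFError("entropy source returned no data")
        buf.extend(chunk)
    return bytes(buf)

# A short-read source: returns at most 3 bytes per call, like the
# partial reads some early UUID code silently padded with zeros.
src = io.BytesIO(os.urandom(64))
short_read = lambda k: src.read(min(k, 3))
sixteen = read_exact(short_read, 16)
```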
Thanks for the insight! Mind expanding on what alternatives are being used in high reliability systems instead of UUIDv4?
In high-reliability systems a criterion for identifier design is easy detection of defective identifiers. This includes buggy systems and adversarial manipulation.
The problem with UUIDs that rely on entropy sources is that it is computationally expensive to detect if the statistical distribution of identifiers is diverging from what you would expect from a random oracle. I've written systems that can detect entropy source anomalies but you'll want to turn it off in production.
It is pretty cheap to sanity check most non-probabilistic identifier schemes. UUIDs that use broken hash algorithms (e.g. UUIDv3/5) or leak state (e.g. UUIDv7) are exposed to adversarial exploitation.
The identifier scheme is dependent on the use case. Does the uniqueness constraint apply to the instance of the object or the contents of the object? Is the generation of identifiers federated across untrusted nodes? How large is the potential universe of identifiers?
The basic scheme I've seen is a 128-bit structured value that has no probabilistic component. These identifiers can be encrypted with AES-128 when exported to the public, guaranteeing uniqueness while leaking no internal state. The benefit of this scheme is that it is usually drop-in compatible with standard UUID even though it is technically not a UUID and the internal structure can carry useful metadata about the identifier if you can decrypt it.
Federated generation across untrusted nodes requires a more complex scheme, particularly if the universe of identifiers is extremely large. These intrinsically have a collision risk regardless of how the identifiers are generated.
All of the standardized UUID really weren't designed with the requirements of scalable high-reliability systems in mind. They were optimized for convenience and expedience which is a perfectly reasonable objective. Most people don't need an identifier system engineered for extreme reliability, even though there is relatively little cost to having one.
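As a rough sketch of such a structured, non-probabilistic 128-bit identifier (the field widths here are made up for illustration, and the AES-128 encryption step for public export is omitted since it needs a crypto library):

```python
def make_structured_id(shard: int, ts_ms: int, seq: int) -> bytes:
    """Pack a 128-bit identifier with no probabilistic component:
    16-bit shard ID | 48-bit millisecond timestamp | 64-bit sequence.
    In the scheme described above, this plaintext would then be
    encrypted with AES-128 before being exposed publicly."""
    assert shard < 2**16 and ts_ms < 2**48 and seq < 2**64
    value = (shard << 112) | (ts_ms << 64) | seq
    return value.to_bytes(16, "big")

def parse_structured_id(raw: bytes) -> tuple[int, int, int]:
    """Recover (shard, ts_ms, seq); trivially cheap to sanity-check."""
    value = int.from_bytes(raw, "big")
    return value >> 112, (value >> 64) & (2**48 - 1), value & (2**64 - 1)
```

Because the fields are deterministic, a consumer can cheaply validate that a decrypted identifier has a plausible shard and timestamp, which is exactly the "easy detection of defective identifiers" property mentioned above.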
> leak state (e.g. UUIDv7)
But according to PostgreSQL, UUIDv7 provides better performance in the database, so is this essentially a trade off between security and speed?
The latest UUID version (7?) uses half random bits, half timestamp. This not only makes it sortable by creation time, but would also make a collision like this impossible.
It's still possible in most implementations of UUIDv7.
UUIDv7 assigns the first 48 bits to the timestamp in milliseconds. You can generate a lot of UUIDs in a millisecond, though!
Then you have another 12 bits that you can use as you wish: "rand_a". The spec suggests a few ways to use these bits, including 12 bits of random data, sub-millisecond timestamps, or a monotonic counter, but each has its downsides:
- Purely random data means you can still run into collisions and anything within the same millisecond is unordered
- With sub-millisecond timestamps you can still run into collisions; there's nothing stopping you from generating two UUIDs with the same 62 bits of rand_b data within the same sub-millisecond timestamp.
- Monotonic counters can overflow before the next tick, and then what? Rollover? Once you roll over it's no longer monotonic, and you can generate the same random data within the same monotonic cycle. Also, it's only monotonic to the system generating the UUID. If you have a distributed system where each node has its own monotonic cycle, then nodes will generate UUIDs with the same timestamp + monotonic counter, and again you are relying on not generating the same random data.
You can steal some of the 62 bits in rand_b if you want as well; you can use rand_a for sub-millisecond accuracy, and then use a few bits of rand_b for a monotonic counter. There's still a chance of collision here, but it's exceedingly low at the expense of less truly random data at the end.
If you want truly collision-free generation, you'd also need to assign a couple of bits to identify the subsystem generating the UUID so that the monotonic counter is unique to that subsystem. You lose the ordering property of the monotonic counter this way, though I guess you could argue that in nearly 100% of cases the accuracy of sub-millisecond ordering in a distributed system is a lie anyway.
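For reference, a minimal sketch of the layout under discussion, filling both rand_a and rand_b with random data (so it carries the collision caveats above):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """UUIDv7-style value: 48-bit Unix ms timestamp, then random bits,
    with the version and variant fields set per RFC 9562."""
    ts_ms = time.time_ns() // 1_000_000                      # 48 bits
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)
```

Swapping rand_a for a sub-millisecond fraction or a counter is just a matter of what you put in those 12 bits.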
Considering the context I think it's worth pointing out that it's technically not impossible - it's just even less likely.
Everything in crypto is always a probability, never a certainty.
The spec doesn't require the use of actually random numbers though.
UUIDv7 is arguably better, because it is entropy plus time.
It is what I usually use for its sorting, but some people don't want to leak time info.
Entropy is not a requirement in the UUID spec.
Sequences, generally.
How is UUIDv4 to blame for a broken source of entropy? Or am I misinterpreting your words?
I wouldn't say it's "to blame", but it is more susceptible to bad RNG.
If the RNG is bad, you'll get more benefit from adding non-random bits than you would from additional badly RNG'd bits.
The probability of future collisions also rises the more IDs you generate. If you incorporate non-random bits, you can alleviate that:
- timestamps make the collision probability not grow over time as you accumulate more existing UUIDs that could collide
- known-distinct machine IDs make the collision probability not grow as you add more machines
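Both effects can be eyeballed with the usual birthday-bound approximation (a sketch; `bits` is the number of random bits per ID):

```python
import math

def collision_prob(n: int, bits: int) -> float:
    """Approximate probability of at least one collision among n
    uniformly random values, each `bits` bits wide (birthday bound)."""
    return -math.expm1(-n * (n - 1) / 2 / 2**bits)

# With purely random IDs, risk grows with the total count ever generated...
grows = collision_prob(10**9, 122) > collision_prob(10**6, 122)
# ...but a timestamp prefix means only IDs minted in the same tick can
# collide, so the effective n is bounded by per-tick volume, not history.
```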
I never blamed UUIDv4 for broken entropy sources. A broken entropy source breaks UUIDv4 even if you are using it correctly.
There is a long history of broken entropy sources showing up in real systems. No matter how hard people try to prevent this it keeps happening. Consequently, a requirement for high-quality entropy sources is correctly viewed as an unnecessary and avoidable foot-gun in high-reliability software systems.
Presumably they mean using randomness as unique IDs.
For a while we’ve been fixing telemetry-reported crash bugs in the project I maintain, and now hardware bugs are showing up with some frequency. I was amazed how common they are. Sometimes data values (e.g. the SP register) are corrupted, but other times even infallible operations (e.g. loads of rodata constants) crash, indicating that the instruction itself was corrupted. So, yeah, I believe you’ll eventually see UUID collisions, but not because the underlying cryptanalysis was wrong.
> UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
Hmm. What do those systems do for cryptography? Just assume it won't work and not rely on it at all?
In these kinds of systems the cryptographic components often aren't even accessible from the software. It isn't a thing you need to worry about.
This makes it easier to audit for use of entropy sources in the software since there really isn't a valid use case for it.
Super simple to detect and try again.
A collision is simple to detect but it requires you to actually check, which is expensive at scale. The entire point of UUIDv4 is that you don't have to check for collisions because it should never happen. But if you don't check and it does happen you are in UB territory which is generally very bad.
A risk of collision before it happens is non-trivial to detect but this is really what you'd want.
Only expensive if you have unsorted keys or lack an index, neither of which is unscalable.
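For the database case, detect-and-retry is just a unique constraint plus a loop. A sketch with sqlite3 (the table and column names are made up):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")

def insert_with_retry(payload: str, max_tries: int = 3) -> str:
    """Insert under a uniqueness constraint; on the (astronomically
    rare) collision, mint a fresh UUIDv4 and try again."""
    for _ in range(max_tries):
        new_id = str(uuid.uuid4())
        try:
            conn.execute("INSERT INTO items VALUES (?, ?)", (new_id, payload))
            return new_id
        except sqlite3.IntegrityError:
            continue  # ID already taken: retry with a new one
    raise RuntimeError("could not mint a unique ID")
```

The primary-key index makes the existence check essentially free per insert, which is the point being made above.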
In this specific case. In the case of trace IDs (an example of which is [1]) where the equivalent of UUIDs are explicitly used to avoid coordination, it’s hard to imagine how you’d reliably detect and retry.
[1] https://news.ycombinator.com/item?id=48033853
A lot of databases have a uniqueness constraint that is basically a register-level compare-and-replace. Others have an if_not_exists, which is nearly the same. If you're not targeting a serious-throughput use case, that's enough. If you are, there are lots of solutions/alternatives that completely avoid coordination. On the other hand, maybe tracing protocols are robust to out-of-order delivery. If that won't do, then sequence numbers tied to monotonic sequence IDs should be plenty. If not, then I'd need very serious conversations to be convinced you're not wasting everyone's time.
Reading the UUID spec leads me to believe that good entropy is not even a requirement for any version:
> Implementations SHOULD utilize a cryptographically secure pseudorandom number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique").
From https://www.rfc-editor.org/rfc/rfc9562.html#unguessability
So I don't think technically we can say entropy or random numbers at all are even "required for UUIDv4 to work as advertised."