Comment by Strilanc

4 years ago

Another danger of using fork is that it duplicates the internal state of pseudo-random number generators. It's a great way to accidentally take the same random samples in every process, utterly trashing any statistics you were intending to do. Bonus: the Python multiprocessing module silently uses fork by default. Person A writes a "make multiprocessing convenient" library, Person B writes a sampling library, you put them together and... whoops!
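To make the failure mode concrete, here is a minimal sketch, assuming a Unix-like system where `os.fork()` is available. The helper name `sample_in_children` is made up for illustration. It uses an explicit `random.Random()` instance, because recent CPython versions reseed the module-level `random` functions in the child, while instances you create yourself are copied as-is.

```python
import os
import random


def sample_in_children(rng, n_children=3):
    """Fork n_children processes; each draws one value from rng and sends it back."""
    results = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: draw from the inherited copy of the generator state.
            try:
                os.close(read_fd)
                os.write(write_fd, repr(rng.random()).encode())
                os.close(write_fd)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        # Parent: collect the child's sample. The parent never advances rng,
        # so every child starts from exactly the same state.
        os.close(write_fd)
        with os.fdopen(read_fd) as f:
            results.append(float(f.read()))
        os.waitpid(pid, 0)
    return results


rng = random.Random()  # seeded from OS entropy, but only once, before any fork
samples = sample_in_children(rng)
print(samples)  # every entry is the same number
```

Each child believes it has taken an independent sample, yet all of them report the identical value.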

7 comments


cryptonector 4 years ago

Libraries like that should use pthread_atfork() to automatically reset/reseed/whatever state as needed at fork() time.
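In Python, the closest analogue to this suggestion is `os.register_at_fork()` (available since Python 3.7). The following is only a sketch of what such a library hook might look like; `_rng`, `_reseed_after_fork`, and `draw_in_child` are hypothetical names, and it assumes the library always wants fresh entropy in the child.

```python
import os
import random

# Hypothetical library-level generator, seeded from OS entropy at import time.
_rng = random.Random()


def _reseed_after_fork():
    # Runs in the child only, right after os.fork() returns there.
    _rng.seed()  # no argument: reseed from OS entropy


os.register_at_fork(after_in_child=_reseed_after_fork)


def draw_in_child():
    """Fork once and return the value the child drew from _rng."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.write(write_fd, repr(_rng.random()).encode())
            os.close(write_fd)
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as f:
        value = float(f.read())
    os.waitpid(pid, 0)
    return value


first, second = draw_in_child(), draw_in_child()
print(first, second)  # two different values, since each child reseeded
```

These hooks run only when the fork goes through Python's own `os.fork()`; a fork performed directly by third-party C code may skip them.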

Strilanc 4 years ago
I don't think that's really a viable strategy in practice in an ecosystem as complex as Python's. There are too many libraries and too many little corner cases and interactions around what the behavior should be.
For example, suppose I am using library A and I initialized the random number generator with a fixed seed. Clearly, when I fork, it's not appropriate for A to reseed, because I wanted fixed behavior. Something is very wrong, so probably there should be an exception. But now suppose I was using library B, which was using A, and B handles getting system entropy to seed A. Now it is clear that when I fork I probably want B to reseed A, but alas A has already raised an exception because it was given a (from its perspective) fixed seed. So now A needs to be redesigned to take a seed plus some sort of intent about what should happen when forking, and oh my god, wow, this is creating a lot of work for everyone everywhere. It is not actually going to be done consistently, and it cannot be trusted.
- cryptonector 4 years ago
  
  If you're writing a simulation or a test, then you'll want the PRNG to stay unchanged, and you'll want to be in control of any reseeding.
  For all other RNG uses, you really do want it to reseed.
  A cryptographic PRNG and a simulation PRNG are very different things, and should be different libraries.
agwa 4 years ago
pthread_atfork functions aren't called if the application calls the clone syscall directly. The right solution is MADV_WIPEONFORK on Linux, or MAP_INHERIT_ZERO (via minherit) on OpenBSD:
https://www.metzdowd.com/pipermail/cryptography/2017-Novembe...
- cryptonector 4 years ago
  
  That helps with memory mappings, but it doesn't help with file descriptors -- you still have to be careful with those.

Blikkentrekker 4 years ago

Reading up in the Python documentation, it seems to seed once from `/dev/urandom`, and then uses its own generator to generate further random bits.

What's the purpose of this strategy, as opposed to deriving every single random value from `/dev/urandom`? Simple performance?

ninkendo 4 years ago

Reading from `/dev/urandom` requires a syscall, which can be extremely slow compared to running your own PRNG in-process.
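A rough way to see the difference for yourself is sketched below. The function names are made up for illustration, `os.urandom()` stands in for reading `/dev/urandom` (it goes to the kernel on every call), and the timings will vary by machine, so no numbers are claimed here.

```python
import os
import random
import timeit


def value_from_urandom():
    """One trip to the kernel per value: fetch 8 fresh random bytes."""
    return int.from_bytes(os.urandom(8), "big")


# Seed once from the kernel, then generate everything else in-process.
_prng = random.Random(int.from_bytes(os.urandom(16), "big"))


def value_from_prng():
    """No syscall: advance the in-process generator state."""
    return _prng.getrandbits(64)


if __name__ == "__main__":
    n = 100_000
    print("os.urandom:", timeit.timeit(value_from_urandom, number=n))
    print("in-process:", timeit.timeit(value_from_prng, number=n))
```

Keep in mind that the in-process generator here is the Mersenne Twister, which is not suitable for cryptographic use.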