← Back to context

Comment by lilyball

6 days ago

The argument against PRNGs this paper makes isn't that the PRNG produces results that can be distinguished from TRNG, but that the 256-bit seed deterministically chooses a single shuffling. If you need 300 bits to truly shuffle the assignment but you only have 256 bits, then that's a lot of potential assignments that can never actually happen. With this argument it doesn't matter what the PRNG is, the fact that it's deterministic is all that matters. And this invalidates the p-value because the p-value assumes that all possible assignments are equiprobable, when in fact a lot of possible assignments have a probability of zero.

I imagine you could change the p-value test to randomly sample assignments generated via the exact same process that was used to generate the assignment used by the experiment, and as you run more and more iterations of this the calculated p-value should converge to the correct value, but then the question becomes is the p-value calculated this way the same as the p-value you'd get if you actually went ahead and used equiprobable assignment to begin with?

Ultimately, this all comes down to the fact that it's not hard to use true randomness for the whole thing, and true randomness produces statistically valid results, if you use true randomness for assignment then you can't screw up the p-value test, and so there's no reason at all to even consider how to safely use a PRNG here, all that does is open the door to messing up.

If you have 300 bits of shuffling entropy, you have a lot of potential assignments that can never happen because you won't test them before the universe runs out. No matter how you pick them.

Of course a PRNG generates the same sequence every time with the same seed, but that's true of every RNG, even a TRNG where the "seed" is your current space and time coordinates. To get more results from the distribution you have to use more seeds. You can't just run an RNG once, get some value, and then declare the RNG is biased towards the value you got. That's not a useful definition of bias.

  • The number of possible assignments has to be effectively close to an integer multiple of the number of shuffles.

    It doesn't matter how many universes it would take to generate all of them, there are some assignments that are less likely.

    • Everyone agrees that most of the possible shuffles become impossible when a CSPRNG with 256 bits of state is used. The question is just whether that matters practically. The original author seems to imply it does, but I believe they're mistaken.

      Perhaps it would help to think of the randomization in two stages. In the first, we select 2^256 members from the set of all possible permutations. (This happens when we select our CSPRNG algorithm.) In the second, we select a single member from the new set of 2^256. (This happens when we select our seed and run the CSPRNG.) I believe that measurable structure in either selection would imply a practical attack on the cryptographic algorithm used in the CSPRNG, which isn't known to exist for any common such algorithm.

      1 reply →

    • The cases that are not close to a multiple are handled by the rejection of a part of the generated random numbers.

      Let's say that you have a uniform random number generator, which generates with equal probability anyone of N numbers. Then you want to choose with equal probability one of M choices.

      If M divides N, then you can choose 1 of M by either multiplication with taking the integer part, or by division with taking the remainder.

      When M does not divide N, for unbiased choices you must reject a part of the generated numbers, either rejecting them before the arithmetic operation (equivalent to diminishing N to a multiple of M), or rejecting them after the arithmetic operation (diminishing the maximum value of the integer part of product or of the division remainder, to match M).

      This is enough for handling the case when M < N.

      When M is greater than N, you can use a power of N that is greater than M (i.e. you use a tuple of numbers for making the choice), and you do the same as before.

      However in this case you must trust your RNG that its output sequence is not auto-correlated.

      If possible, using from the start a bigger N is preferable, but even when that is impossible, in most cases the unreachable parts of the space of random number tuples will not make any statistical difference.

      To be more certain of this, you may want to repeat the experiment with several of the generators with the largest N available, taking care that they really have different structures, so that it can be expected that whichever is the inaccessible tuple space, it is not the same.

      1 reply →

And why does it matter in the context of randomly assigning participants in an experiment into groups? It is not plausible that any theoretical "gaps" in the pseudorandomness are related to the effect you are trying to measure, and unlikely that there is a "pattern" created in how the participants get assigned. You just do one assignment. You do not need to pick a true random configuration, just one random enough.

I assume that as long as p-values are concerned, the issue raised could very well be measured with simulations and permutations. I really doubt though that the distribution of p-values from pseudorandom assignments with gaps would not converge very fast to the "real" distribution you would get from all permuations due to some version of a law of large numbers. A lot of resampling/permutation techniques work by permuting a negligible fraction of all possible permutations, and the distribution of the statistics extracted converges pretty fast. As long as the way the gaps are formed are independent of the effects measured, it sounds implausible that the p-values one gets are problematic because of them.

P-values assume something weaker than "all assignments are equiprobable." If the subset of possible assignments is nice in the right ways (which any good PRNG will provide) then the resulting value will be approximately the same.

Gallant always uses TRNGs. Goofus always uses a high-quality PRNG (CSPRNG if you like) that's seeded with a TRNG. Everything else they do is identical. What are circumstances under which Goofus's conclusions would be meaningfully different than Gallant's?

Suppose I'm doing something where I need N(0,1) random variates. I sample from U(0,1) being sure to use a TRNG, do my transformations, and everything's good, right? But my sample isn't U(0,1), I'm only able to get float64s (or float32s), and my transform isn't N(0,1) as there's going to be some value x above which P(z>x)=0. The theory behind what I'm trying to do assumes N(0,1) and so all my p-values are invalid.

Nobody cares about that because we know that our methods are robust to this kind of discretization. Similarly I think nobody (most people) should care (too much) about having "only" 256 bits of entropy in their PRNG because our methods appear to be robust to that.

> then you can't screw up the p-value test

Bakker and Wicherts (2011) would like to disagree! Apparently 15 % screw up the calculation of the p-value.