Comment by fc417fc802
5 days ago
It's an interesting observation, and that's a nice example you provided, but does it actually matter? Just because certain sequences can't occur doesn't necessarily mean the bias has any practical impact. It's bias in the theoretical sense but not, I would argue, in the practical sense that is actually relevant. At least that's how it seems to me at first glance, but I'd be interested to learn more if anyone thinks otherwise.
For example: suppose I have 2^128 unique playing cards. I randomly select 2^64 of them and place them in a deck. Someone then draws 2^8 cards from that deck, replacing and reshuffling between each draw. Does it really matter that those draws weren't technically independent with respect to the larger set? In a sense they are independent, so long as you view what happened as a single instance of a multi-phase procedure rather than as multiple independent instances. And in practice, with a state space so much larger than the sample set, the theoretical aspect simply doesn't matter one way or the other.
We can take this even further: don't replace and reshuffle after each card is drawn. Since we are only drawing 2^8 of 2^64 total cards, this lack of independence won't actually matter in practice. You would need to replicate the experiment a truly absurd number of times in order to notice the issue.
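Here's a rough, scaled-down simulation of the thought experiment (toy sizes, obviously, not the 2^128 / 2^64 / 2^8 figures above; the bucket count and sizes are my own arbitrary choices): build a deck that's a uniformly random subset of a much larger card space, draw a small sample, and run a crude frequency check over the full space.

```python
import random
from collections import Counter

random.seed(1)
SPACE = 2**20          # full card space (stand-in for 2^128)
DECK = 2**10           # deck size (stand-in for 2^64)
DRAWS = 2**5           # cards drawn (stand-in for 2^8)

# The "biased" deck: a uniformly random subset of the space.
deck = random.sample(range(SPACE), DECK)
# Draw without replacement, i.e. the no-reshuffle variant above.
draws = random.sample(deck, DRAWS)

# Crude uniformity check: bucket the draws into 8 equal ranges of the
# full space and compare against the expected count per bucket.
buckets = Counter(d * 8 // SPACE for d in draws)
expected = DRAWS / 8
chi2 = sum((buckets.get(b, 0) - expected) ** 2 / expected for b in range(8))
print(f"chi-square over 8 buckets: {chi2:.2f} (df=7; ~14.07 is the p=0.05 cutoff)")
```

Since the deck itself is a uniform subset, a single run like this looks exactly like drawing from the full space; you'd have to repeat it an enormous number of times before any deviation accumulated.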
If it had a practical impact, then it would imply that such statistical tests could be used as a distinguisher to attack the RNG. They fail as distinguishers, even with absolutely enormous amounts of data, so the bias is too small to have any influence in any practical experiment. You'd expect to need to observe 2^128 states to detect bias in a 256-bit CSPRNG, which means you'll have to store 2^128 observed states. That's around 10^20 EiB of storage needed. Good luck affording that with drive prices these days!
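Quick arithmetic check on that storage figure (assuming, very generously, just one byte per observed state; a full 256-bit state would be 32x this):

```python
# 2^128 observed states at 1 byte each, expressed in EiB (1 EiB = 2^60 bytes).
states = 2**128
bytes_needed = states * 1          # generous 1-byte-per-state assumption
eib = bytes_needed / 2**60         # = 2^68 EiB
print(f"{eib:.2e} EiB")            # on the order of 10^20 EiB
```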
At a certain point a bias in the prng just has to become significant? This will be a function of the experiment, so I don’t think it’s possible to talk about a general lack of “practical impact” without specifying a particular experiment. Thinking abstractly, where an “experiment” is a deterministic function that takes the output of a prng and returns a result: an experiment that can be represented by a constant function will be immune to bias, while one that returns the nth bit of the random number will be susceptible to it.
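To make that abstraction concrete, here's a toy sketch (the source here is biased in a deliberately blatant way, with its lowest bit forced to 0; real prng biases are far subtler):

```python
import random

def biased_source(rng):
    # Toy biased prng output: low bit is always 0.
    return rng.getrandbits(16) & ~1

def constant_experiment(x):
    return 42                 # ignores the input entirely: immune to any bias

def nth_bit_experiment(x, n=0):
    return (x >> n) & 1       # reads one bit directly: exposed to bias in that bit

rng = random.Random(0)
samples = [biased_source(rng) for _ in range(1000)]

print({constant_experiment(x) for x in samples})      # always {42}
print({nth_bit_experiment(x, 0) for x in samples})    # always {0}: bias fully visible
print({nth_bit_experiment(x, 1) for x in samples})    # bit 1 is unaffected
```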
> At a certain point a bias in the prng just has to become significant?
Sure, at a point. I'm not disputing that; I'm asking for a concrete bound. When the state space is >= 2^64 (you're extremely unlikely to inadvertently stumble into a modern PRNG with a seed smaller than that), how large does the sample set need to be, and how many experiment replications are required, to reach that point?
Essentially what I'm asking is: how many independent sets of N numbers must I draw from a biased deck, where the bias takes the form of a uniformly random subset of the whole, before the bias is detectable to some threshold? I think that when N is "human" sized and the deck is 2^64 or larger, the number of required replications will be unrealistically large.
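A back-of-envelope estimate, under the assumption (not a proof) that the strongest simple distinguisher here is counting repeated values: drawing N values with replacement, the expected number of pairwise collisions is about C(N,2)/D from a deck of D distinct cards versus C(N,2)/S from the full space of S, and since S >> D the full-space term is negligible.

```python
from math import comb, log2

N = 2**8      # draws per replication
D = 2**64     # deck size (the uniformly random subset you actually draw from)
S = 2**128    # full card space

# Per-replication "signal": expected extra collisions relative to the
# full space, ~C(N,2)/D since the C(N,2)/S term is negligible.
collisions_per_rep = comb(N, 2) / D
reps_needed = 1 / collisions_per_rep   # replications before one extra collision is expected

print(f"expected extra collisions per replication: {collisions_per_rep:.2e}")
print(f"replications before one is expected: ~2^{log2(reps_needed):.0f}")
```

That comes out to roughly 2^49 replications just to expect a single telltale collision, which at least supports the intuition that the number is unrealistically large; a proper analysis would bound the total variation distance rather than count collisions.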