Comment by mananaysiempre

4 days ago

> no CSPRNG has ever absolutely passed every statistical test thrown at it

As far as I know (admittedly not a high standard), there is no published statistical test that you could run on, for example, a single AES-256-CTR bitstream set up with a random key and IV, running on a single computer, that would be able to tell you with a meaningful likelihood ratio that you were looking at a pseudorandom rather than truly random input before the computer in question broke down. (I’m assuming related-key attacks are out of scope if we’re talking about an RNG for simulation purposes.)

Cryptographic operations when done correctly result in full chaos within the discrete domain (the so-called avalanche effect). Any bias of any kind gives rise to a distinguisher and the primitive is regarded as broken.

One way to imagine what symmetric cryptography does is a cellular automaton that is completely shuffled every iteration. In the case of Keccak/SHA3, that is almost exactly what happens too.

  • There has never been a perfect CSPRNG.

    • There have been a very large number of CSPRNGs for which there does not exist any known practical method to distinguish them from TRNGs.

      For all of them there are theoretical methods that can distinguish a sequence generated by them from a random sequence, but all such methods require an impossible amount of work.

      For instance, a known distinguisher for the Keccak function that is used inside SHA-3 requires an amount of work over 2^1500 (which was notable because it was an improvement over a naive method that would have required an amount of work of 2^1600).

      This is a so ridiculously large number in comparison with the size and age of the known Universe, that it is really certain that nobody will ever run such a test and find a positive result.

      There are a lot of other such CPRNGs for which the best known distinguishers require a work of over 2^100, or 2^200, or even 2^500, and for those it is also pretty certain that no practical tests will find statistical defects.

      There are a lot of CSPRNGs that could not be distinguished from TRNGs even by using hypothetical quantum computers.

      Even many of the pretty bad cryptographic PRNGs, which today are considered broken according to their original definitions, can be made impossible to distinguish from TRNGs by just increasing the number of iterations in their mixing functions. This is not done because later more efficient mixing functions have been designed, which achieve better mixing with less work.

"before the computer in question broke down."

A good MCMC simulation might test that! E.g. say, training a large diffusion model. It takes way more computing power than the average time for a single computer to fail.

Also, the standards of those tests vs. does it bias the statistical model fitted with MCMC are different.

I am aware of tests vs. ChaCha20 here https://www.pcg-random.org/index.html, I am not aware of tests vs. AES-256-CTR.

However at some point, 100x faster performance w/o an exploitable attack vector is also relevant! (though sometimes people find ways).

CSPRNGs are mostly worried about very specific attack vectors, and sure, they're like to be completely unpredictable. But other applications care more about other attack vectors like lack of k-dimensional equiprobability, and that hurts them far more.

The idea that CSPRNGs are the end all and be all of rngs holds CS back.

  • I am familiar with that site and the PCG PRNGs are based on a sound principle, so they are good for many applications.

    However I have never seen a place where the author says something about finding a statistical defect in ChaCha. She only correctly says that ChaCha is significantly slower than PRNGs like those of the PCG kind (and that it also shares the same property that any PRNG with a fixed state size has, of limited high-dimensional equidistribution; this is also true for any concrete instantiation of the PRNGs recommended by the author; the only difference is that with PRNGs having a simple definition you can make the same structure with a bigger state, as big as you want, but once you have chosen a size, you have again a limit; the PCG PRNGs recommended there, when having greater sizes than cryptographic PRNGs, they become slower than those cryptographic PRNGs, due to slow large integer multiplications).

    In the past, I have seen some claims of statistical tests distinguishing cryptographic PRNGs that were false, due to incorrect methodology. E.g. I have seen a ridiculous paper claiming that an AI method is able to recognize that an AES PRNG is non-random. However, reading the paper has shown that they did not find anything that could distinguish a number sequence produced by AES from a true random sequence. Instead, they could distinguish the AES sequence from numbers read from /dev/random on an unspecified computer, using an unspecified operating system. Therefore, if there were statistical biases, those were likely in whichever was their /dev/random implementation (as many such implementations are bad, and even a good implementation may appear to have statistical abnormalities, depending on the activity done on the computer), not in the AES sequence.

  • Are they claiming that ChaCha20 deviates measurably from equally distributed in k dimensions in tests, or just that it hasn't been proven to be equally distributed? I can't find any reference for the former, and I'd find that surprising. The latter is not surprising or meaningful, since the same structure that makes cryptanalysis difficult also makes that hard to prove or disprove.

    For emphasis, an empirically measurable deviation from k-equidistribution would be a cryptographic weakness (since it means that knowing some members of the k-tuple helps you guess the others). So that would be a strong claim requiring specific support.

    • By O’Neill’s definition (§2.5.3 in the report) it’s definitely not equidistributed in higher dimensions (does not eventually go through every possible k-tuple for large but still reasonable k) simply because its state is too small for that. Yet this seems completely irrelevant, because you’d need utterly impossible amounts of compute to actually reject the hypothesis that a black-box bitstream generator (that is actually ChaCha20) has this property. (Or I assume you would, as such a test would be an immediate high-profile cryptography paper.)

      Contrary to GP’s statement, I can’t find any claims of an actual test anywhere in the PCG materials, just “k-dimensional equdistribution: no” which I’m guessing means what I’ve just said. This is, at worst, correct but a bit terse and very slightly misleading on O’Neill’s part; how GP could derive any practical consequences from it, however, I haven’t been able to understand.

      3 replies →