
Comment by jmalicki

4 days ago

It all comes down to actual specific statistical tests, and how hard they are to break in specific applications.

No CSPRNG is absolutely perfect, no CSPRNG has ever absolutely passed every statistical test thrown at it.

MCMC stresses very different statistical properties than the typical CSPRNG test batteries do.

Taken to that absolute standard, every PRNG is broken. MCMC and crypto applications simply push on different aspects, where statistical issues will cause application-level failures.

See e.g. this paper https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf

(it's not the end-all, be-all, but it's a good survey of why this stuff matters and why it's different)

> no CSPRNG has ever absolutely passed every statistical test thrown at it

As far as I know (admittedly not a high standard), there is no published statistical test that you could run on, for example, a single AES-256-CTR bitstream set up with a random key and IV, running on a single computer, that would be able to tell you with a meaningful likelihood ratio that you were looking at a pseudorandom rather than truly random input before the computer in question broke down. (I’m assuming related-key attacks are out of scope if we’re talking about an RNG for simulation purposes.)
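As a sketch of what such a statistical test on a keyed bitstream looks like (AES is not in the Python standard library, so SHA-256 in counter mode stands in for AES-256-CTR here; the key is an arbitrary placeholder):

```python
import hashlib
import math

def sha256_ctr_stream(key: bytes, nbytes: int) -> bytes:
    # SHA-256 in counter mode: a stdlib stand-in for AES-256-CTR.
    out = bytearray()
    ctr = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:nbytes])

stream = sha256_ctr_stream(b"\x42" * 32, 1 << 16)

# Monobit (frequency) test: the count of 1-bits in n bits of output
# should be approximately N(n/2, n/4) for a random-looking stream.
nbits = len(stream) * 8
ones = sum(bin(b).count("1") for b in stream)
z = (ones - nbits / 2) / math.sqrt(nbits / 4)
print(f"ones = {ones}/{nbits}, z = {z:+.2f}")
```

A large |z| (say, above 3 or 4) would flag a bias; for any healthy cryptographic construction the statistic stays unremarkable, which is the commenter's point: no published test produces a meaningful likelihood ratio against such a stream.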

  • Cryptographic operations, when done correctly, result in full chaos within the discrete domain (the so-called avalanche effect). Any bias of any kind gives rise to a distinguisher, and the primitive is regarded as broken.

    One way to imagine what symmetric cryptography does is a cellular automaton that is completely shuffled every iteration. In the case of Keccak/SHA3, that is almost exactly what happens too.
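A minimal sketch of the avalanche effect, using SHA-256 from the Python standard library (any sound cryptographic primitive behaves the same way): flipping a single input bit changes close to half of the output bits.

```python
import hashlib

def hamming(a: bytes, b: bytes) -> int:
    # Number of differing bits between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = b"avalanche effect demo"
# Flip exactly one bit of the input.
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]

d1 = hashlib.sha256(msg).digest()
d2 = hashlib.sha256(flipped).digest()

diff = hamming(d1, d2)
print(f"{diff} of 256 output bits changed")  # typically close to 128
```

Any measurable departure from the expected ~50% flip rate would itself be a distinguisher.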

  • "before the computer in question broke down."

    A good MCMC simulation might test that! E.g., training a large diffusion model takes far more machine-hours of computing than the mean time to failure of a single computer.

    Also, the standards of those tests are different from the question of whether the generator biases the statistical model fitted with MCMC.
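To make the scale concrete with purely illustrative numbers (the GPU count, run length, and MTBF below are assumptions, not measurements):

```python
# A large training / MCMC run vs. the mean time between failures (MTBF)
# of a single machine. All figures are illustrative assumptions.
gpus = 10_000                      # machines in the cluster
days = 30                          # length of the run
machine_hours = gpus * days * 24   # total machine-hours consumed

mtbf_hours = 200_000               # assumed per-machine MTBF
expected_failures = machine_hours / mtbf_hours

print(f"{machine_hours:.2e} machine-hours consumed, "
      f"~{expected_failures:.0f} expected hardware failures")
```

Under these assumptions the run consumes 7.2 million machine-hours, i.e. dozens of expected hardware failures, which is the sense in which such a workload outlasts "the computer in question".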

  • I am aware of tests against ChaCha20 here https://www.pcg-random.org/index.html, but I am not aware of tests against AES-256-CTR.

    However, at some point, 100x faster performance without an exploitable attack vector is also relevant! (Though sometimes people find ways.)

    CSPRNGs are mostly worried about very specific attack vectors, and sure, they're likely to be completely unpredictable. But other applications care more about other failure modes, like a lack of k-dimensional equidistribution, and those hurt them far more.

    The idea that CSPRNGs are the end-all and be-all of RNGs holds CS back.
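As a sketch of the kind of structural defect simulation users worry about (this is a raw power-of-two-modulus LCG, not PCG; PCG's output permutation exists precisely to hide this): the low-order bits of such a generator have tiny periods, a defect no healthy CSPRNG output exhibits.

```python
# Raw 64-bit LCG with a power-of-two modulus (constants from common
# LCG parameterizations; any odd multiplier/increment shows the same flaw).
def lcg(seed: int, n: int):
    x = seed
    for _ in range(n):
        x = (6364136223846793005 * x + 1442695040888963407) % (1 << 64)
        yield x

# Bit k of the state has period at most 2**(k+1); the lowest bit
# therefore strictly alternates, regardless of the seed.
low_bits = [x & 1 for x in lcg(12345, 16)]
print(low_bits)  # the lowest bit alternates: 0, 1, 0, 1, ...
```

A test suite like TestU01 catches this instantly, while typical CSPRNG analysis never even considers it, illustrating how the two worlds stress different properties.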

    • I am familiar with that site and the PCG PRNGs are based on a sound principle, so they are good for many applications.

      However, I have never seen the author claim anywhere to have found a statistical defect in ChaCha. She only says, correctly, that ChaCha is significantly slower than PRNGs like those of the PCG family, and that it shares a property of any PRNG with a fixed state size: limited high-dimensional equidistribution. The same is true of any concrete instantiation of the PRNGs recommended by the author. The only difference is that with PRNGs having a simple definition, you can build the same structure with a bigger state, as big as you want; but once you have chosen a size, you again have a limit. Moreover, when the PCG PRNGs recommended there are given larger states than cryptographic PRNGs, they become slower than those cryptographic PRNGs, due to slow large-integer multiplications.

      In the past, I have seen claims of statistical tests distinguishing cryptographic PRNGs that turned out to be false due to incorrect methodology. E.g., I have seen a ridiculous paper claiming that an AI method is able to recognize that an AES PRNG is non-random. However, reading the paper showed that they had not found anything that could distinguish a number sequence produced by AES from a true random sequence. Instead, they could distinguish the AES sequence from numbers read from /dev/random on an unspecified computer running an unspecified operating system. Therefore, if there were statistical biases, those were likely in whatever their /dev/random implementation was (many such implementations are bad, and even a good implementation may show statistical abnormalities depending on the activity on the computer), not in the AES sequence.

    • Are they claiming that ChaCha20 deviates measurably from equally distributed in k dimensions in tests, or just that it hasn't been proven to be equally distributed? I can't find any reference for the former, and I'd find that surprising. The latter is not surprising or meaningful, since the same structure that makes cryptanalysis difficult also makes that hard to prove or disprove.

      For emphasis, an empirically measurable deviation from k-equidistribution would be a cryptographic weakness (since it means that knowing some members of the k-tuple helps you guess the others). So that would be a strong claim requiring specific support.
