
Comment by austin-cheney

4 days ago

Hashes should never be used as a source of randomness. Using them that way makes assumptions far outside their intended use case.

Hashes should only be a reproducible label that cannot be used to produce the material described by the hash. When used for their intended purposes, hashes serve as the strongest point of integrity until value collisions are discovered.

But once you've made a function that "cannot be used to produce the material described by the hash", you've also made a very good pseudo-randomizer. In fact, if a cryptographic hash function cannot be trusted for its ability to produce apparent randomness, then it cannot be trusted for its "intended purposes". You get both properties or neither.
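
A minimal sketch of that point (hypothetical, using Python's hashlib): hashing a secret seed together with a counter yields a byte stream that is fully deterministic given the seed, yet looks random to anyone who cannot predict or invert the hash.

```python
import hashlib

def hash_stream(seed: bytes, n_blocks: int) -> bytes:
    """Derive a pseudo-random byte stream by hashing seed || counter.

    Deterministic for a fixed seed, but the output "looks random" to
    anyone who does not know the seed -- the property at issue here.
    """
    out = b""
    for counter in range(n_blocks):
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    return out

print(hash_stream(b"secret seed", 2).hex())
```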

  • This is broken logic.

    There is an untested assumption that hashes achieve randomness because they appear to be a random collection of characters. Hash sequences are completely reproducible given the same input, and that is by definition not random.

    I think you are confusing loss of predictability with randomness. Never in mathematics or logic is that line of thinking correct. This can be described as equivocation, a fallacy of composition, an inductive fallacy, and more.

    • I think you are mixing up the function itself and its output. If the function's output is indistinguishable from uniformly random, then it is a way to derive randomness. The fact that the function itself is deterministic tells you nothing about the distribution of its output.

      1 reply →

    • > There is an untested assumption that hashes achieve randomness because they appear to be a random collection of characters.

      lol, no. Cryptographic hash functions are specifically designed to achieve this property.

      > Never in mathematics or logic

      Let's not get ahead of ourselves. Start with English - what does "pseudo" mean?

      > This can be described by equivocation, fallacy of composition, inductive fallacy, and more.

      For example, what is a pseudo-intellectual?

      2 replies →

You may want to stay away from all modern CSPRNGs then. E.g. Yarrow and Fortuna rely on sources of random input data being mixed into an entropy pool, but use a strong hash function (nowadays SHA-256) to produce output at arbitrarily fast rates without consuming entropy.
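
A toy sketch of that extract-then-expand idea, assuming SHA-256 via Python's hashlib (this is not the real Fortuna generator, which keys a block cipher in counter mode; it only illustrates hashing pooled input and stretching it):

```python
import hashlib

class ToyHashGenerator:
    """Illustration only: compress an entropy pool with a hash, then
    stretch the result into arbitrarily many output blocks."""

    def __init__(self) -> None:
        self.pool = hashlib.sha256()
        self.counter = 0

    def add_entropy(self, data: bytes) -> None:
        # Mix unpredictable input events into the pool.
        self.pool.update(data)

    def random_bytes(self, n: int) -> bytes:
        key = self.pool.digest()      # extract: hash of everything mixed in
        out = b""
        while len(out) < n:           # expand: hash key || counter repeatedly
            out += hashlib.sha256(key + self.counter.to_bytes(16, "big")).digest()
            self.counter += 1
        return out[:n]

gen = ToyHashGenerator()
gen.add_entropy(b"interrupt timings, disk seek jitter, ...")
print(gen.random_bytes(48).hex())
```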

And to your criticism that this is just programmers who don’t know what they’re doing: these algorithms were developed by Bruce Schneier, Niels Ferguson, and John Kelsey.

I was so frustrated as a noob trying to understand why people were using hashes this way. Even a professional told me "yeah but a collision is really unlikely," and compared it to neutrino interference. How is that supposed to be good enough?

  • Hash functions are used to instantiate a random oracle (a theoretical object that can't actually be instantiated, because it would be of infinite size, but that makes protocols easy to reason about), because it doesn't seem crazy to assume that if finding a collision between two hashes is hard, it should also be hard to predict the output of the hash function. However, it is well known that there are contrived counterexamples: protocols that are secure in the random oracle model but insecure when instantiated with any concrete hash function. The problem with this paper is that the protocol it describes isn't so contrived anymore. Cryptography is a matter of assumptions and what you believe in or not. You might want to avoid random oracles, but you will then have to restrict yourself in what you can concretely build.

    And the reason behind the problem outlined in the paper isn't biased randomness, but the fact that the hash function has a concrete representation that can be exploited, whereas a random oracle does not.

  • Every hash function will have collisions as long as the set of possible inputs is larger than the set of possible hash values.

    Some are designed so that changing a single input bit has a massive influence on the resulting hash; others are designed to do the opposite.

  • Whether hashes are appropriate depends on the details of the use case.

    However, if the negligible chance of a collision is what you are worried about, collisions also happen with random numbers.

I'm sorry, but this comment is very vague and unclear.

Cryptographers know that hashes (even cryptographically strong ones!) are deterministic. Yet, it is possible that in going from an interactive proof to a non-interactive one, one does not actually need randomness. Indeed, for some class of protocols, we know how to design hash functions satisfying a particular property (correlation intractability) so that the resulting non-interactive proof is sound. It's just that (a) these hashes are inefficient, and (b) until now no one had found a non-contrived protocol where using standard hashes leads to an attack.

You should look into the HyperLogLog algorithm, where fair hash "randomness" is required for the algorithm to work. There are use cases where the pseudo-randomness of hashes is useful, is what I'm trying to say.
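
For instance, here is a Flajolet–Martin-style sketch of the idea behind HyperLogLog (simplified to a single register; real HLL averages many), which only works if the hash bits behave like fair coin flips:

```python
import hashlib

def rho(x: bytes) -> int:
    """Position of the first 1-bit in sha256(x), counted from the left.
    Meaningful only if hash bits behave like independent fair coin flips."""
    h = int.from_bytes(hashlib.sha256(x).digest(), "big")
    return 256 - h.bit_length() + 1

# A run of k leading zero bits appears with probability 2**-k under a
# uniform hash, so max(rho) over n distinct items is roughly log2(n).
items = [f"user-{i}".encode() for i in range(100_000)]
max_rho = max(rho(item) for item in items)
print("crude cardinality estimate:", 2 ** max_rho)
```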

  • This is why you should NEVER trust software developers to make security decisions unless they are certified to do so. True randomness is challenging to achieve, because computers are inherently predictable. Pseudo-randomness is a deliberate process for achieving randomness in spite of this high predictability, often through the use of physical or electromagnetic sources outside the computing machine.

    Hash algorithms are none of that. They are not pseudo-random merely because a software developer wishes them to be so. Hash algorithms are intentionally designed for reproducibility, in that a given input should always result in the same hash sequence as output. That intended reproducibility is by definition not random.

    • You don't understand what pseudo-randomness means. Virtually all PRNGs, even many CSPRNGs, have a way to initialize the algorithm with a seed, and its outputs are fully predictable based on that seed. Choosing a truly random seed, such as one produced by RNG hardware, will lead to a usefully random sequence - but the algorithm is still fully deterministic based on that seed.
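
      For example (deliberately non-cryptographic, just Python's random module): the generator is completely deterministic given its seed, yet the sequence is useful as randomness; the unpredictability comes from how the seed is chosen.

      ```python
      import random

      a = random.Random(42)            # same seed ...
      b = random.Random(42)
      assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]   # ... identical "random" output

      c = random.Random()              # seeded from OS entropy instead
      print(c.random())                # unpredictable without knowing the seed
      ```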

    • > True randomness is challenging to achieve, because computers are inherently predictable

      Most modern CPUs now contain a true RNG. They usually use a metastable latch, thermal noise run through some kind of analog amplification, or a combination of the two. Bit strings from this source are passed into a pseudorandom number generator to amplify the amount of random data generated.

      There are probably attacks on this too, but they're much harder.

      2 replies →

You realize that basically all signatures in use today use hash functions as a source of randomness.

  • What? You’ve managed to mangle so many terms in so few words… Signatures can refer to two things: integrity checks on a file or authentication checks for a received file. In the integrity-check situation a hash function (e.g., SHA) is often used. In the authentication-check situation, we usually use a public/private keypair for asymmetric encryption; the hash function is only part of the process. The key material used to make this keypair (should) come from some random number generator…

    The ‘hash’ function is a deterministic transform, not a source of randomness.

    • He is technically not wrong: most signatures can be seen as a public-coin interactive proof system in which you prove knowledge of a private key. They are then compiled into a non-interactive proof system via the Fiat-Shamir transform, which uses a random oracle concretely instantiated with a hash function (easy to see in Schnorr signatures). So in the end you are using a hash function to generate your random coins.
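
      A toy Schnorr-style sketch of that compilation, with deliberately tiny, hypothetical parameters (nowhere near real-world choices): the interactive verifier's random challenge is replaced by a hash of the commitment and the message.

      ```python
      import hashlib, secrets

      # Toy Schnorr-style scheme over the multiplicative group mod a prime.
      # Parameters are illustrative only -- far too small/simple for real use.
      p = 2**127 - 1
      g = 3

      def H(R: int, msg: bytes) -> int:
          # Fiat-Shamir: derive the "verifier's" challenge from a hash.
          data = R.to_bytes(16, "big") + msg
          return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)

      sk = secrets.randbelow(p - 1)          # private key
      pk = pow(g, sk, p)                     # public key

      def sign(msg: bytes):
          k = secrets.randbelow(p - 1)       # commitment nonce
          R = pow(g, k, p)
          e = H(R, msg)                      # challenge = hash, not a live verifier
          s = (k + e * sk) % (p - 1)
          return R, s

      def verify(msg: bytes, R: int, s: int) -> bool:
          e = H(R, msg)
          return pow(g, s, p) == (R * pow(pk, e, p)) % p

      R, s = sign(b"hello")
      assert verify(b"hello", R, s)
      ```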

  • That is wrong. Most digital signatures in use today use certificates trusted through a certificate trust chain. The algorithms are different.

    • Internally, most signature algorithms use hash functions. RSA-PSS, EdDSA and ML-DSA use them to provide something like randomness, and the security analysis of those signature schemes includes arguments assuming (in some very particular, technical ways) that the hash function outputs "look random".

      Classical DSA and ECDSA do not use hash functions this way, but in my opinion they aren't stronger for it: they're basically assuming instead that some other mathematical function "looks random", which seems riskier than assuming that about a hash function. I've heard that the reason for this is to get around Schnorr's patent on doing it with hash functions, which has since expired.

      The SHA3 and SHAKE hash functions (underlying e.g. ML-DSA) are explicitly designed to "look random" as well.

      There are some signature schemes that try not to make such strong assumptions: in particular SLH-DSA targets properties more like first- and second-preimage resistance, target-collision-resistance, and so on.
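
      As a concrete illustration of the "something like randomness" part, Ed25519 derives its per-signature nonce by hashing rather than by sampling; roughly (schematic only, omitting the curve arithmetic and key clamping):

      ```python
      import hashlib

      # Ed25519 group order (a standard constant).
      L = 2**252 + 27742317777372353535851937790883648493

      def eddsa_style_nonce(secret_prefix: bytes, message: bytes) -> int:
          # The "random" nonce r is just SHA-512 of a secret prefix and the
          # message, reduced mod the group order -- no RNG at signing time.
          h = hashlib.sha512(secret_prefix + message).digest()
          return int.from_bytes(h, "little") % L
      ```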

      2 replies →

    • It's not wrong. The only thing preventing me from forging your certificate is my inability to generate a new cert which hashes to the same digest as what is in your cert's signature. I don't actually need the keys if I break the hash.

      EDIT2: I'm doing a bad job of explaining this... you obviously need the keypair associated with the cert to initiate connections with it and not trigger MITM alerts. But if you break the hash function, you don't need the private key from the root cert: the verbatim signature from the original cert will appear valid when spliced into your forged cert, as long as the hash digest computed over the forged cert is the same.
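
      A self-contained sketch of that verification logic (with HMAC standing in for the CA's actual RSA/ECDSA operation, purely for illustration): the signature covers only the digest, so any certificate body with the same digest verifies.

      ```python
      import hashlib, hmac

      CA_KEY = b"stand-in for the CA private key (illustration only)"

      def sign_cert(cert_body: bytes) -> bytes:
          # The CA signs only the digest of the certificate body.
          digest = hashlib.sha256(cert_body).digest()
          return hmac.new(CA_KEY, digest, hashlib.sha256).digest()   # stand-in for RSA/ECDSA

      def verify_cert(cert_body: bytes, signature: bytes) -> bool:
          digest = hashlib.sha256(cert_body).digest()
          expected = hmac.new(CA_KEY, digest, hashlib.sha256).digest()
          return hmac.compare_digest(expected, signature)

      real_body = b"CN=example.com, pubkey=..."
      sig = sign_cert(real_body)
      assert verify_cert(real_body, sig)
      # If an attacker finds forged_body with sha256(forged_body) == sha256(real_body),
      # then verify_cert(forged_body, sig) succeeds with the original signature.
      ```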

    • And checking the validity of a certificate involves checking a signature of... the certificate's hash. If you can break the underlying hash function, then the trust chain is broken.