Comment by jstanley
12 hours ago
> "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs.
But this isn't a fundamental property of LLMs, it's just an implementation detail. It's pretty obvious that if you evaluate the matrix multiplications correctly and deterministically sample from the highest-probability outputs, you will have a deterministic LLM.
It may be an implementation detail, but in practice, if the only way to get a deterministic output is to run on the CPU, then it's not going to be usable.
Actually, Google's TPUs are also deterministic!
You can tell GPUs what order to do math instructions in.