Comment by 317070

14 hours ago

> so in principle, setting temperature to 0 _should_ result in deterministic outputs

It is a common misconception, but it is not true even in principle. If 2 or more of my logits are equal to the maximum, I will sample uniformly at random among them at any temperature, even zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because in the limit the distribution is uniform over the first and the last element.
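To make the limit concrete, here is a minimal sketch (plain Python, with a hypothetical helper name `softmax_with_temperature` that is not from any library) that shrinks the temperature towards zero on the tied logits from the example:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 0.0, 1.0]  # two logits tie for the maximum
for t in (1.0, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

As the temperature shrinks, the middle probability goes to zero while the two tied entries each stay at about one half, so there is still a coin flip left at the limit.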

Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. GPUs put the associativity of the sums in matrix multiplications in arbitrary order, and this has a huge impact on the logits coming out of the neural network.
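The underlying effect is that floating-point addition is not associative, so the order of a reduction changes the result. A small sketch of just that effect on the CPU (this is not GPU code; `sum_in_order` is a hypothetical helper, written as an explicit loop so that nothing reorders or compensates the additions):

```python
def sum_in_order(values):
    """Add the values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)         # (0.1 + 0.2) + 0.3
backward = sum_in_order(values[::-1])  # (0.3 + 0.2) + 0.1

print(forward, backward, forward == backward)
```

The two results differ only in the last bit here, but a matrix multiplication accumulates many such sums, and a tiny difference is enough to flip which of two nearly equal logits comes out largest.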

8 comments

jstanley 12 hours ago

> "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs.

But this isn't a fundamental property of LLMs, it's just an implementation detail. It's pretty obvious that if you evaluate the matrix multiplications correctly and deterministically sample from the highest-probability outputs, you will have a deterministic LLM.

vbarrielle 12 hours ago

It may be an implementation detail, but in practice, if the only way to get a deterministic output is to run on the CPU, then it's not going to be usable.
- 317070 11 hours ago
  
  Actually, Google's TPUs are also deterministic!
- Dylan16807 10 hours ago
  
  You can tell GPUs what order to do math instructions in.

EvgeniyZh 14 hours ago

You don't have to sample uniformly. You could take the lowest index of all maxima. But yeah, the main source of randomness is non-deterministic matmul, and temperature does nothing about it.
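A minimal sketch of the two tie-breaking rules side by side, in plain Python. The function names are hypothetical and only illustrate the idea; they are not how any particular inference stack does it:

```python
import random

def tied_argmax_indices(logits):
    """Indices of every logit that equals the maximum."""
    best = max(logits)
    return [i for i, x in enumerate(logits) if x == best]

def pick_lowest_index(logits):
    """Deterministic rule: among tied maxima, always take the lowest index."""
    return tied_argmax_indices(logits)[0]

def pick_uniform_among_ties(logits, rng):
    """Stochastic rule: choose uniformly at random among the tied maxima."""
    return rng.choice(tied_argmax_indices(logits))

logits = [1.0, 0.0, 1.0]
print(pick_lowest_index(logits))  # always index 0 for these logits

rng = random.Random(0)  # seeded only so the demo is repeatable
print({pick_uniform_among_ties(logits, rng) for _ in range(100)})
```

The first rule removes the randomness from sampling itself, but it only helps if the logits are bit-identical between runs, which is exactly what a non-deterministic matmul breaks.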

DougBTX 12 hours ago

> GPUs put the associativity of the sums in matrix multiplications in arbitrary order

That’s user-controlled too, not an inherent property of GPUs:

https://docs.pytorch.org/docs/2.12/generated/torch.use_deter...

vbarrielle 12 hours ago

The matrix multiplication is only deterministic for sparse-dense products under these settings:

> torch.bmm() when called on sparse-dense CUDA tensors

And it's not listed under the operations that raise an exception otherwise, so I'm not sure the docs promise that dense-dense matrix-matrix products are deterministic.
- DougBTX 9 hours ago
  
  Oh, thanks, that’s interesting, I thought it covered that too!