Comment by llm_nerd

21 hours ago

That non-determinism claim, along with the rather ludicrous claim that this is all just some accidental self-awareness of the model (rather than Elon clearly and obviously sticking his fat fingers into the machine), makes the linked piece technically dubious.

A baked LLM is 100% deterministic. It is a straightforward set of matrix algebra with a perfectly deterministic output at a base state. There is no magic quantum mystery machine happening in the model. We deliberately add randomization -- a seed and a temperature -- as a value-add, varying the outputs with the intention of producing creativity. So while it might be true that "in the customer-facing default state an LLM gives non-deterministic output", this is not some base truth about LLMs.
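
To make that concrete, here is a minimal sketch of the sampling step, with toy logits and names of my own invention -- not any vendor's actual code. At temperature zero it reduces to a deterministic argmax; above zero it samples, reproducibly only when the seed is fixed:

  import numpy as np

  rng = np.random.default_rng(seed=42)  # the seed: fixes the sampling stream

  def sample_next_token(logits, temperature):
      # temperature == 0 -> greedy decoding: a pure argmax, fully deterministic
      if temperature == 0:
          return int(np.argmax(logits))
      # temperature > 0 -> scale logits, softmax, then draw from the distribution
      scaled = logits / temperature
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()
      return int(rng.choice(len(probs), p=probs))

  logits = np.array([2.0, 1.5, 0.3, -1.0])  # toy logits over a 4-token vocabulary
  print(sample_next_token(logits, temperature=0))    # always 0
  print(sample_next_token(logits, temperature=0.8))  # varies unless the seed is fixed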

LLMs work using huge amounts of matrix multiplication.

Floating point multiplication is non-associative:

  >>> a, b, c = 0.1, 0.2, 0.3
  >>> a * (b * c)
  0.006
  >>> (a * b) * c
  0.006000000000000001

Almost all serious LLMs are deployed across multiple GPUs and have operations executed in batches for efficiency.

As such, the order in which those multiplications are run depends on all sorts of factors. There are no guarantees of operation order, which means non-associative floating point operations play a role in the final result.
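
Here's a toy illustration of that effect, with numpy standing in for the GPU: summing the same float32 values in three different orders -- as different batch sizes or device splits would -- typically gives three slightly different totals.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.standard_normal(1_000_000).astype(np.float32)

  total_a = x.sum()                                  # numpy's default reduction order
  total_b = x.reshape(1000, 1000).sum(axis=0).sum()  # chunked, as a parallel reduction might do it
  total_c = rng.permutation(x).sum()                 # same values, shuffled order

  print(total_a, total_b, total_c)  # typically three slightly different values
  print(total_a == total_b)         # typically False: same numbers, different order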

This means that, in practice, most deployed LLMs are non-deterministic even with a fixed seed.

That's why vendors don't offer seed parameters accompanied by a promise that it will result in deterministic results - because that's a promise they cannot keep.

Here's an example: https://cookbook.openai.com/examples/reproducible_outputs_wi...

> Developers can now specify seed parameter in the Chat Completion request to receive (mostly) consistent outputs. [...] There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of our models.
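
For reference, requesting that seed via the openai Python SDK looks roughly like this (the model name and prompt are illustrative):

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def ask(prompt):
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # illustrative model choice
          messages=[{"role": "user", "content": prompt}],
          seed=12345,      # request (mostly) consistent sampling
          temperature=0,
      )
      # system_fingerprint identifies the backend configuration; repeatability
      # is only expected when it matches across requests -- and even then it
      # is "mostly", not guaranteed
      return resp.choices[0].message.content, resp.system_fingerprint

  print(ask("Name one prime number.") == ask("Name one prime number."))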

  • >That's why vendors don't offer seed parameters accompanied by a promise that it will result in deterministic results - because that's a promise they cannot keep.

    They absolutely can keep such a promise, which anyone who has worked with LLMs could confirm. I can run a sequence of tokens through a large LLM thousands of times and get identical results every time (and have done precisely this! In fact, in one situation it was a QA test I built). I could run it millions of times and get exactly the same final layer every single time.
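
    A sketch of what such a QA test can look like, assuming a locally hosted model run through PyTorch and the transformers library (the model name, prompt, and run count are illustrative):

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      torch.use_deterministic_algorithms(True)  # fail loudly on non-deterministic kernels

      name = "gpt2"  # stand-in for whatever model is under test
      tok = AutoTokenizer.from_pretrained(name)
      model = AutoModelForCausalLM.from_pretrained(name).eval()

      inputs = tok("The quick brown fox", return_tensors="pt")
      with torch.no_grad():
          reference = model(**inputs).logits  # the final layer for this input

      # Same tokens, same weights, many runs: on a fixed single-device setup
      # the logits should come back bit-identical every single time.
      for _ in range(1000):
          with torch.no_grad():
              logits = model(**inputs).logits
          assert torch.equal(logits, reference), "non-deterministic output detected"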

    They don't want to keep such a promise because it limits the flexibility and optimizations available when doing things at a very large scale. This is not an LLM thing, and saying "LLMs are non-deterministic" is simply wrong, even if you can find an LLM purveyor whose deployment choices mean they no longer have any interest in such an outcome. And FWIW, non-associative floating point arithmetic is usually not the reason.

    It's like claiming that a chef cannot do something that McDonald's and Burger King don't do, using those purveyors as an example of what is possible when cooking. Nothing works like that.