Comment by llm_nerd
1 day ago
>That's why vendors don't offer seed parameters accompanied by a promise that it will result in deterministic results - because that's a promise they cannot keep.
They absolutely can keep such a promise, as anyone who has worked with LLMs could confirm. I can run a sequence of tokens through a large LLM thousands of times and get identical results every time (and have done precisely this; in one case it was a QA test I built). I could run it millions of times and get exactly the same final-layer output every single time.
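Roughly the shape of that kind of check, sketched with PyTorch and Hugging Face transformers ("gpt2" here is just a small stand-in model, and this assumes one fixed device and software stack, not the exact test I built):

    # Rough sketch of a determinism check (assumes PyTorch + Hugging Face
    # transformers; "gpt2" stands in for any causal LM).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    inputs = tok("The same prompt, every time.", return_tensors="pt")

    with torch.no_grad():
        reference = model(**inputs).logits        # final-layer logits

    for _ in range(1000):
        with torch.no_grad():
            logits = model(**inputs).logits
        # Same weights, same input, same device, same kernels:
        # the output comes back bit-identical on every pass.
        assert torch.equal(logits, reference)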
They don't want to keep such a promise because it limits the flexibility and optimizations available when operating at very large scale. This is not an LLM thing, and saying "LLMs are non-deterministic" is simply wrong, even if you can find an LLM purveyor that has chosen to give up determinism because it no longer serves their interests. And FWIW, non-associative floating-point arithmetic is usually not the reason.
It's like claiming a chef cannot do something just because McDonald's and Burger King don't do it, holding those chains up as the measure of what is possible in cooking. Cooking doesn't work like that.
>If not non-associative floating point, what's the reason?
There are a huge number of reasons in large-scale systems. Batch composition when hitting MoE systems (which basically all big LLMs are now) leads to routing variations. Consecutive submissions can be routed to entirely different hardware, software, and even quantization levels! Repeat submissions could even hit different variants of a model.
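To make the batching point concrete, here's a toy capacity-limited top-1 router in numpy (a made-up illustration, not any real model's code): the same token lands on a different expert depending only on which other tokens happen to share its batch.

    # Toy capacity-limited top-1 MoE router (illustration only). Each expert
    # accepts at most `capacity` tokens per batch; a token whose preferred
    # expert is full falls back to its next choice.
    import numpy as np

    W_router = np.array([[1.0, 0.0],     # 2-d tokens, 2 experts; columns are
                         [0.0, 1.0]])    # the experts' scoring directions

    def route(batch, capacity=2):
        scores = batch @ W_router                   # (tokens, experts) logits
        load = np.zeros(W_router.shape[1], dtype=int)
        assignment = []
        for ranked in np.argsort(-scores, axis=1):  # greedy, in batch order
            expert = next(e for e in ranked if load[e] < capacity)
            load[expert] += 1
            assignment.append(expert)
        return assignment

    token = np.array([1.0, 0.5])                          # prefers expert 0
    batch_a = np.array([[0.0, 1.0], [0.0, 1.0], token])   # batchmates want expert 1
    batch_b = np.array([[2.0, 0.0], [2.0, 0.0], token])   # batchmates fill expert 0

    print(route(batch_a)[-1])   # -> 0: the token gets its preferred expert
    print(route(batch_b)[-1])   # -> 1: expert 0 is full, the token is rerouted

Real routers are fancier (top-k, load-balancing, token dropping), but the dependence on batch composition is the same idea.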
No one targets determinism because randomness/"creativity" in LLMs is considered a prime feature, so there is zero incentive to avoid variation, but that variation isn't some intrinsic property of LLMs.