Comment by vessenes
7 hours ago
To be clear, temperature 0 is deterministic and will produce the same output for exact duplicate inputs, across all seed choices.
Provided:
* If we're talking about an MoE model, the whole batch is duplicated, not just your input (yes, your batch neighbours can affect which experts you get routed to. Blergh.)
* Your kernels are deterministic
* There’s no system-wide effort switch that responds to, e.g., workload across the cluster (for a thinking model)
Upshot:
Temperature 0 is not deterministic in probably any existing cloud infra, but it could be for edge inference pretty reliably.
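On the kernel point, here is a minimal sketch of why reduction order matters. It assumes nothing about any real inference stack. Floating-point addition is not associative, so a kernel that sums the same numbers in a different order (say, because the batch shape changed) can produce a slightly different logit, and at temperature 0 that can be enough to flip the argmax. The values and the `rival_logit` threshold are made up for illustration.

```python
def sum_in_order(values, order):
    """Accumulate values left to right in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Hypothetical partial products that get reduced into a single logit.
partials = [1e16, 1.0, -1e16, 1.0]

# Same numbers, two reduction orders.
logit_a = sum_in_order(partials, [0, 1, 2, 3])  # first 1.0 is lost when added to 1e16
logit_b = sum_in_order(partials, [0, 2, 1, 3])  # large terms cancel first, both 1.0s survive

# Hypothetical competing token whose logit sits between the two results.
rival_logit = 1.5

def greedy_choice(logit):
    """Temperature-0 pick between our token and the rival."""
    return "token" if logit > rival_logit else "rival"

print(logit_a, greedy_choice(logit_a))
print(logit_b, greedy_choice(logit_b))
```

Real logits differ by far less than this exaggerated case, but the mechanism is the same: a near-tie plus a reordered sum.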
To your quibble on 0.1 being more deterministic - I think it’s a pretty fair summary - we’re going to sample much more from the ‘temp 0’ answer at 0.1 than we would at temp 0.9, no?
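A rough sketch of the 0.1 vs 0.9 point, using plain softmax with temperature over made-up logits (the three values are an assumption, not taken from any model). It shows how much probability mass lands on the token that temp 0 would pick.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits; index 0 is the token greedy (temp 0) decoding would pick.
logits = [2.0, 1.5, 0.5]

p_low = softmax_with_temperature(logits, 0.1)
p_high = softmax_with_temperature(logits, 0.9)

print("temp 0.1, mass on greedy token:", round(p_low[0], 3))
print("temp 0.9, mass on greedy token:", round(p_high[0], 3))
```

With these numbers the greedy token gets about 0.99 of the mass at 0.1 and about 0.57 at 0.9, so "more deterministic" is a fair shorthand even though neither setting is actually deterministic.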
Even then it's deterministic in the way a hash function is deterministic. Change one letter and you can get a completely different output. What people actually want is something continuous.
Agreed on the desire for continuous behavior. That said, in a modern LLM, is this hash analogy accurate? I would be surprised if changing a single letter changed the top-ranked output at temperature 0 in most cases.
E.g:
“Where is the Eiffel Tower Located? One word only.”
“Where is the Effel Tower located? One word only.”
“Where is the Eiffel Tower located? One wor only.”
I’d be very surprised if those got different answers from even a small local model at temp 0.
For a single word response, perhaps.
But for anything else I wouldn't be surprised.
The entire chain will be affected from the different tokenization on down. Even if it lands in roughly the same semantic area, that doesn't mean it will get there with anything like the same syntactic selections. Anywhere there were multiple near-tokens could easily select a different route based on even minor fluctuations in the starting conditions. It's chaotic.
"You are a helpful/less assistant"
Give it a try. 4-letter difference. Add a few hundred tokens describing the task, such that the change becomes a tiny fraction of the input.
Discontinuities everywhere.
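A toy illustration of the near-token point, not a claim about any real model: greedy decoding over a hand-written bigram logit table (every token and number here is hypothetical). Nudging one logit by 0.02 flips a near-tie at the first step, and the rest of the output follows a different route even though it ends up in the same semantic area.

```python
import copy

# Hypothetical next-token logits keyed by the previous token.
BIGRAM_LOGITS = {
    "<s>":   {"It": 2.00, "The": 1.99},   # near-tie at the first step
    "It":    {"is": 3.0, "</s>": 0.0},
    "The":   {"tower": 3.0, "</s>": 0.0},
    "tower": {"is": 3.0, "</s>": 0.0},
    "is":    {"in": 3.0, "</s>": 0.0},
    "in":    {"Paris": 3.0, "</s>": 0.0},
    "Paris": {"</s>": 1.0},
}

def greedy_decode(table, max_steps=10):
    """Always take the highest-logit next token (temperature 0)."""
    token, output = "<s>", []
    for _ in range(max_steps):
        options = table[token]
        token = max(options, key=options.get)
        if token == "</s>":
            break
        output.append(token)
    return output

def with_nudge(table, context, token, delta):
    """Return a copy of the table with one logit shifted by delta."""
    nudged = copy.deepcopy(table)
    nudged[context][token] += delta
    return nudged

baseline = greedy_decode(BIGRAM_LOGITS)
perturbed = greedy_decode(with_nudge(BIGRAM_LOGITS, "<s>", "The", 0.02))

print(" ".join(baseline))
print(" ".join(perturbed))
```

Both runs are fully deterministic; the output is just not a continuous function of the starting conditions.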
This is it. People mistake deterministic for precise/exact/correct. It doesn't mean any of those.