
Comment by loneboat

4 days ago

Yes. Look up LLM "temperature" - it's a sampling parameter that controls how deterministically the model behaves.

The models are deterministic, the inference is not.
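A minimal sketch of how temperature enters sampling (the function name and shape here are illustrative, not any particular library's API): logits are divided by the temperature before softmax, and temperature zero degenerates to argmax, which is fully deterministic.

```python
import math
import random

def sample(logits, temperature, rng=random):
    """Pick a token index from raw logits at the given temperature."""
    # temperature == 0 is conventionally treated as greedy argmax:
    # the same logits always yield the same token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale logits, apply a numerically stable softmax,
    # and draw from the resulting distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(logits) - 1

# At temperature 0 the largest logit always wins.
print(sample([2.0, 1.0, 0.5], 0.0))  # → 0, every run
```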

  • Which is a useless distinction. When we say models in this context we mean the whole LLM + infrastructure to serve it (including caches, etc).

  • What does that even mean?

    Even then, depending on the specific implementation, associativity of floating point could be an issue between batch sizes, between exactly how KV cache is implemented, etc.

    • That's still an inference-time issue. If inference were perfectly reproducible and temperature were zero, the models would be deterministic. There is no intrinsic randomness in software-only computing.

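The floating-point point above is easy to demonstrate: IEEE 754 addition is not associative, so summing the same values in a different order (as happens when batch size or reduction kernel changes) can give bitwise-different results.

```python
# Same three values, two grouping orders: the results differ in
# the last bit, because float addition rounds at each step.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # → False
print(left, right)     # 0.6000000000000001 vs 0.6
```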