
Comment by loneboat

4 days ago

Yes. Look up LLM "temperature" - it's a sampling parameter that controls how deterministically the model behaves.

The models are deterministic, the inference is not.
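A minimal sketch of how temperature enters sampling (the function name and shape here are illustrative, not any particular library's API): logits are divided by the temperature before softmax, and temperature zero degenerates to argmax, which is fully deterministic.

```python
import math
import random

def sample(logits, temperature, rng=random):
    """Pick a token index from raw logits at the given temperature."""
    # temperature == 0 is conventionally treated as greedy argmax:
    # the same logits always yield the same token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale logits, apply a numerically stable softmax,
    # and draw from the resulting distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(logits) - 1

# At temperature 0 the largest logit always wins.
print(sample([2.0, 1.0, 0.5], 0.0))  # → 0, every run
```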

  • Which is a useless distinction. When we say models in this context we mean the whole LLM + infrastructure to serve it (including caches, etc).

  • What does that even mean?

    Even then, depending on the specific implementation, associativity of floating point could be an issue between batch sizes, between exactly how KV cache is implemented, etc.

    • That's still an inference-time issue. If inference were perfectly reproducible and temperature were zero, the models would be deterministic. There is no intrinsic randomness in software-only computing.

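The floating-point point above is easy to demonstrate: IEEE 754 addition is not associative, so summing the same values in a different order (as happens when batch size or reduction kernel changes) can give bitwise-different results.

```python
# Same three values, two grouping orders: the results differ in
# the last bit, because float addition rounds at each step.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # → False
print(left, right)     # 0.6000000000000001 vs 0.6
```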