Comment by cubefox

4 hours ago

Yep. Zero temperature is neither necessary nor sufficient for deterministic inference.

Why?

  • You can seed the randomness while still having nonzero temperature.

  • Numerical instability can introduce randomness, especially on GPU-like hardware, unless you’re very careful about how you write your algorithms.

  • In any batched inference environment that uses a mixture-of-experts model, expert routing may vary depending on what else is in the batch, for one thing.
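
To make the first two points concrete, here is a toy sketch in plain Python. Nothing in it comes from a real inference stack: the logits, the `sample_token` and `generate` helpers, and the "rival" logit are all made up for illustration. It only shows that a seeded sampler repeats itself at nonzero temperature, and that changing the order of a floating-point sum can flip a greedy (temperature zero) pick.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature) with the given RNG."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(seed, steps=20, temperature=0.8):
    """Draw `steps` samples from fixed, made-up logits using a seeded RNG."""
    rng = random.Random(seed)
    logits = [2.0, 1.5, 0.3, -1.0]  # hypothetical values
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

# Point 1: same seed, nonzero temperature, identical output.
same_with_seed = generate(seed=0) == generate(seed=0)

# Point 2: floating-point addition is not associative, so the order in
# which a reduction is carried out can change the result slightly.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
order_matters = left != right

def argmax(xs):
    """Greedy pick; ties go to the lowest index."""
    return max(range(len(xs)), key=lambda i: xs[i])

# A rival logit that happens to sit exactly at one of the two results.
# Which token wins now depends only on the summation order.
rival = right
greedy_a = argmax([rival, left])
greedy_b = argmax([rival, right])

print(same_with_seed, order_matters, greedy_a, greedy_b)
```

Real kernels hit the second effect when parallel reductions add partial sums in an order that depends on scheduling or batch shape, which is a much messier version of the same thing.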