Comment by llm_nerd
9 hours ago
There are a huge number of reasons for non-determinism in large-scale systems. Batch sizes affect routing in MoE systems (which are basically all LLMs now), so the same prompt can take a different path depending on what else happens to be in the batch. Consecutive submissions could be routed to entirely different hardware, software, and even quantization levels! Resubmissions could even hit different variations of a model.
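To make the batching point concrete, here is a toy sketch of capacity-limited top-1 expert routing. Everything in it is made up for illustration (the `route` function, the scores, the capacity of 1), and it is not how any particular model or serving stack routes tokens. It only shows how the same token can land on a different expert depending on what shares its batch.

```python
from typing import List


def route(batch_scores: List[List[float]], capacity: int) -> List[int]:
    """Greedy top-1 routing with a per-expert capacity limit (toy example).

    batch_scores[i][e] is token i's hypothetical router score for expert e.
    Tokens are assigned in batch order; a token whose preferred expert is
    full falls back to its next-best expert that still has room.
    """
    num_experts = len(batch_scores[0])
    load = [0] * num_experts
    assignment = []
    for scores in batch_scores:
        # Experts ordered from most to least preferred for this token.
        ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
        # -1 means every expert was full and the token was dropped.
        chosen = next((e for e in ranked if load[e] < capacity), -1)
        if chosen >= 0:
            load[chosen] += 1
        assignment.append(chosen)
    return assignment


my_token = [0.9, 0.1]  # strongly prefers expert 0

# Submitted alone, the token gets its preferred expert.
alone = route([my_token], capacity=1)

# Batched behind another request that also wants expert 0, it overflows.
crowded = route([[0.8, 0.2], my_token], capacity=1)

print(alone[-1], crowded[-1])
```

The token's own scores are identical in both calls; only its neighbours in the batch changed, and a different expert means different weights and a numerically different output.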
No one targets determinism because randomness/"creativity" in LLMs is considered a prime feature, so there is zero reason to avoid variation, but that non-determinism isn't some core function of LLMs.