Comment by spindump8930

1 day ago

The many sources of stochastic/non-deterministic behavior have been mentioned in other replies, but I wanted to point out this paper: https://arxiv.org/abs/2506.09501. It analyzes GPU non-determinism that remains once sampling- and batching-related effects are removed.
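For anyone wondering how results can differ at all without sampling: one commonly cited cause is that floating-point addition is not associative, so a reduction that runs in a different order can round differently. Here is a minimal sketch in plain Python (float64 on the CPU, so not the paper's setup; the helper names `left_to_right` and `right_to_left` are made up for illustration):

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order can round to a different result.
values = [0.1, 0.2, 0.3]

def left_to_right(xs):
    """Accumulate in list order: (0.1 + 0.2) + 0.3."""
    total = 0.0
    for x in xs:
        total += x
    return total

def right_to_left(xs):
    """Accumulate in reverse order: (0.3 + 0.2) + 0.1."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total

a = left_to_right(values)
b = right_to_left(values)
print(a == b, a, b)  # the two sums are not bit-identical
```

On a GPU the order of a parallel reduction may vary between runs or kernels, which is one way the same inputs can produce slightly different logits.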

One important takeaway is that these issues are more likely to show up in longer generations, so reasoning models can suffer more.
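A back-of-envelope way to see why length matters, under a strong simplifying assumption that is mine and not the paper's: suppose each generated token independently has a small probability `p_flip` of coming out differently because of numerical noise. Then the chance of at least one flip in `n_tokens` tokens is 1 - (1 - p_flip)^n_tokens. The value of `p_flip` below is invented purely for illustration.

```python
def divergence_probability(p_flip: float, n_tokens: int) -> float:
    """Probability of at least one token flip in n_tokens tokens,
    assuming (hypothetically) independent flips with probability p_flip."""
    return 1.0 - (1.0 - p_flip) ** n_tokens

# Compare a short answer with a long reasoning trace (made-up p_flip).
for n in (100, 1_000, 10_000):
    print(n, round(divergence_probability(1e-4, n), 3))
```

In this toy model the probability only grows with length, and in practice a single flipped token changes the context for everything generated after it.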