Comment by zahlman
1 day ago
You may be interested in https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm... .
> The non-determinism at temperature zero, we guess, is caused by floating point errors during forward propagation. Possibly the “not knowing what to do” leads to maximum uncertainty, so that logits for multiple completions are maximally close and hence these errors (which, despite a lack of documentation, GPT insiders inform us are a known, but rare, phenomenon) are more reliably produced.
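To make the quoted mechanism concrete: floating-point addition is not associative, so the same logit computed with a different accumulation order (as can happen across GPU kernels or batch sizes) can come out slightly different, and when two candidate logits are nearly tied, that tiny difference is enough to flip the greedy argmax. The sketch below is just an illustration in NumPy, not code from the post; the specific values (`1e8`, `0.25`, the rival logit `0.1`) are contrived to force the effect in float32.

```python
import numpy as np

def seq_sum(xs):
    """Sum in the given order, keeping float32 precision throughout."""
    total = np.float32(0.0)
    for x in xs:
        total = total + x  # float32 + float32 stays float32
    return total

# Four contributions to a hypothetical logit. Mathematically they sum
# to 1.25 either way; in float32 the order changes the result because
# the small terms are absorbed by the large ones.
parts = np.float32([1e8, 1.0, -1e8, 0.25])

logit_fwd = seq_sum(parts)        # ((1e8 + 1) - 1e8) + 0.25 -> 0.25
logit_rev = seq_sum(parts[::-1])  # ((0.25 - 1e8) + 1) + 1e8 -> 0.0

# With a rival logit sitting between the two results, the greedy
# (temperature-zero) pick flips depending on the accumulation order.
rival = np.float32(0.1)
print(np.argmax([logit_fwd, rival]))  # 0
print(np.argmax([logit_rev, rival]))  # 1
```

When the model is highly uncertain, as the quote suggests, many logits sit within a few ulps of each other, so this kind of order-dependent rounding flips the top token far more often than it would for a confident prediction.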