Comment by csomar
5 days ago
That's still an inference time issue. If you have perfect inference with a zero temperature, the models are deterministic. There is no intrinsic randomness in software-only computing.
Floating-point associativity differences can lead to non-determinism even at zero temperature if the order of operations is non-deterministic.
Anyone with reasonable experience in GPU computation who pays attention knows that even randomness in warp completion times can easily lead to non-determinism due to associativity differences.
For instance: https://www.twosigma.com/articles/a-workaround-for-non-deter...
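The underlying issue is easy to reproduce on a plain CPU, no GPU required: IEEE-754 floating-point addition is not associative, so any change in evaluation order can change the rounded result. A minimal Python illustration:

```python
# IEEE-754 doubles are not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds to 0.30000000000000004
right = a + (b + c)  # 0.2 + 0.3 rounds to exactly 0.5

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The same inputs, grouped differently, disagree in the last bit, which is exactly what a reduction with a non-fixed order produces.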
It is well known among practitioners that CUDA isn't strongly deterministic because of these factors.
Differences in inference batch sizes compound these issues.
Edit: to be more specific, the non-determinism mostly comes from map-reduce style operations, where the map is deterministic, but the order that items are sent to the reduce steps (or how elements are arranged in the tree for a tree reduce) can be non-deterministic.
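That reduce-order sensitivity can be simulated without a GPU. A toy Python sketch (the magnitudes are deliberately contrived so the effect is visible in a single addition; real kernels show much smaller, but still nonzero, drift):

```python
# Simulate a reduction whose element order is not fixed, as in a GPU
# tree-reduce where warp completion order varies between runs.
vals = [1e16, 1.0, -1e16]  # contrived magnitudes to make absorption obvious

# Order 1: the 1.0 is absorbed into 1e16 (it is below one ulp at that scale),
# then the large terms cancel.
order1 = (vals[0] + vals[1]) + vals[2]   # -> 0.0

# Order 2: the large terms cancel first, so the 1.0 survives.
order2 = (vals[0] + vals[2]) + vals[1]   # -> 1.0

print(order1, order2)  # same inputs, different sums
```

Each map step here is perfectly deterministic; only the order items reach the reduce changes, and that alone changes the answer.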
My point is, your inference process is the non-deterministic part, not the model itself.
Eh, if you have a PyTorch model that uses non-deterministic tensor operations such as matrix multiplications, I think it is fair to call the model non-deterministic, since the matmul is not guaranteed to be deterministic. The non-determinism of a matmul isn't a bug but a feature.
See e.g. https://discuss.pytorch.org/t/why-is-torch-mm-non-determinis...