Comment by lgessler
1 day ago
In my (poor) understanding, this can depend on hardware details. What are you running your models on? I haven't paid close attention to this with LLMs, but I've tried very hard to eliminate non-deterministic behavior from my training runs for other kinds of transformer models and was never able to on my 2080, 4090, or an A100. The PyTorch docs have a note saying that fully reproducible results can't be guaranteed in general: https://docs.pytorch.org/docs/stable/notes/randomness.html
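For anyone curious, that note mostly boils down to flipping a handful of switches. A rough sketch of what that looks like (settings taken from that page; the seed and the workspace string are just illustrative, and even with all of this some ops simply have no deterministic implementation):

```python
import os
import random

import numpy as np
import torch

# cuBLAS needs a fixed workspace config for deterministic matmuls on newer CUDA;
# it has to be set before the first CUDA call. The value here is the documented example.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def seed_everything(seed: int = 0) -> None:
    """Seed the RNGs a typical PyTorch training run touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices

# Error out (or at least complain) when an op only has a non-deterministic kernel.
torch.use_deterministic_algorithms(True)

# cuDNN: force deterministic kernels and disable the autotuner, which can
# otherwise pick different algorithms from run to run.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

seed_everything(0)
```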
Inference on a generic LLM may not be subject to these sources of non-determinism even on a GPU though, idk.
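If you want to sanity-check that on your own hardware, a quick sketch (Hugging Face transformers, with a placeholder model name) is to greedy-decode the same prompt twice and diff the token ids:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in for whatever model you actually run
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
if torch.cuda.is_available():
    model = model.cuda()

inputs = tok("The quick brown fox", return_tensors="pt").to(model.device)
with torch.no_grad():
    a = model.generate(**inputs, do_sample=False, max_new_tokens=50)
    b = model.generate(**inputs, do_sample=False, max_new_tokens=50)

print("identical:", torch.equal(a, b))
```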
Ah. I've typically avoided CUDA except for a couple of really big jobs, so I haven't noticed this.