Comment by bonoboTP
13 hours ago
It can be made deterministic. It's not trivial and can slow it down a bit (not much) but there are environment variables you can set to make your GPU computations bitwise reproducible. I have done this in training models with Pytorch.
There are settings to make it reproducible but they incur a non-negligible drop in performance.
Unsurprising given they amount to explicit synchronization to make the order of operations deterministic.