Comment by minimaltom
1 day ago
It's not deterministic. Any individual floating-point multiply/add is deterministic, but on a GPU these all happen in parallel, and the accumulation occurs in whatever order the operations happen to complete.
When you add A then B then C, you can get a different answer than C then A then B, because floating-point addition is not associative: rounding error, subnormals, etc.
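A minimal illustration of the non-associativity, runnable on any CPU (no GPU needed, since it's a property of IEEE 754 doubles, not of the hardware):

```python
# Floating-point addition is not associative: summing the same three
# values in a different order can give a different result, because
# each intermediate sum is rounded to the nearest representable double.
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c   # 0.0 + 1.0
regrouped = a + (b + c)       # the 1.0 is absorbed: -1e16 + 1.0 rounds to -1e16

print(left_to_right)  # 1.0
print(regrouped)      # 0.0
```

On a GPU, a parallel reduction effectively picks one such grouping at random each run, which is where the run-to-run variation comes from.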
It can be made deterministic. It's not trivial and can slow things down a bit (not much), but there are environment variables and framework settings that make your GPU computations bitwise reproducible. I have done this when training models with PyTorch.
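For reference, a sketch of the PyTorch knobs involved (a config fragment, not a complete training script; the exact set needed depends on your CUDA version and the ops you use):

```python
import os

# cuBLAS requires this env var to be set before the CUDA context is
# created in order to use deterministic workspace algorithms.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Ask PyTorch to use deterministic kernel implementations everywhere;
# raises an error if an op has no deterministic variant.
torch.use_deterministic_algorithms(True)

# Pin cuDNN to deterministic convolution algorithms and disable
# autotuning (which can pick different kernels between runs).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Seed the RNGs so every run starts from identical state.
torch.manual_seed(0)
```

Note that determinism here is per machine and library version: the same seed on different GPU models or CUDA versions can still produce different bits.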
There are settings to make it reproducible but they incur a non-negligible drop in performance.
Unsurprising, given that they amount to explicit synchronization to fix the order of operations.