Comment by sgeisenh
5 months ago
This is an oversimplification. When distributed, the nondeterministic order of additions during reductions can produce nondeterministic results due to floating point error.
It’s nitpicking for sure, but it causes real challenges for reproducibility, especially during model training.
No comments yet
Contribute on Hacker News ↗