Comment by veunes
5 hours ago
Even if you pin the seed and spin up your own local LLM, a change to continuous batching at the vLLM level, or simply a different CUDA driver version, will break bitwise float reproducibility. Reproducibility in ML generation is a total myth; in prod we only work with the final output anyway.
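A minimal sketch of the underlying issue, not specific to vLLM or CUDA: floating-point addition is not associative, so any change in reduction order (a different batch composition, a different kernel chosen by a new driver) can flip low-order bits even with identical seeds.

```python
# Floating-point addition is order-sensitive: regrouping the same
# three operands changes the low-order bits of the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one reduction order
right = a + (b + c)   # another reduction order

print(left == right)  # False: the two orders disagree in the last bit
print(left, right)
```

This is why pinning seeds alone is not enough: seeds control sampling, but the accumulated logits themselves shift whenever the summation order changes.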