Slacker News

Comment by NitpickLawyer

7 months ago

It gets more complicated with things like batch processing. Depending on where in the stack your query gets placed, how the underlying hardware works, and how the software stack is implemented, you can get small numerical differences that compound over many token generations. (vLLM, a popular inference engine, has this problem as well.)
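A minimal sketch of the underlying mechanism (not specific to vLLM): floating-point addition is not associative, so the order in which a reduction sums its terms, which can depend on batch size and kernel choice, changes the rounded result. The magnitudes here are chosen just to make the absorption visible in float64:

```python
# Floating-point addition is not associative: summing the same three
# numbers in a different order gives a different rounded result.
a, b, c = 1e16, 1.0, -1e16

left = (a + b) + c   # b is absorbed: 1e16 + 1 rounds back to 1e16, so this is 0.0
right = (a + c) + b  # cancellation first, then add b, so this is 1.0

print(left, right)   # 0.0 1.0
```

In an LLM, a difference this tiny in the logits can flip a sampling or argmax decision at some token, and every subsequent token is conditioned on that divergence.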

