Comment by kouteiheika

2 months ago

> Why would batching lead to variance?

Depending on the shape of the data, a slightly different kernel implementation (e.g. for matrix multiplication) will be the fastest, and those kernels can give slightly different results. There can also be other sources of non-determinism depending on the implementation (e.g. some kernels are inherently non-deterministic because they use tricks to go faster).
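As a concrete illustration (my own PyTorch sketch, not from any project mentioned here): floating-point addition is not associative, so merely changing the reduction order of the same sum changes the last bits of the result, and reduction order is exactly what shape-dependent kernel selection changes.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

# Floating-point addition is not associative, so summing the same
# numbers in a different order can change the last few bits of the
# result. Kernels pick different reduction orders depending on the
# shape of the data, which is one way shape leaks into the output.
a = x.sum()                          # one reduction order
b = x.view(1000, 1000).sum(0).sum() # same numbers, different order

print(a.item(), b.item())  # often differ past ~7 significant digits
print((a - b).item())      # tiny, but typically nonzero
```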

Yep, this. I see a lot of other worryingly confident answers in the thread that are wrong.

SGLang finally has at least some notes[0], but I'm always surprised there isn't more of a community-wide effort to track down the sources of non-determinism.

[0] https://docs.sglang.ai/references/faq.html

> not entirely deterministic

There's a Nobel prize waiting for you if that's the case. I'll assume you meant theoretically consistent or accurate.

Some of the non-determinism mentioned above manifests as sensitivity to _where_ data falls within a batch.
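A minimal sketch of that sensitivity (again PyTorch, purely illustrative): run the same row through a linear layer alone and as part of a larger batch. The backend may dispatch different matmul kernels for the two shapes, so the outputs need not match bitwise.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4096, 4096)
x = torch.randn(1, 4096)

with torch.no_grad():
    alone   = layer(x)                   # batch of 1
    batched = layer(x.repeat(32, 1))[:1] # same row inside a batch of 32

# Mathematically identical, but the (1, 4096) and (32, 4096) shapes
# can dispatch to different matmul kernels with different accumulation
# orders, so bitwise equality is not guaranteed (especially on GPU).
print(torch.equal(alone, batched))           # may be False
print((alone - batched).abs().max().item())  # tiny if nonzero
```

In an LLM those last-bit differences feed through softmax and sampling, so a single flipped token choice can make the whole continuation diverge.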

  • In my experience with other regular models, once the context starts to fill up, quality starts to degrade.

    Wouldn't landing at the end of a batch have a similar -effect- on the results, where your prompt might receive less focused attention if the context window is almost full?

    Idk, just going by the vibes