But not the tokens that don't even feed into your output because they're feeding into someone else's output. Separate items in batches don't get mixed up with each other - they just run the model separately on each item at the same time, like SIMD.
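To make the "like SIMD" point concrete, here is a minimal sketch (numpy, toy dimensions, made-up random weights) of a batched self-attention forward pass: each batch item's output matches what you get running that item alone, because attention only mixes tokens within a sequence, never across the batch dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8      # toy model width
seq = 5    # toy tokens per sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (..., seq, d). Attention mixes tokens *within* a sequence only;
    # any leading batch dimension is never mixed.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores) @ v

batch = rng.normal(size=(4, seq, d))        # 4 unrelated "users" in one batch
batched_out = self_attention(batch)          # one batched forward pass
solo_out = np.stack([self_attention(item) for item in batch])  # one at a time

# Per-item outputs match (up to float-level noise); other items don't leak in.
print(np.max(np.abs(batched_out - solo_out)))
```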
I believe they are talking about latency variance. Batching can increase latency variance because a request may have to wait for enough other prompts to arrive to fill the batch.
No, I meant that the responses will be different run-to-run. [0]
[0] https://152334h.github.io/blog/non-determinism-in-gpt-4/
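One low-level ingredient often cited in these discussions (the linked post goes into the GPT-4-specific details) is that floating-point addition is not associative: the same sum computed in a different order, e.g. under a different batch shape or kernel split, gives a slightly different result, which can flip a near-tied token choice even with greedy decoding. A tiny illustration of just that arithmetic fact, assuming nothing about any particular model:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100_000).astype(np.float32)

forward = np.sum(x)                    # one reduction order
shuffled = np.sum(rng.permutation(x))  # same values, different order
print(forward, shuffled, forward == shuffled)  # usually not exactly equal
```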
Variance based on actual randomness would be one thing, but to me variance based on what other people are running seems concerning, for reasons I can't quite articulate. I don't want the model to reply to a question in one domain based on what a large group of other people are thinking in a different domain (e.g. if they're discussing the news with chatgpt).