Comment by msgodel
2 days ago
I run my local LLMs with a seed of one. If I re-run my "ai" command (which starts a conversation with its parameters as a prompt) I get exactly the same output every single time.
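A rough sketch of that kind of seeded local setup, assuming llama-cpp-python (the model path, prompt, and sampling values are placeholders, not the commenter's actual wrapper):

```python
# Hypothetical "ai"-style wrapper: fixed seed plus fixed sampling parameters.
# Assumes llama-cpp-python is installed; model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", seed=1, verbose=False)

def ai(prompt: str) -> str:
    # Temperature > 0 still samples, but the seeded RNG makes the run repeatable
    # on the same build and hardware.
    out = llm(prompt, max_tokens=256, temperature=0.8)
    return out["choices"][0]["text"]

print(ai("Summarize what a sampling seed does."))
# Re-running this script reproduces the same completion, because the seed is
# reset to 1 on every start.
```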
In my (poor) understanding, this can depend on hardware details. What are you running your models on? I haven't paid close attention to this with LLMs, but I've tried very hard to eliminate non-deterministic behavior from my training runs for other kinds of transformer models and was never able to on my 2080, 4090, or an A100. PyTorch docs have a note saying that in general it's impossible: https://docs.pytorch.org/docs/stable/notes/randomness.html
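For reference, these are the knobs that linked note covers (a sketch of the documented settings, not a guarantee; the docs say some CUDA ops simply have no deterministic implementation):

```python
import os
import torch

# Must be set before CUDA work begins for some cuBLAS ops (per the linked note).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)                      # seeds the CPU and CUDA RNGs
torch.use_deterministic_algorithms(True)  # raise on ops with no deterministic kernel
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False    # autotuning can pick different kernels run to run
```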
Inference on a generic LLM may not be subject to these non-determinisms even on a GPU though, idk
Ah. I've typically avoided CUDA except for a couple of really big jobs so I haven't noticed this.
Yes. This is what I was trying to say. Saying "It’s worth noting that LLMs are non-deterministic" is wrong and should be changed in the blog post.
> Saying "It’s worth noting that LLMs are non-deterministic" is wrong and should be changed in the blog post.
Every person in this thread understood that Simon meant "Grok, ChatGPT, and other common LLM interfaces run with a temperature>0 by default, and thus non-deterministically produce different outputs for the same query".
Sure, he wrote a shorter version of that, and because of that y'all can split hairs on the details ("yes it's correct for how most people interact with LLMs and for grok, but _technically_ it's not correct").
The point of English blog posts is not to be a long wall of logical propositions; it's to convey ideas and information. The current wording seems fine to me.
The point of what he was saying was to caution readers "you might not get this if you try to repro it", and that is 100% correct.
Still, the statement that LLMs are non-deterministic is incorrect and could mislead some people who simply aren't familiar with how they work.
Better phrasing would be something like "It's worth noting that LLM products are typically operated in a manner that produces non-deterministic output for the user"
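A sketch of what that phrasing points at, assuming the OpenAI Python SDK (the model name is a placeholder): two identical requests at the product's default settings routinely come back worded differently.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        # no temperature or seed set: the product's defaults apply
    )
    return resp.choices[0].message.content

a = ask("Describe a sampling seed in one sentence.")
b = ask("Describe a sampling seed in one sentence.")
print(a == b)  # frequently False for a hosted product at default settings
```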
My temperature is set higher than zero as well. That doesn't make them nondeterministic.
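A tiny illustration of that point: temperature > 0 means sampling, but sampling from a seeded RNG is still a deterministic procedure.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
temperature = 0.8
probs = torch.softmax(logits / temperature, dim=-1)

def sample_token(seed: int) -> int:
    # Temperature-scaled sampling with an explicitly seeded generator.
    gen = torch.Generator().manual_seed(seed)
    return int(torch.multinomial(probs, num_samples=1, generator=gen))

print(sample_token(1) == sample_token(1))  # True: same seed, same token, every run
```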
You’re correct at batch size 1 (local inference is batch size 1), but not in the production use case where multiple requests get batched together (and that’s how all the providers run things).
With batching, the matrix shapes and a request's position within the batch aren't deterministic, and that leads to non-deterministic results regardless of sampling temperature or seed.
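A quick sketch of the mechanism behind this: floating-point addition isn't associative, so a different batch layout changes the reduction order, and sampling can amplify those low-order-bit differences into different tokens.

```python
import torch

# Same numbers, different order of operations, different result:
print((0.1 + 1e20) - 1e20)  # 0.0
print(0.1 + (1e20 - 1e20))  # 0.1

# Same tensor, different reduction layout -- results are often not bitwise equal:
torch.manual_seed(0)
x = torch.randn(10_000)
s1 = x.sum()                            # flat reduction
s2 = x.view(100, 100).sum(dim=1).sum()  # chunked reduction, as a batched kernel might do
print(torch.equal(s1, s2))
```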
Isn't that true only if the batches are different? If you run exactly the same batch, you're back to a deterministic result.
If I had a black-box API, just because you don't know how the output is calculated doesn't mean it's non-deterministic. It's the underlying algorithm that determines that, and an LLM is deterministic.
"Non-deterministic" in the sense that a dice roll is when you don't know every parameter with ultimate precision. On one hand I find insistence on the wrongness on the phrase a bit too OCD, on the other I must agree that a very simple re-phrasing like "appears {non-deterministic|random|unpredictable} to an outside observer" would've maybe even added value even for less technically-inclined folks, so yeah.