
Comment by vlovich123

1 day ago

> The problem with the former (output) is that you cannot guarantee the output of an AI on a consistent basis

Do you mean you can't guarantee the result of a task request given an arbitrary query? Or something else? I was under the impression that LLMs are very deterministic if you provide a fixed seed for the samplers, fixed model weights, and fixed context. Cloud providers can't guarantee this because of how they serve requests: batching unrelated requests together changes the shapes of the underlying matrix operations, and thus the floating-point reduction order, from run to run. Now, you can't guarantee the quality of the result from that, and changing the seed or context can produce drastically different quality. But maybe you really do mean non-deterministic, in which case I'm curious where that non-determinism would come from.
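
(As a concrete illustration of the pinned-everything case, here is a minimal sketch using Hugging Face transformers on CPU. The model, prompt, and sampling settings are arbitrary placeholders; the point is only that same weights plus same context plus same sampler seed reproduces the same output on a fixed software stack.)

```python
# Minimal sketch: local inference with everything pinned is reproducible.
# Model, prompt, and settings are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("2 + 2 =", return_tensors="pt")

outputs = []
for _ in range(3):
    set_seed(42)  # fix the sampler RNG before every run
    out = model.generate(
        **inputs,
        max_new_tokens=16,
        do_sample=True,    # sampling, but seeded, so still repeatable
        temperature=0.8,
    )
    outputs.append(tok.decode(out[0], skip_special_tokens=True))

# Same weights + same context + same seed => identical generations.
assert len(set(outputs)) == 1
```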

> I was under the impression that LLMs are very deterministic if you provide a fixed seed for the samplers, fixed model weights, and fixed context.

That's all input-side, though. On the output side, you can essentially give an LLM anxiety by asking the exact same question in different ways, and the machine no longer recognizes that you're asking the exact same question.

For instance, take one of these fancy "reasoning" models and ask it variations on 2+2. Try "two plus two", "2 plus two", "deux plus 2", "TwO pLuS 2", etc., and observe its "reasoning" output to see the knots it ties itself into trying to work out why you keep asking the same calculation over and over again. Running an older DeepSeek model locally, the "reasoning" portion kept growing in time and tokens as the model struggled to invent context that didn't exist for a simple problem that older, pre-LLM tools wouldn't bat an eye at before spitting out "4".
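
(A rough sketch of that probe, assuming a local OpenAI-compatible endpoint such as Ollama; the URL, model tag, and token accounting below are assumptions, not details from this thread.)

```python
# Sketch of the "2+2 in many guises" probe. Assumes a local
# OpenAI-compatible server (e.g. Ollama) serving a reasoning model;
# the endpoint URL and model tag are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

variants = ["2+2", "two plus two", "2 plus two", "deux plus 2", "TwO pLuS 2"]

for prompt in variants:
    resp = client.chat.completions.create(
        model="deepseek-r1:7b",  # hypothetical local model tag
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = resp.choices[0].message.content
    # completion_tokens is a rough proxy for how long the model "reasoned".
    print(f"{prompt!r}: {resp.usage.completion_tokens} tokens -> {answer[:60]!r}")
```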

Trying to wrangle consistent, reproducible outputs from LLMs without guaranteeing consistent inputs is a fool's errand.

  • Ok, yes. I'd call that robustness of the model, as opposed to determinism, which to me implies a different property. And yes, I too have been frustrated by models' lack of robustness to minor variations in input, or even to a different seed for the same input (see the sketch below).
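
(A self-contained sketch of that seed sensitivity, with the same placeholder model as above: flip only the seed and the sampled continuation changes.)

```python
# Identical input and weights; only the sampler seed differs between runs.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("2 + 2 =", return_tensors="pt")

for seed in (42, 43):
    set_seed(seed)  # the only thing that changes
    out = model.generate(**inputs, max_new_tokens=16, do_sample=True, temperature=0.8)
    print(seed, "->", tok.decode(out[0], skip_special_tokens=True))
# The two continuations will typically differ, sometimes drastically.
```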

Pointing out that LLMs are deterministic as long as you lock down everything is like saying an extra-bouncy ball doesn't bounce if you leave it on a flat surface, reduce the temperature to absolute zero, and make sure the surface and the ball are at rest before starting the experiment.

It’s true but irrelevant.

One of the GP's main points was that even the simplest questions can lead to hundreds of different contexts; they probably already know that they'd get a consistent outcome if they could instead fix the context.