Comment by stingraycharles
3 days ago
It means the temperature should be set to 0 (which not every provider supports) so that the output becomes entirely deterministic. Right now, with most models, if you give the same input prompt twice you'll get two different solutions.
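For illustration, a minimal sketch of what "temperature 0" means at the API level, using the OpenAI Python SDK (the model name and prompt are just placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(prompt: str) -> str:
        # temperature=0 requests greedy decoding, but (as noted in the replies)
        # that alone doesn't guarantee bit-identical output across runs
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content

    a = ask("Write a function that reverses a string.")
    b = ask("Write a function that reverses a string.")
    print(a == b)  # often True, but not guaranteed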
Even at temp 0, you might get different answers, depending on your inference engine. There might be hardware differences, as well as software issues (e.g. vLLM documents this: with batching, results can differ depending on where in the batch sequence your query landed).
Production inference is not deterministic because of sharding (i.e. parameter weights split across several GPUs on the same machine, or across experts in an MoE), timing-based kernel choices (e.g. torch.backends.cudnn.benchmark), or batched routing in MoEs. Probably best to host a small model yourself.
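If you do self-host, a rough sketch with vLLM's offline API (model name is a placeholder; single GPU, greedy decoding, fixed seed):

    from vllm import LLM, SamplingParams

    # Single-GPU offline inference keeps batching and kernel selection under
    # your control, which removes most (though not all) run-to-run variation.
    llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder small model

    params = SamplingParams(
        temperature=0.0,  # greedy decoding
        seed=1234,        # fixed seed for any remaining sampling
        max_tokens=256,
    )

    outputs = llm.generate(["Write a function that reverses a string."], params)
    print(outputs[0].outputs[0].text)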
Claude Code already uses a temperature of 0 (just inspect the requests) but it's not deterministic
Not to mention it also performs web searches, web fetching, etc., which would make it non-deterministic as well.
Two years ago when I was working on this at a startup, setting OAI models’ temp to 0 still didn’t make them deterministic. Has that changed?
Do LLM inference engines have a way to seed their randomness, so as to have reproducible outputs while still allowing some variance if desired?
Yes, although LLM providers don't always expose it to end users.
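Where it is exposed, it looks something like this with the OpenAI SDK: seed is a best-effort reproducibility hint, and the response's system_fingerprint tells you whether the backend configuration changed between calls (model name is a placeholder):

    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Pick a random city."}],
        temperature=0.7,  # keep some variance...
        seed=42,          # ...but make it reproducible on a best-effort basis
    )
    print(resp.system_fingerprint, resp.choices[0].message.content)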
This is good: run it n times, have the model review them and pick the best one.
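A rough sketch of that best-of-n pattern, assuming an OpenAI-style API (the model name, prompt, and reviewer instructions are placeholders):

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder model name

    def generate(prompt: str, n: int = 3) -> list[str]:
        # n independent samples at a non-zero temperature
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,
            n=n,
        )
        return [c.message.content for c in resp.choices]

    def pick_best(prompt: str, candidates: list[str]) -> str:
        numbered = "\n\n".join(
            f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates)
        )
        review = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": f"Task:\n{prompt}\n\n{numbered}\n\n"
                           "Reply with only the number of the best candidate.",
            }],
            temperature=0,  # greedy decoding for the review step
        )
        try:
            choice = int(review.choices[0].message.content.strip()) - 1
        except ValueError:
            choice = 0  # fall back to the first candidate if the reply isn't a bare number
        return candidates[max(0, min(choice, len(candidates) - 1))]

    prompt = "Write a Python function that reverses a string."
    print(pick_best(prompt, generate(prompt)))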
I would only care about more deterministic output if I was repeating the same process with the same model, which is not the point of the exercise.