Comment by sigmoid10
9 hours ago
That is not a problem for LLMs, because in practice floating point inaccuracies (in particular after exponentiation) prevent values from being exactly equal. That's why greedy sampling generally produces deterministic output for LLMs. The real gotchas are elsewhere (like with batch inference as we've seen with earlier GPTs). But unlike what the earlier comment says, this is a non-issue mathematically.
> That is not a problem for LLMs, because in practice floating point inaccuracies (in particular after exponentiation) prevent values from being exactly equal
Any two tokens ending up with the exact same logit is very unlikely, but not impossible; and as the number of output tokens grows, the odds that it will eventually happen get higher and higher.
I suppose, to ensure determinism, rank by logit then token ID, so you still have a deterministic winner even if occasionally two tokens get precisely identical logits.
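A minimal sketch of that tie-break in plain Python, assuming the logits arrive as a list indexed by token ID and contain no NaNs. The name `greedy_pick` is made up for illustration and is not any library's API.

```python
def greedy_pick(logits):
    """Return the token ID with the largest logit; ties go to the lowest ID.

    Assumes `logits` is a non-empty sequence of non-NaN floats whose
    positions are the token IDs (an assumption for this sketch).
    """
    # The key is (-logit, token_id): highest logit first, and among equal
    # logits the smallest token ID wins, so the result never depends on
    # iteration order or on which duplicate happened to be seen first.
    return min(range(len(logits)), key=lambda i: (-logits[i], i))


if __name__ == "__main__":
    # Token IDs 1 and 2 share the top logit; the lower ID is chosen.
    print(greedy_pick([0.5, 2.0, 2.0, -1.0]))
```

This only makes the choice deterministic given identical logits. It does nothing about the logits themselves differing between runs, which is the batch-inference issue mentioned upthread.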
You aren't looking for a random set of tokens that have the exact same logit, you are looking for the largest n tokens to have the exact same probability.
This is exceedingly unlikely, as training will only push one of them up for any individual sample. There are likely some pathological situations that could end up with that situation, maybe, but it is pretty unlikely in a general case.
"Makes unlikely" is very different from "prevents."
If there's one counterexample, it's not really deterministic.
Exactly, consider the scenario where laws are at play and violating them could cost companies thousands. Recently my father received a 'request for address' letter addressed to me at his nursing home. The building has always been a nursing home, and he's also in his mid-70s. That's very obviously a violation of the Fair Debt Collection Practices Act. Imagine the implications if the law firm in question used an AI-assisted data-enrichment product to find this information. That SaaS company is liable not only to that one law firm but to every law firm that uses their software. It's potentially a federal class action lawsuit.
My point is, deterministic logic matters in certain circumstances 100% of the time. Forcing the LLM to make something unlikely is not good enough because a series of mistakes could very quickly bankrupt the company.
> My point is, deterministic logic matters in certain circumstances 100% of the time. Forcing the LLM to make something unlikely is not good enough because a series of mistakes could very quickly bankrupt the company.
If your argument is that the danger of equal values being selected inconsistently breaks determinism, that's a trivial problem to solve.
Any non-infinite precision numbering system by definition is at the limits of its precision when equal values occur. If you need to order such values you can extend the precision and add on a deterministically unique tiny value (position, order encountered, etc.). Your original value stays in the same precision range but they are now unique.
It's usually more likely that you want to sacrifice a little precision for determinism, so you can quantise to allocate the range where you apply the unique ID.
For example, if you had an array of 256 fp32 values but you required them to be unique, you can lop off the low 8 bits of the mantissa and replace them with the value's index in the array. Every value is then unique.
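A rough sketch of the 256-value fp32 example, using only the standard library. The helper names are hypothetical, and it assumes every input is finite and representable in fp32 (writing an index into the mantissa of an infinity would turn it into a NaN).

```python
import struct


def f32_to_bits(x):
    """Reinterpret a Python float, rounded to fp32, as a 32-bit unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Reinterpret a 32-bit unsigned int as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def make_unique(values):
    """Overwrite the low 8 mantissa bits of each fp32 value with its index.

    Assumes at most 256 finite values. Each result differs from its input
    by a relative error below 2**-15, and no two results compare equal
    because their low 8 bits differ.
    """
    if len(values) > 256:
        raise ValueError("an 8-bit index only covers 256 values")
    out = []
    for index, value in enumerate(values):
        bits = (f32_to_bits(value) & 0xFFFFFF00) | index
        out.append(bits_to_f32(bits))
    return out


if __name__ == "__main__":
    tied = [1.0, 1.0, 1.0, 2.5]
    print(make_unique(tied))
```

One design wrinkle: for positive values a later index yields a slightly larger number, while for negative values it yields a slightly more negative one, so the direction of the tie-break depends on sign. If that matters, comparing `(value, index)` pairs directly avoids touching the bits at all.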
Granted token dictionaries make for some fairly hefty indexes now, but the principle applies in general, it's easily solvable if you are prepared to spend some precision or do some extra calculation.
> for LLMs, because in practice floating point inaccuracies (in particular after exponentiation) prevent values from being exactly equal.
In one thinking trace of 10k tokens, with fp16 or bf16 logits, I don't reckon a collision is rare? There are only about 65k distinct 16-bit floating point values. And an agent can quickly rack up 100k tokens, so while not every token will have such a collision of equiprobable logits, it is not rare.
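To put rough numbers on "not rare": if each generated token independently has some probability p of a tie at the top, the chance of at least one tie in n tokens is 1 - (1 - p)^n. The sketch below just evaluates that formula. The per-token probability is a made-up placeholder, not a measurement, and independence between tokens is an assumption.

```python
import math

# A 16-bit format has at most 2**16 bit patterns, which is the "65k" figure;
# the count of distinct finite values is somewhat lower (NaNs, infinities).
SIXTEEN_BIT_PATTERNS = 2 ** 16


def prob_at_least_one_tie(p_per_token, n_tokens):
    """Probability of at least one top-logit tie in n_tokens steps.

    Assumes ties are independent across tokens and 0 <= p_per_token < 1.
    Uses log1p/expm1 so tiny probabilities do not get rounded away.
    """
    return -math.expm1(n_tokens * math.log1p(-p_per_token))


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-token tie probability, for illustration only
    for n in (1_000, 10_000, 100_000):
        print(n, prob_at_least_one_tie(p, n))
```

Whatever the real p is, the point from upthread holds: for any p > 0 the result climbs toward 1 as n grows, so a deterministic tie-break is what gives you a guarantee.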