
Comment by Terr_

15 days ago

I'm in the "failure" camp, because the true correctness of an answer comes from how it was reached. [0]

The correct (or at least humanly-expected) process would be to identify the presence of a mangled word, determine what its missing suffixes could have been, and, if some candidate is a clear contextual winner (e.g. "fried chicken", not "dried chicken"), use that.
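
Roughly, as a toy sketch in Python (the lexicon, the hand-written affinity table, and the margin threshold are all invented; a real system would get its context scores from a language model rather than a lookup table):

    from __future__ import annotations

    # Expand a truncated OCR token only when one candidate clearly wins in
    # context; otherwise refuse to guess and keep the mangled text as-is.
    LEXICON = ["fried", "fries", "fresh", "dried", "chicken", "broccoli"]

    # Toy context affinities (completely made up for this example).
    AFFINITY = {
        ("fried", "chicken"): 5.0,
        ("dried", "chicken"): 0.5,
        ("fresh", "broccoli"): 4.0,
    }

    def complete(prefix: str, context: list[str], margin: float = 2.0) -> str | None:
        candidates = [w for w in LEXICON if w.startswith(prefix)]
        if not candidates:
            return None  # nothing plausible: leave the OCR text alone
        scored = sorted(
            (sum(AFFINITY.get((c, ctx), 0.0) for ctx in context), c)
            for c in candidates
        )
        if len(scored) == 1 or scored[-1][0] - scored[-2][0] >= margin:
            return scored[-1][1]  # a clear contextual winner
        return None  # ambiguous: don't silently "fix" it

    print(complete("fri", ["chicken"]))  # "fried" beats "fries" in this toy table
    print(complete("fr", []))            # None: no context, so leave it mangled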

However, I wouldn't be surprised if the LLM is doing something like "The OCR data is X. Repeat to me what the OCR data is." That same process could also corrupt things, because it's a license to rewrite anything to look more like its training data.

[0] If that's not true, then it means I must have a supernatural ability to see into the future and correctly determine the result of a coin toss in advance. Sure, the power only works 50% of the time, but you should still worship me for being a major leap in human development. :p

> I'm in the "failure" camp, because the true correctness of an answer comes from how it was reached.

Something I may have believed until I got married. Now I know that "fnu cwken" obviously means "fresh broccoli, because what else could it mean, did I say something about buying chicken, obviously this is not chicken since I asked you to go to produce store and they DON'T SELL CHICKEN THERE".

Seriously though, I'm mostly on the side of "huge success" here, but LLMs sometimes really get overzealous with fixing what ain't broke.

  • I often think that LLM issues like this could be solved by a final pass of "is the information in this image the same as this text?" (i.e. a verification pass, generally).

    It might be that you would want to use a different (non-generative) model for that last pass -- which is like the 'array of experts' type approach. Or, comparing to your human analogy, it's like reading back the list to your partner before you leave for the shops. (A rough sketch of that two-pass idea appears just after this list.)

  • Isn’t that just following through on “from how it was reached”? Without any of that additional information, if the LLM gave the same result, we should consider it the product of hallucination.
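
A minimal sketch of that two-pass idea in Python (the function names, the signatures, and the stand-in models are invented for illustration; no real OCR or vision API is implied):

    from __future__ import annotations
    from typing import Callable

    def verified_transcription(
        image: bytes,
        transcribe: Callable[[bytes], str],           # pass 1: generative transcription
        matches_image: Callable[[bytes, str], bool],  # pass 2: independent "same info?" check
    ) -> tuple[str, bool]:
        """Return (text, verified) so the caller decides what to do on a mismatch."""
        text = transcribe(image)
        return text, matches_image(image, text)

    # Toy usage with stand-in models:
    text, ok = verified_transcription(
        b"...image bytes...",
        lambda img: "fried chicken",
        lambda img, t: "chicken" in t,
    )
    print(text, ok)  # "fried chicken" True

Keeping the result and the verdict separate leaves the mismatch handling to the caller: flag it for review, or, as suggested further down, just ask the user.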

On your epistemology: if you correctly guess the outcome of a random event, then the statement, even though it was contingent on an event that had not yet occurred, is still true. The same reasoning applies to every incorrect guess.

If you claim that you guess correctly 50% of the time then you are, from a Bayesian perspective, starting with a reasonable prior.

You then conflate the usefulness of some guessing skill with logic and statistics.

How this relates to an LLM is that the priors are baked into the LLM, so statistics is all that is required to make an educated guess about the contents of a poorly written grocery list. The truthfulness of this guess is contingent on events outside the scope of the LLM.

How often the guess is right, i.e. attaching a scalar value to the statistical outcome of an event, is very important. If your claim is that LLMs are wrong 50% of the time, then you need to update your priors based on some actual experience.

To consider: do we overestimate what we know about how we humans reach an answer? (Humans are very capable of intuitively reading scrambled text, for example, as long as the beginning and ending of each word remain correct.)

The correct way to handle it is to ask the user when it's not clear, like a real assistant would.