
Comment by TeMPOraL

15 days ago

> I'm in the "failure" camp, because the true correctness of an answer comes from how it was reached.

Something I may have believed until I got married. Now I know that "fnu cwken" obviously means "fresh broccoli, because what else could it mean, did I say something about buying chicken, obviously this is not chicken since I asked you to go to the produce store and they DON'T SELL CHICKEN THERE".

Seriously though, I'm mostly on the side of "huge success" here, but LLMs sometimes really get overzealous with fixing what ain't broke.

I often think that LLM issues like this could be solved by a final pass of "is the information in this image the same as this text" (i.e., a verification pass in general).

It might be that you would want to use a different model (non-generative) for that last pass -- which is like the 'array of experts' type of approach. Or, to borrow your human analogy, it's like reading back the list to your partner before you leave for the shops.
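
Concretely, such a verification pass might look something like this sketch -- the `ask_model` callable, the `Extraction` holder, and the model behind it are all placeholders, not any particular API:

```python
# Hypothetical sketch of a "verification pass": a second (possibly different)
# model is asked only to confirm that the extracted text matches the source
# image, rather than being asked to re-generate the extraction.
from dataclasses import dataclass

@dataclass
class Extraction:
    image_path: str
    text: str

def verify(extraction: Extraction, ask_model) -> bool:
    """Pose a yes/no faithfulness question instead of a transcription task."""
    prompt = (
        "Does the following text faithfully describe the information in "
        "the attached image? Answer only YES or NO.\n\n"
        f"Text:\n{extraction.text}"
    )
    answer = ask_model(prompt, image=extraction.image_path)
    return answer.strip().upper().startswith("YES")

# Usage: run the cheap check before accepting the generative model's output.
# if not verify(result, ask_model=some_vision_model):
#     flag_for_human_review(result)
```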

Isn’t that just following through on “from how it was reached”? Without any of that additional information, if the LLM gave the same result, we should consider it the product of hallucination.