Comment by PaulHoule

4 days ago

The real problem with LLMs is that you can't get a probability estimate out of "Is {sentence_a} a plausible answer to {sentence_b}?"

See https://www.sbert.net/examples/applications/cross-encoder/RE...

With an open model, you can read the token probabilities directly and derive that estimate.

Something like: "Is {sentence_a} a plausible answer to {sentence_b}? Respond only with a single yes/no token" and then look at the probabilities of those two tokens.
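A minimal sketch of that idea, assuming you already have the logprobs of the "yes" and "no" tokens from the model's output (the function name and example values are mine, for illustration):

```python
import math

def yes_probability(logprob_yes: float, logprob_no: float) -> float:
    """Renormalize over just the two answer tokens.

    This is a softmax over the "yes" and "no" logprobs, which discards
    any probability mass the model put on other tokens.
    """
    p_yes = math.exp(logprob_yes)
    p_no = math.exp(logprob_no)
    return p_yes / (p_yes + p_no)

# Hypothetical logprobs pulled from a model response:
print(yes_probability(-0.2, -1.8))  # most mass on "yes"
```

One design note: renormalizing over only the two tokens is what makes this a proper probability estimate; the raw logprob of "yes" alone would be depressed by mass spilled onto variants like "Yes" or " yes", so in practice you may want to sum over casing/whitespace variants first.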

  • If the model is not open, turn up the temperature a bit (if the API allows that) and ask the above question multiple times. The less sure the model is, the more the answers will vary.
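The sampling approach above can be sketched as follows; `ask_once` stands in for a real API call (the stub below is a hypothetical placeholder that answers "yes" about 70% of the time):

```python
import random

def estimate_yes_rate(ask_once, n: int = 200) -> float:
    """Ask the same yes/no question n times at nonzero temperature.

    The fraction of "yes" answers is a crude Monte Carlo estimate of
    the model's confidence; more samples tighten the estimate.
    """
    return sum(1 for _ in range(n) if ask_once() == "yes") / n

# Stand-in for a closed-API model call, for illustration only:
rng = random.Random(0)
def fake_ask() -> str:
    return "yes" if rng.random() < 0.7 else "no"

print(estimate_yes_rate(fake_ask, n=1000))
```

Note the trade-off: this costs n API calls per question and the estimate's resolution is 1/n, whereas reading logprobs (when available) gives the same information in one call.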

Of course, one can just ask the LLM directly for a probability. It will give a reasonably calibrated answer, typically a multiple of 0.05. I would ask it for an integer percentage instead, though.