Comment by smallerize

9 hours ago

I think the trouble is in the output of the LLM and how the tooling interprets it. The output is a probability distribution over all possible next tokens. Even if the model's raw score for every token is very low, the output gets normalized so that the probabilities sum to 1. After that step, it's hard to tell whether the model strongly preferred certain tokens or whether you're just looking at amplified noise.
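To make the normalization point concrete, here is a minimal sketch, assuming the usual softmax over raw scores (logits). The logit values are made up for illustration. It shows that shifting every score far down leaves the probabilities unchanged, and that near-identical scores still come out as a valid distribution.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores: one token clearly ahead of the others.
confident = [12.0, 2.0, 1.0]
# The same scores with every value pushed far down.
shifted_down = [x - 20.0 for x in confident]
# Made-up scores that differ only by tiny amounts.
near_flat = [0.01, 0.0, -0.01]

p_confident = softmax(confident)
p_shifted = softmax(shifted_down)
p_flat = softmax(near_flat)

print(p_confident)
print(p_shifted)  # same distribution: the absolute level is gone
print(p_flat)     # roughly a third each, but still sums to 1
```

So from the probabilities alone you can't recover how high or low the raw scores were, only how they compare to each other.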

Training an extra "don't know" token means you have to build a moat between every other token. Between "yes" and "no", it's not enough to have a muddled, noisy area where both "yes" and "no" have relatively high probabilities; you need a new peak where "don't know" is higher. But then you just have new muddled areas between "yes" and "don't know", and between "don't know" and "no". Training another answer in between requires even more finesse.

Instead, you could check whether multiple options are about equally likely. But then you have to check whether they are actually synonyms. Are the top two choices "Genève" and "Geneva", which is a good sign that the model knows the answer? Or are the top two "yes" and "no"?
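A rough sketch of that check, under stated assumptions: `classify_top_two`, the `aliases` table, and the `ratio` threshold are all hypothetical names and values made up for illustration, and the probabilities are invented rather than taken from a real model. A real system would need a much better notion of "synonym" than a lookup table.

```python
def canonical(token, aliases):
    """Map a token to a canonical form via a hand-written alias table (hypothetical)."""
    key = token.strip().lower()
    return aliases.get(key, key)

def classify_top_two(probs, aliases, ratio=0.5):
    """Compare the two most likely options.

    probs:   dict mapping candidate answer -> probability
    aliases: dict mapping lowercase variant -> canonical spelling
    ratio:   arbitrary threshold; the runner-up counts as "about equally
             likely" if its probability is at least ratio * the top one
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (first, p1), (second, p2) = ranked[0], ranked[1]
    if p2 < ratio * p1:
        return "clear preference"
    if canonical(first, aliases) == canonical(second, aliases):
        return "same answer, different spelling"
    return "genuinely undecided"

aliases = {"genève": "geneva"}  # illustrative alias table

print(classify_top_two({"Genève": 0.45, "Geneva": 0.40, "Zurich": 0.15}, aliases))
print(classify_top_two({"yes": 0.48, "no": 0.47, "maybe": 0.05}, aliases))
print(classify_top_two({"yes": 0.90, "no": 0.05, "maybe": 0.05}, aliases))
```

The hard part is hidden in `aliases`: deciding that two tokens mean the same thing is itself a judgment the tooling has to get right.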