Comment by themafia
16 hours ago
> we're not actually on the right track to achieve real intelligence.
Real intelligence means being able to say "I don't know" when you don't know, to ask for help, or even just to refuse to help, with the unspoken subtext being that you don't want to appear stupid.
The models could ostensibly do this when they have low confidence in their own results, but they don't. What I don't know is whether that's because it would be very computationally difficult or because it would harm the reputation of the companies charging a good sum to use them.
That's just not how they work, really. They don't know what they don't know and their process requires an output.
I think they're getting better at it, but it's likely just the number of parameters getting bigger and bigger in the SOTA models more than anything.
They do know what they don't know. There's a probability distribution for outputs that they are sampling from. That just isn't being used for that purpose.
Common misconception. As far as we know, LLMs are not calibrated, i.e. their output "probabilities" are not necessarily correlated with the actual error rates, so you can't use e.g. the softmax values to estimate confidence. That is why it is more accurate to talk about the model's "logits", "softmax values", "simplex mapping", "pseudo-probabilities", or, even more agnostically, just "output scores", unless you actually have strong evidence of calibration.
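To make the distinction concrete, here's a rough sketch of what those "output scores" look like in practice (assuming Hugging Face transformers and gpt2, purely for illustration); nothing in this pipeline ties the softmax values to actual error rates:

    # Minimal sketch: the softmax values below are pseudo-probabilities over
    # the next token; nothing guarantees that a 0.9 here corresponds to being
    # right 90% of the time.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of Australia is"
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # raw scores for the next token
    probs = torch.softmax(logits, dim=-1)        # simplex mapping of the logits

    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        # "output scores", not calibrated confidence
        print(f"{tok.decode(idx.item())!r}: {p.item():.3f}")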
To get calibrated probabilities, you actually need to use calibration techniques, and it is extremely unclear whether any frontier models are doing this (or even how calibration can be done effectively in fancy chain-of-thought + MoE models, and/or how to do it in RLVR- and RLHF-based training regimes). I suppose if you get into things like conformal prediction you could ensure some calibration, but that is likely too computationally expensive and/or has other undesirable side effects.
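For reference, the textbook version of the simplest such technique, temperature scaling, is just a one-parameter post-hoc fit on held-out logits and labels; a rough sketch with made-up data standing in for the held-out set (which is part of the problem: for a frontier chat model you typically don't have this kind of labeled held-out set at all):

    # Minimal sketch of post-hoc temperature scaling (Guo et al. 2017).
    import torch

    def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
        """Find a single scalar T so that softmax(logits / T) is better calibrated."""
        log_t = torch.zeros(1, requires_grad=True)       # optimize log T so T stays positive
        optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
        nll = torch.nn.CrossEntropyLoss()

        def closure():
            optimizer.zero_grad()
            loss = nll(logits / log_t.exp(), labels)
            loss.backward()
            return loss

        optimizer.step(closure)
        return log_t.exp().item()

    # toy usage with fake held-out data
    logits = torch.randn(1000, 10) * 3                   # deliberately overconfident scores
    labels = torch.randint(0, 10, (1000,))
    print("fitted T:", fit_temperature(logits, labels))  # T > 1 softens the probabilities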
EDIT: Oh, and there are also anomaly detection approaches, which attempt to identify when we are in outlier space using various (e.g. distance) metrics computed on the embeddings, but even getting actual probabilities out of those is tricky. This is why it is so hard to get models to say they "don't know" with any kind of statistical certainty: that information generally isn't actually "there" in the model, in any clean sense.
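The basic shape of the distance-based version is something like the following (embed() is a stand-in for whatever embedding model you'd use, and the 95th-percentile threshold is an arbitrary choice); note that the score you get out is a distance, not a probability:

    # Minimal sketch: score a new query by how far its embedding sits from the
    # embeddings of data the model is known to handle well.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def embed(texts):  # placeholder: swap in a real sentence-embedding model
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 384))

    reference_texts = ["question the model handles well"] * 500
    reference_embs = embed(reference_texts)

    knn = NearestNeighbors(n_neighbors=10).fit(reference_embs)
    ref_dists, _ = knn.kneighbors(reference_embs)
    threshold = np.percentile(ref_dists.mean(axis=1), 95)   # "normal" distance range

    query_emb = embed(["some new user question"])
    query_dist = knn.kneighbors(query_emb)[0].mean()
    print("outlier" if query_dist > threshold else "in-distribution", query_dist)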
I’m not clear what you mean by “know.” If you mean “the information is in the model” then I mostly agree, distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way—say, to state its confidence level—I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.
Having a probability distribution to sample from is not the same thing as knowing, because they don't know anything about the provenance of the data that was used to build the distribution. They trust their training set implicitly, by construction. They have no means to detect systematic errors in their training set.
Well, with thinking models it's not that simple. The probability distribution is over the next token. But if a model thinks to produce an answer, you can get a high-confidence next token even if sampling the model's thinking chain, MCMC-style, would reveal that the real probability distribution had low confidence.
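A rough illustration of that gap (not MCMC proper, just resampling the full chain and checking answer agreement as a crude stand-in; the OpenAI SDK and model name here are only examples, and agreement is not a calibrated probability):

    # Instead of trusting the single high-confidence final token, sample the
    # whole reasoning chain several times and see how much the answers agree.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()
    question = "Is 97 a prime number? Think it through, then end with YES or NO."

    answers = []
    for _ in range(10):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                 # example model name
            messages=[{"role": "user", "content": question}],
            temperature=1.0,                     # sampling, so chains can diverge
        )
        text = resp.choices[0].message.content.strip().upper()
        answers.append("YES" if text.endswith("YES") else "NO")  # crude parse

    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    print(f"majority answer: {answer}, agreement: {votes / len(answers):.0%}")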
Oh, you mean somewhere it is tracking the statistical likelihood of the output. Yeah, I buy that, although I think it just tends towards the most likely output given the context it is dragging along. I mean, it wouldn't deliberately choose something really statistically unlikely; that would be like a non sequitur.
> Real intelligence means you have to say "I don't know" when you don't know
I have met many supposedly intelligent, certainly high status, humans who don't appear to be able to do that either.
I have more confidence we can train AIs to do it, honestly.
While it is true that there are people who do not admit they are wrong when they factually are, your assertion glosses over the fact that most of the people we keep in our social circle are people we have learned, through our experiences with them, to trust to be honest.
My theory is that it's because the people building the models, and in charge of directing where they go, love the sycophantic yes-man behavior the models display.
They don't like hearing "I don't know"
You can TELL the models to do this and they'll follow your prompt.
"Give me your answer and rate each part of it for certainty by percentage" or similar.
could you please tell me how it generates that certainty score?
Vibes.
The whole thing is a statistical model, that's just what it is. No, I cannot in a reasonable way dissect how an LLM works to a satisfactory level to a skeptic.
You can just tell the agent to do exactly that
I've had various agents backed by various models ignore the shit out of various rules and requests, at varying rates, but they all do it.
When you point it out: "Oh yes, I did do that, which is contrary to the rules, request, <whatever>... Anyway..."
If you are on a SOTA model, your context window is less than 100k tokens, and you don't have any vague or contradicting rules, then I've almost never seen a rule broken.
The most common failures I've seen come from tools that pollute their context with crap, so the LLM will forget stuff or just get confused by all the irrelevant sentences; which, if the report is true, is probably what these AI notetakers are guilty of. This problem gets exacerbated if these tools turn on the 1M context window version.
Except you can't be sure it isn't producing nonsense when you do this, and generally the model(s) will be overconfident. This has been studied; see e.g. https://openreview.net/pdf?id=E6LOh5vz5x
>You can just tell the agent to do exactly that
You can.
It just won't do it.
Seems to work for me
https://chatgpt.com/share/6a06a4c5-d454-83e8-a5b2-c9468f6588...