Comment by themafia
16 hours ago
> we're not actually on the right track to achieve real intelligence.
Real intelligence means being able to say "I don't know" when you don't know, to ask for help, or even just to refuse to help, with the unspoken subtext being that you don't want to appear stupid.
The models could ostensibly do this when they have low confidence in their own results, but they don't. What I don't know is whether that's because it would be very computationally difficult or because it would harm the reputation of the companies charging a good sum to use them.
That's just not how they work, really. They don't know what they don't know and their process requires an output.
I think they're getting better at it, but it's likely just the number of parameters getting bigger and bigger in the SOTA models more than anything.
They do know what they don't know. There's a probability distribution for outputs that they are sampling from. That just isn't being used for that purpose.
Common misconception. As far as we know, LLMs are not calibrated, i.e. their output "probabilities" are not necessarily correlated with the actual error rates, so you can't use e.g. the softmax values to estimate confidence. That is why it is more accurate to talk about the model's "logits", "softmax values", "simplex mapping", "pseudo-probabilities", or, even more agnostically, just "output scores", unless you actually have strong evidence of calibration.
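To make the distinction concrete, here's a rough sketch of what those "output scores" look like in practice (assuming Hugging Face transformers and gpt2, purely for illustration); nothing in this pipeline ties the softmax values to actual error rates:

    # Minimal sketch: the softmax values below are pseudo-probabilities over
    # the next token; nothing guarantees that a 0.9 here corresponds to being
    # right 90% of the time.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of Australia is"
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # raw scores for the next token
    probs = torch.softmax(logits, dim=-1)        # simplex mapping of the logits

    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        # "output scores", not calibrated confidence
        print(f"{tok.decode(idx.item())!r}: {p.item():.3f}")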
To get calibrated probabilities, you actually need to use calibration techniques, and it is extremely unclear whether any frontier models are doing this (or even how calibration can be done effectively in fancy chain-of-thought + MoE models, and/or how to do it in RLVR- and RLHF-based training regimes). I suppose if you get into things like conformal prediction you could ensure some calibration, but that is likely too computationally expensive and/or has other undesirable side effects.
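For reference, the textbook version of the simplest such technique, temperature scaling, is just a one-parameter post-hoc fit on held-out logits and labels; a rough sketch with made-up data standing in for the held-out set (which is part of the problem: for a frontier chat model you typically don't have this kind of labeled held-out set at all):

    # Minimal sketch of post-hoc temperature scaling (Guo et al. 2017).
    import torch

    def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
        """Find a single scalar T so that softmax(logits / T) is better calibrated."""
        log_t = torch.zeros(1, requires_grad=True)       # optimize log T so T stays positive
        optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
        nll = torch.nn.CrossEntropyLoss()

        def closure():
            optimizer.zero_grad()
            loss = nll(logits / log_t.exp(), labels)
            loss.backward()
            return loss

        optimizer.step(closure)
        return log_t.exp().item()

    # toy usage with fake held-out data
    logits = torch.randn(1000, 10) * 3                   # deliberately overconfident scores
    labels = torch.randint(0, 10, (1000,))
    print("fitted T:", fit_temperature(logits, labels))  # T > 1 softens the probabilities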
EDIT: Oh, and there are also anomaly detection approaches, which attempt to identify when we are in outlier space using various (e.g. distance) metrics computed on the embeddings, but even getting actual probabilities out of those is tricky. This is why it is so hard to get models to say they "don't know" with any kind of statistical certainty: that information generally isn't actually "there" in the model, in any clean sense.
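The basic shape of the distance-based version is something like the following (embed() is a stand-in for whatever embedding model you'd use, and the 95th-percentile threshold is an arbitrary choice); note that the score you get out is a distance, not a probability:

    # Minimal sketch: score a new query by how far its embedding sits from the
    # embeddings of data the model is known to handle well.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def embed(texts):  # placeholder: swap in a real sentence-embedding model
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 384))

    reference_texts = ["question the model handles well"] * 500
    reference_embs = embed(reference_texts)

    knn = NearestNeighbors(n_neighbors=10).fit(reference_embs)
    ref_dists, _ = knn.kneighbors(reference_embs)
    threshold = np.percentile(ref_dists.mean(axis=1), 95)   # "normal" distance range

    query_emb = embed(["some new user question"])
    query_dist = knn.kneighbors(query_emb)[0].mean()
    print("outlier" if query_dist > threshold else "in-distribution", query_dist)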
I’m not clear what you mean by “know.” If you mean “the information is in the model” then I mostly agree, distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way—say, to state its confidence level—I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.
Having a probability distribution to sample from is not the same thing as knowing, because they don't know anything about the provenance of the data that was used to build the distribution. They trust their training set implicitly, by construction. They have no means to detect systematic errors in their training set.
Well, with thinking models it's not that simple. The probability distribution is over the next token. But if a model thinks to produce an answer, you can get a high-confidence next token even if sampling the model's thinking chain, MCMC-style, would reveal that the real probability distribution had low confidence.
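A rough illustration of that gap (not MCMC proper, just resampling the full chain and checking answer agreement as a crude stand-in; the OpenAI SDK and model name here are only examples, and agreement is not a calibrated probability):

    # Instead of trusting the single high-confidence final token, sample the
    # whole reasoning chain several times and see how much the answers agree.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()
    question = "Is 97 a prime number? Think it through, then end with YES or NO."

    answers = []
    for _ in range(10):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                 # example model name
            messages=[{"role": "user", "content": question}],
            temperature=1.0,                     # sampling, so chains can diverge
        )
        text = resp.choices[0].message.content.strip().upper()
        answers.append("YES" if text.endswith("YES") else "NO")  # crude parse

    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    print(f"majority answer: {answer}, agreement: {votes / len(answers):.0%}")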
Oh, you mean somewhere it is tracking the statistical likelihood of the output. Yeah, I buy that, although I think it just tends towards the most likely output given the context it is dragging along. I mean, it wouldn't deliberately choose something really statistically unlikely; that would be like a non sequitur.
> Real intelligence means you have to say "I don't know" when you don't know
I have met many supposedly intelligent, certainly high status, humans who don't appear to be able to do that either.
I have more confidence we can train AIs to do it, honestly.
While it is true that there are people who do not admit they are wrong when they factually are, your assertion glosses over the fact that most of the people we keep in our social circle are people we have learned, through our experiences with them, to trust to be honest.
My theory is that it's because the people building the models, and in charge of directing where they go, love the sycophantic yes-man behavior the models display.
They don't like hearing "I don't know"
You can TELL the models to do this and they'll follow your prompt.
"Give me your answer and rate each part of it for certainty by percentage" or similar.
could you please tell me how it generates that certainty score?
Vibes.
The whole thing is a statistical model, that's just what it is. No, I cannot in a reasonable way dissect how an LLM works to a satisfactory level to a skeptic.
You can just tell the agent to do exactly that
I've had various agents backed by various models ignore the shit out of various rules and requests, at varying rates, but they all do it.
When you point it out: "Oh yes, I did do that, which is contrary to the rules, request, <whatever>... Anyway..."
If you are on a SOTA model, your context window is less than 100k tokens, and you don't have any vague or contradicting rules, then I've almost never seen a rule broken.
The most common failures I've seen come from tools that pollute their context with crap, so the LLM will forget stuff or just get confused by all the irrelevant sentences; which, if the report is true, is probably what these AI notetakers are guilty of. This problem gets exacerbated if these tools turn on the 1M context window version.
Except you can't be sure it isn't producing nonsense when you do this, and generally the model(s) will be overconfident. This has been studied; see e.g. https://openreview.net/pdf?id=E6LOh5vz5x
>You can just tell the agent to do exactly that
You can.
It just won't do it.
Seems to work for me
https://chatgpt.com/share/6a06a4c5-d454-83e8-a5b2-c9468f6588...