Look at any recent CoT output where the model is trying to infer from an underspecified prompt what the user wants or means.
It is generally the first thing they do: try to figure out what you meant by the prompt. When they can't infer your intent, good models ask follow-up questions to clarify.
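To illustrate, here is a minimal sketch of testing that behavior through the API instead of the chat UI, using the OpenAI Python client (the model name, system instruction, and the deliberately underspecified user message are my assumptions, not anything the thread or a vendor specifies):

```python
# Minimal sketch: nudge a chat model to ask a clarifying question when the
# prompt is underspecified. Assumes the openai Python package (v1+) and an
# OPENAI_API_KEY in the environment; the model name is an assumption.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model should behave similarly
    messages=[
        {
            "role": "system",
            "content": "If the user's intent is ambiguous, ask one short "
                       "clarifying question before attempting an answer.",
        },
        # Deliberately underspecified: no program, no metric, no context.
        {"role": "user", "content": "Make it faster."},
    ],
)

# A well-behaved model typically replies with a question such as
# "Faster in what sense, and for which system?" rather than guessing.
print(resp.choices[0].message.content)
```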
I am wondering if this is a semantics issue, as this is an established area of research, e.g. https://arxiv.org/pdf/2501.10871
Right, and then look at any number of research papers showing that CoT output has limited impact on the end result. We've trained these models to pretend to reason.
If it's only pretending to reason, then how is it that the CoT output improves performance on every single benchmark/test?
> Right, and then look at any number of research papers showing that CoT output has limited impact on the end result.
Which research papers? Do I have to find them?
> We've trained these models to pretend to reason.
I have no idea why that matters. Can you tell me what the difference is if it looks exactly the same and has the same result?
Go ask ChatGPT this prompt:
"A guy goes into a bank and looks up at where the security cameras are pointed. What could he be trying to do?"
It very easily captures the intent behind the behavior, as in it is not just literally interpreting the words. All that "capturing intent" amounts to is a subset of pattern recognition, which LLMs can do very well.
Recognising a stock cultural script isn't the same as capturing intent. Ask it something where no script exists.
For example: "A man thrusts past me violently and grabs the jacket I was holding, he jumped into a pool and ruined it. Am I morally right in suing him?"
There's no way for the LLM to know that the reason the jacket was stolen was to use it as an inflatable raft to support a larger person who was drowning. It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.
> It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.
I wouldn't be too sure about that. I've definitely had dialogues with LLMs where they would raise questions along those lines.
Also, I disagree with the statement that this is a question about capability. Intent is more philosophical than actually tangible, because most people don't actually have a clearly defined intent when they take action.
The waters of intelligence have definitely gotten murkier over time as techniques have improved. I still consider it an illusion, but the illusion is getting harder to pierce for a lot of people.
FWIW, current LLMs exhibit their intelligence through language and rhetorical processes. Most biological creatures have intelligence that may be improved through language but isn't fundamentally based on it.
If your example of an exception to LLMs' ability to infer intent is a deliberately misleading trick question that leaves out crucial contextual details, then I'm not sure what you're trying to prove. The same ambiguity would trip up many humans, simply because you are trying as hard as possible to imply a certain conclusion.
As expected, if I ask your question verbatim, ChatGPT (the free version) responds as I'm sure a human would in the generally helpful customer-service role it is trained to act in: "yeah, you could sue them, blah blah, depends on details".
However, if I add a simple prompt, "The following may be a trick question, so be sure to ascertain if there are any contextual details missing", then it picks up that this may be an emergency, which is very likely also how a human would respond.
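For anyone who wants to reproduce that comparison outside the chat UI, here is a rough sketch using the OpenAI Python client (the model name is an assumption on my part; the original comparison used the free ChatGPT web interface, and the two prompts are quoted from this thread):

```python
# Rough sketch of the verbatim-vs-primed comparison described above.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY; the model name
# is an assumption, not what the commenter used (they used the ChatGPT web UI).
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "A man thrusts past me violently and grabs the jacket I was holding, "
    "he jumped into a pool and ruined it. Am I morally right in suing him?"
)

PRIMER = (
    "The following may be a trick question, so be sure to ascertain "
    "if there are any contextual details missing. "
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Verbatim question: tends to get the helpful "you could sue, it depends" answer.
print(ask(QUESTION))

# Primed question: the model is more likely to ask what the man was doing,
# e.g. whether this was an emergency such as someone drowning.
print(ask(PRIMER + QUESTION))
```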
That statement is ambiguous for humans!!
I didn’t realise you might be describing an emergency situation until someone else pointed it out.
Most people wouldn’t phrase the question with the word “violently” if the situation was an emergency.
Also, people have sued emergency workers and good samaritans. It’s a problem!
I guess the _obvious_ intent is they’re planning a heist? Because the following things never happen:
- a security auditor checking for camera blind spots,
- construction planning that requires understanding where there is power,
- a potential customer assessing the security of a bank,
- someone who is about to report an incident preparing to make the “it should be visible from the security camera” argument…
I mean… how did our imagination shrink so fast? I wrote this on my phone. These alternate scenarios just popped into my head.
And I bet our imagination didn’t shrink. The AI-pilled state of mind is blocking us from using it.
If you are an engineer and stopped looking for alternative explanations or failure scenarios, you’re abdicating your responsibility btw.
Because there are countless instances in the training material where a bank robber scopes out the security cameras.
What's an example you can think of, then, of a question where a human could infer intent but an LLM couldn't?
I've done that before without any intent to rob a bank. A person walks by a house, sees the Ring camera on the door. That must mean the person was looking to break in through the front and rob the place?
An LLM will mention multiple possibilities.