Comment by moi2388
6 hours ago
“ the researchers created a carefully controlled LLM environment in an attempt to measure just how well chain-of-thought reasoning works when presented with "out of domain" logical problems that don't match the specific logical patterns found in their training data.”
Why? If it’s out of domain we know it’ll fail.
I don't think we know that it'll fail, or at least that is not universally accepted as true. Rather, there are claims that given a large enough model / context window, such capabilities emerge. I think skepticism of that claim is warranted. This research validates that skepticism, at least for a certain set of parameters (model family/size, context size, etc.).
> Why? If it’s out of domain we know it’ll fail.
To see whether LLMs adhere to logic, or whether the observed "logical" responses are just reproductions of patterns.
I personally enjoy this idea of isolating "logic" from "pattern" and seeing whether "logic" will manifest in LLM "thinking" about a "non-patternized" domain.
--
Also, it's never bad to give the public proof that "thinking" (like "intelligence") in the AI context isn't the same thing we mean intuitively.
--
> If it’s out of domain we know it’ll fail.
Below is a question that is out of domain. Yet LLMs handle it in what appears to be a logical way.
```
Kookers are blight. And shmakers are sin. If peker is blight and sin who is he?
```
It is out of domain and it does not fail (I put it through Gemini 2.5 with thinking enabled). Now back to the article: is the observed logic intrinsic to LLMs, or is it an elaborate form of pattern matching? According to the article, it's a pattern.
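For anyone who wants to reproduce this kind of probe, here's a minimal sketch (not the article's controlled setup) using the google-genai Python SDK; the thinking-config fields and the GEMINI_API_KEY environment variable are assumptions that may vary by SDK version:

```python
# Probe an out-of-domain syllogism with a Gemini 2.5 thinking model.
# Assumes: `pip install google-genai` and GEMINI_API_KEY set in the environment.
from google import genai
from google.genai import types

PROMPT = (
    "Kookers are blight. And shmakers are sin. "
    "If peker is blight and sin who is he?"
)

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=PROMPT,
    # Ask for thought summaries alongside the answer; field names
    # may differ slightly between SDK versions.
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Print the model's (summarized) reasoning, then its final answer.
for part in response.candidates[0].content.parts:
    if getattr(part, "thought", False):
        print("[thought]", part.text)
    else:
        print("[answer]", part.text)
```

Swapping in other nonsense-word syllogisms is an easy way to check whether the "logical" answer pattern holds up or breaks.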
It's getting to the nub of whether models can extrapolate rather than merely interpolate.
If they had _succeeded_, we'd all be taking it as proof that LLMs can reason, right?