Comment by threethirtytwo
1 month ago
What you’re describing is just competent engineering, and it’s already been applied to LLMs. People have been adversarial. That’s why we know so much about hallucinations, jailbreaks, distribution shift failures, and long-horizon breakdowns in the first place. If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.
The key point you’re missing is the type of failure. Search systems fail by not retrieving. Parrots fail by repeating. LLMs fail by producing internally coherent but factually wrong world models. That failure mode only exists if the system is actually modeling and reasoning, imperfectly. You don’t get that behavior from lookup or regurgitation.
This shows up concretely in how errors scale. Ambiguity and multi-step inference increase hallucinations. Scaffolding, tools, and verification loops reduce them. Step-by-step reasoning helps. Grounding helps. None of that makes sense for a glorified Google search.
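To make "verification loops" concrete, here is a minimal sketch. Nothing here is from the thread: `ask_model` is a hypothetical stub standing in for a real LLM call (it deliberately "hallucinates" on the first try so the loop has something to do). The point is that a mechanical checker, not a human, decides whether an answer is accepted:

```python
def ask_model(prompt: str, attempt: int) -> str:
    """Stand-in for a real LLM call (hypothetical). To keep the sketch
    deterministic, it returns a wrong answer first, then a right one."""
    return "5" if attempt == 0 else "4"

def verified_answer(prompt: str, check, max_tries: int = 5):
    """Verification loop: resample until a candidate passes a mechanical check.
    Refuses (returns None) rather than emit an unverified answer."""
    for attempt in range(max_tries):
        candidate = ask_model(prompt, attempt)
        if check(candidate):
            return candidate
    return None

# Usage: the checker is a pure function of the answer, so no human
# judgment is in the loop.
answer = verified_answer("What is 2 + 2?", check=lambda a: a == "4")
```

The design choice worth noticing is the `None` fallback: a verification loop converts "sometimes wrong" into "either checked-correct or explicitly abstaining," which is exactly the scaffolding effect described above.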
Hallucinations are a real weakness, but they’re not evidence of absence of capability. They’re evidence of an incomplete reasoning system operating without sufficient constraints. Engineers don’t dismiss CNC machines because they crash bits. They map the envelope and design around it. That’s what’s happening here.
Being skeptical of reliability in specific use cases is reasonable. Concluding from those failure modes that this is just Clever Hans is not adversarial engineering. It’s stopping one layer too early.
> If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.
Absolutely not true. I cannot express how strongly this is not true, haha. The tech is neat, and plenty of real computer scientists work on it. That doesn't mean it's not wildly misunderstood by others.
> Concluding from those failure modes that this is just Clever Hans is not adversarial engineering.
I feel like you're maybe misunderstanding what I mean when I refer to Clever Hans. The Clever Hans story is not about the horse. It's about the people.
A lot of people -- including his owner -- were legitimately convinced that a horse could do math, because look, literally anyone can ask the horse questions and it answers them correctly. What more proof do you need? It's obvious he can do math.
Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.
The relevant lesson here is it's very easy to convince yourself you saw something you 100% did not see. (It's why magic shows are fun.)
These things are not horses. How can anyone choose to remain so ignorant in the face of irrefutable evidence that they're wrong?
https://arxiv.org/abs/2507.15855
It's as if a disease like COVID swept through the population, and every human's IQ dropped 10 to 15 points while our machines grew smarter to an even larger degree.
Or -- and hear me out -- that result doesn't mean what you think it does.
That's the exact reason I mention the Clever Hans story. You think it's obvious because you can't come up with any other explanation; therefore there can't be another explanation, and the horse must be able to do math. And if I can't come up with an explanation either, well, that just proves it, right? Those are the only two options, obviously.
Except no, all it means is you're the limiting factor. This isn't science 101 but maybe science 201?
My current hypothesis is the IMO thing gets trotted out mostly by people who aren't strong at math. They find the math inexplicable, therefore it's impressive, therefore machine thinky.
When you actually look hard at what's claimed in these papers -- and I've done this for a number of these self-published things -- the evidence frequently does not support the conclusions. Have you actually read the paper, or are you just waving it around?
At any rate, I'm not shocked that an LLM can cobble together what looks like a reasonable proof for some things sometimes, especially for the IMO, which is not novel math and has a range of question difficulties. Proofs are pretty code-like, and math itself is just a language for concisely expressing ideas.
Here, let me call a shot -- I bet this paper says LLMs fuck up on proofs like they fuck up on code. It will sometimes generate things that are fine, but it'll frequently generate things that are just irrational garbage.
(Continuing from my other post)
The first thing I checked was "how did they verify the proofs were correct?" The answer: they got other AI people to check them, and those people said there were serious problems with the paper's methodology and that it would not have been a gold medal.
https://x.com/j_dekoninck/status/1947587647616004583
This is why we do not take things at face value.
You’re leaning very hard on the Clever Hans story, but you’re still missing why the analogy fails in a way that should matter to an engineer.
Clever Hans was exposed because the effect disappeared under controlled conditions. Blind the observers, remove human cues, and the behavior vanished. The entire lesson of Clever Hans is not “people can fool themselves,” it’s “remove the hidden channel and see if the effect survives.” That test is exactly what has been done here, repeatedly.
LLM capability does not disappear when you remove human feedback. It does not disappear under automatic evaluation. It does not disappear across domains, prompts, or tasks the model was never trained or rewarded on. In fact, many of the strongest demonstrations people point to are ones where no human is in the loop at all: program synthesis benchmarks, math solvers, code execution tasks, multi-step planning with tool APIs, compiler error fixing, protocol following. These are not magic tricks performed for an audience. They are mechanically checkable outcomes.
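To illustrate what "mechanically checkable" means, here is a toy grader in the style of program-synthesis benchmarks. The candidate string and the `solve` name are assumptions for the sketch, not taken from any specific benchmark; real harnesses also sandbox execution, which is omitted here:

```python
def run_candidate(source: str, tests: list) -> bool:
    """Mechanically grade a model-written function: exec the source,
    then run it against fixed input/output pairs. The outcome is a
    pass/fail bit with no human judgment anywhere in the loop."""
    namespace = {}
    try:
        exec(source, namespace)  # NOTE: real harnesses sandbox this step
        fn = namespace["solve"]
        return all(fn(x) == y for x, y in tests)
    except Exception:
        return False  # syntax errors, crashes, missing function: all fail

# A hypothetical model-generated candidate; any string could be graded this way.
candidate = "def solve(n):\n    return n * n\n"
tests = [(2, 4), (3, 9), (10, 100)]
ok = run_candidate(candidate, tests)
```

A Clever Hans effect cannot survive this setup: there is no observer to leak cues, only test cases either passing or not.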
Your framing quietly swaps “some people misunderstand the tech” for “therefore the tech itself is misunderstood in kind.” That’s a rhetorical move, not an argument. Yes, lots of people are confused. That has no bearing on whether the system internally models structure or just parrots. The horse didn’t suddenly keep solving arithmetic when the cues were removed. These systems do.
The “it’s about the people” point also cuts the wrong way. In Clever Hans, experts were convinced until adversarial controls were applied. With LLMs, the more adversarial the evaluation gets, the clearer the internal structure becomes. The failure modes sharpen. You start seeing confidence calibration errors, missing constraints, reasoning depth limits, and brittleness under distribution shift. Those are not illusions created by observers. They’re properties of the system under stress.
You’re also glossing over a key asymmetry. Hans never generalized. He didn’t get better at new tasks with minor scaffolding. He didn’t improve when the problem was decomposed. He didn’t degrade gracefully as difficulty increased. LLMs do all of these things, and in ways that correlate with architectural changes and training regimes. That’s not how self-deception looks. That’s how systems with internal representations behave.
I’ll be blunt but polite here: invoking Clever Hans at this stage is not adversarial rigor, it’s a reflex. It’s what you reach for when something feels too capable to be comfortable but you don’t have a concrete failure mechanism to point at. Engineers don’t stop at “people can be fooled.” They ask “what happens when I remove the channel that could be doing the fooling?” That experiment has already been run.
If your claim is “LLMs are unreliable for certain classes of problems,” that’s true and boring. If your claim is “this is all an illusion caused by human pattern-matching,” then you need to explain why the illusion survives automated checks, blind evaluation, distribution shift, and tool-mediated execution. Until then, the Hans analogy isn’t skeptical. It’s nostalgic.