Comment by CamperBob2

1 month ago

The problem I see, over and over, is that people pose poorly formed questions to the free ChatGPT and Google models, laugh at the resulting half-baked answers that are often full of errors and hallucinations, and draw conclusions about the technology as a whole.

Either that, or they tried it "last year" or "a while back" and have no concept of how far things have come in the meantime.

It's like they wandered into a machine shop, cut off a finger or two, and concluded that their grandpa's hammer and hacksaw were all anyone ever needed.

No, frankly it's the difference between actual engineers and hobbyists/amateurs/non-SWEs.

SWEs are trained to discard surface-level observations and be adversarial. You can't just look at the happy path: how does the system behave on edge cases? Where does it break down, and how? What are the failure modes?

The actual analogy to a machine shop would be to look at whether the machines were adequate for their use case, whether the building had enough reliable power to run them, and whether there were any safety issues.

It's easy to Clever Hans yourself and get snowed by what looks like sophisticated effort or flat-out bullshit. I had to gently tell a junior engineer that just because the marketing claims something will work a certain way, that doesn't mean it will.

  • What you’re describing is just competent engineering, and it’s already been applied to LLMs. People have been adversarial. That’s why we know so much about hallucinations, jailbreaks, distribution shift failures, and long-horizon breakdowns in the first place. If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

    The key point you’re missing is the type of failure. Search systems fail by not retrieving. Parrots fail by repeating. LLMs fail by producing internally coherent but factually wrong world models. That failure mode only exists if the system is actually modeling and reasoning, imperfectly. You don’t get that behavior from lookup or regurgitation.

    This shows up concretely in how errors scale. Ambiguity and multi-step inference increase hallucinations. Scaffolding, tools, and verification loops reduce them (a rough sketch of such a loop is at the end of this comment). Step-by-step reasoning helps. Grounding helps. None of that makes sense for a glorified Google search.

    Hallucinations are a real weakness, but they’re not evidence of absence of capability. They’re evidence of an incomplete reasoning system operating without sufficient constraints. Engineers don’t dismiss CNC machines because they crash bits. They map the envelope and design around it. That’s what’s happening here.

    Being skeptical of reliability in specific use cases is reasonable. Concluding from those failure modes that this is just Clever Hans is not adversarial engineering. It’s stopping one layer too early.
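
    To make "verification loop" concrete, here is a minimal sketch of the generate-then-check pattern, not any particular product's pipeline. The llm() and retrieve() calls below are hypothetical placeholders, not a real API:

      # Minimal sketch: generate a grounded draft, then explicitly check it.
      # llm() and retrieve() are hypothetical placeholders, not real APIs.

      def retrieve(query: str) -> str:
          """Placeholder for whatever grounds the model in real sources."""
          raise NotImplementedError

      def llm(prompt: str) -> str:
          """Placeholder for a call to some language model."""
          raise NotImplementedError

      def answer_with_verification(question: str, max_tries: int = 3) -> str:
          sources = retrieve(question)  # grounding: hand the model evidence
          for _ in range(max_tries):
              draft = llm(f"Using only these sources:\n{sources}\n\n"
                          f"Answer the question: {question}")
              # Separate check pass: is the draft supported by the sources?
              verdict = llm(f"Sources:\n{sources}\n\nClaim:\n{draft}\n\n"
                            "Is every statement supported? Answer yes or no.")
              if verdict.strip().lower().startswith("yes"):
                  return draft
          return "No grounded answer found."  # fail closed, not confidently

    The point is the shape, not the specifics: a grounded draft plus an explicit check, failing closed instead of answering confidently.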

    • > If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

      Absolutely not true. I cannot express how strongly this is not true, haha. The tech is neat, and plenty of real computer scientists work on it. That doesn't mean it's not wildly misunderstood by others.

      > Concluding from those failure modes that this is just Clever Hans is not adversarial engineering.

      I feel like you're maybe misunderstanding what I mean when I refer to Clever Hans. The Clever Hans story is not about the horse. It's about the people.

      A lot of people -- including his owner -- were legitimately convinced that a horse could do math, because look, literally anyone can ask the horse questions and it answers them correctly. What more proof do you need? It's obvious he can do math.

      Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.

      The relevant lesson here is it's very easy to convince yourself you saw something you 100% did not see. (It's why magic shows are fun.)

  • You sound pretty certain. There's often good money to be made in taking the contrarian view, where you have insights that the so-called "smart money" lacks. What are some good investments to make in the extreme-bear case, in which we're all just Clever Hans-ing ourselves as you put it? Do you have skin in the game?

    • My dude, I assure you "humans are really good at convincing themselves of things that are not true" is a very, very well-known fact. I don't know what kind of arbitrage you think exists in this incredibly anodyne statement lol.

      If you want a financial tip: don't short stocks or chase market butterflies. Instead, make real professional friends, develop real skills, and learn to be friendly and useful.

      I made my money in tech already, partially by being lucky and in the right place at the right time, and partially because I made my own luck by having friends who passed the opportunity along.

      Hope that helps!

  • I wish there were a way to tell posts from legitimately clever people apart from the not-so-clever.

    It's annoying to see posts from people who lag behind in intelligence and just don't get it -- people learn at different rates, and some see way further ahead.

    • A good way to filter is for you to look in the mirror. Only the person in the mirror sees further ahead than anyone else.