← Back to context

Comment by sobellian

9 hours ago

Hey man, it sounds like you're getting frustrated. I'm not ignoring anything; let's have a reasonable discussion without calling each other ignorant. I don't dispute the value of these tools nor that they're improving. But the no free lunch theorem is inexorable so the question is where this improvement breaks down - before or beyond human performance on programming problems specifically.

What difference do I think there is between humans and an agent? They use different heuristics, clearly. Different heuristics are valuable on different search problems. It's really that simple.

To be clear, I'm not calling either superior. I use agents every day. But I have noticed that claude, a SOTA model, makes basic logic errors. Isn't that interesting? It has access to the complete compendium of human knowledge and can code all sorts of things in seconds that require my trawling through endless documentation. But sometimes it forgets that to do dirty tracking on a pure function's output, it needs to dirty-track the function's inputs.

It's interesting that you mention AlphaGo. I was also very fascinated with it. There was recent research that the same algorithm cannot learn Nim: https://arstechnica.com/ai/2026/03/figuring-out-why-ais-get-.... Isn't that food for thought?

What is unreasonable? I am saying the claims you are making are completely contradicted by the literature. I am calling you ignorant in the technical sense, not dumb or unintelligent, and I don't mean this as an insult. I am completely ignorant of many things, we all are.

I am saying you are absolutely right that Opus 4.6 is both SOTA and also colossally terrible in even surprisingly mundane contexts. But that is just not relevant to the argument you are making which is that there is some fundamental limitation. There is of course always a fundamental limitation to everything, but what we're getting at is where that fundamental limitation is and we are not yet even beginning to see it. Combinatorics here is the wrong lens to look at this, because it's not doing a search over the full combinatoric space, as is the case with us. There are plenty of efficient search "heuristics" as you call them.

> They use different heuristics, clearly.

what is the evidence for this? I don't see that as true, take for instance: https://www.nature.com/articles/s42256-025-01072-0

> It's interesting that you mention AlphaGo. I was also very fascinated with it. There was recent research that the same algorithm cannot learn Nim: https://arstechnica.com/ai/2026/03/figuring-out-why-ais-get-.... Isn't that food for thought?

It's a long known problem with RL in a particular regime and isn't relevant to coding agents. Things like Nim are a small, adversarially structured task family and it's not representative of language / coding / real-world tasks. Nim is almost the worst possible case, the optimal optimal policy is a brittle, discontinuous function.

Alphago is pure RL from scratch, this is quite challenging, inefficient, and unstable, and why we dont do that with LLMs, we pretrain them first. RL is not used to discover invariants (aspects of the problem that don't change when surface details change) from scratch in coding agents as they are in this example. Pretraining takes care of that and RL is used for refinement, so a completely different scenario where RL is well suited.

  • I didn't make any claims contradicted by literature. The only thing I cited as bedrock fact, NFL, is a mathematical theorem. I'm not sure why Nim shouldn't be relevant, it's an exercise in logic.

    > “AlphaZero excels at learning through association,” Zhou and Riis argue, “but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes.”

    Seems relevant.