Comment by aspenmartin
3 hours ago
You are not wrong about anything you're saying, but like I said, this misses the forest for the trees. I'm talking about the next ~2 years. There is a common idea that we don't understand this technology or what will happen performance-wise. We know a lot more about what's going to happen than people think, because none of this is new. We've known about neural nets since the 40s. We know how RL works on a fundamental level, and it has been an active and beautiful field of research for at least 30-40 years. We know what happens when you combine RL with verifiable rewards and throw a lot of compute at it.
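(For the concrete version of "verifiable rewards": in the coding setting, the reward can literally be "do the tests pass". Here's a minimal sketch in Python; it assumes pytest is installed, and a real pipeline would add sandboxing, crash handling, and richer reward shaping:)

```python
import os
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Binary reward: 1.0 if the generated code passes the tests, else 0.0.

    A sketch only: real training infrastructure sandboxes execution and
    handles timeouts/crashes instead of letting exceptions propagate.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "solution.py"), "w") as f:
            f.write(candidate_code)
        with open(os.path.join(tmp, "test_solution.py"), "w") as f:
            f.write(test_code)
        # The test suite's exit code is the verifier: no human in the loop.
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=tmp,
            capture_output=True,
            timeout=60,
        )
    return 1.0 if result.returncode == 0 else 0.0
```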
One big misconception is that these models are trained to mimic humans and are therefore limited by the quality of the human training data. This is not true, and the fact that it's not true is basically the entire reason you see so much bullishness and premature adoption of agentic coding tools.
Coding agents use human traces as a starting point. You technically don't have to do this at all, but that's an academic point; practically (today), you can't skip it. The early training stages with human traces (and also verified synthetic traces from your last model) get you to a point where RL is stable and efficient, and RL pushes you the rest of the way. What really powers this is synthetic data via rejection sampling: you generate a bunch of traces, figure out which ones pass verification, and keep those as training examples.
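(A minimal sketch of that rejection-sampling loop; `generate` and `verify` are hypothetical stand-ins for sampling from the current model and running the verifiable reward, not any particular lab's API:)

```python
def rejection_sample(prompts, generate, verify, samples_per_prompt=8):
    """Keep only the sampled traces that pass verification.

    generate(prompt) -> trace     : sample one trace from the current model
    verify(prompt, trace) -> bool : the verifiable reward (e.g. tests pass)

    The surviving (prompt, trace) pairs become training examples for the
    next round of fine-tuning, so the model bootstraps past its own data.
    """
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            trace = generate(prompt)
            if verify(prompt, trace):
                kept.append({"prompt": prompt, "completion": trace})
    return kept
```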
So, because:
- we know how this works on a fundamental level, and have for some time
- human training data is a bootstrap, not a fundamental limitation
- you are absolutely right about your observations, yet look at where you are today versus, say, Claude Sonnet 3.x: it's an entire world away in like a year
- we have imperfect benchmarks, all with various weaknesses, yet all of them tell the same compelling story; plus you have adoption numbers and walled-garden data, and that's the proof in the pudding

...the onus is on people who say "this is plateauing" or "this has some fundamental limitation that we will not get past fairly quickly" to explain why.
> look at where you are today versus, say, Claude Sonnet 3.x: it's an entire world away in like a year
In the area I work in, I find them to be of very little value, both then and now... I see no real difference. They help with marginal tasks: e.g., they catch typos, or they help new programmers explore an existing codebase faster.
So far, I haven't used a single line of code generated by AI, even though I've seen thousands. Some of them served to draw attention to a problem, but none solved one successfully. It was all pretty lame.
I see no reason to believe it's going to get better. Waving hands more forcefully doesn't help; there's no argument behind the promise that "it will get better". No reason to believe it will...
But, more importantly, the AI is applied at a level where the really important things don't happen. It's automating boilerplate work. It doesn't make decisions about the important parts. In the example above, for instance, the AI is not capable of choosing the better strategy: use pyproject.toml, or write code to build Python packages? That's not the kind of decision it's called on to make, and nobody sensible would trust it to make such a decision, because there isn't a clear right or wrong answer; only the future will prove one or the other to be the right call.