Comment by dist-epoch

12 hours ago

You are assuming a linear future while we are in an exponential.

One year ago models could barely write a working function.

GPT-4o is 23 months old.

One year ago, the models were only slightly less competent than today's. There were models writing entire apps 3 years ago. Competent function writing has been basically a given in every model since GPT-3.

Much of the progress in the past year has been in the harnesses, MCPs, and skills. The models themselves are not getting better exponentially; if anything, progress has slowed significantly since the 2023-2024 releases.

  • >One year ago, the models were only slightly less competent than today.

    That has not been my experience. This weekend I pointed Claude Code+Opus 4.6+effort=max at a PRD describing a Docusign-like piece of software. It was the exact same document I gave to Claude Code+Opus 4.5+Ultrathink around 6 months ago.

    The touch-ups I needed after it completed the implementation were around a tenth of what 4.5 required. It is a pretty startling difference.

    • Agree with this. Opus 4.6 thinks of things I didn't even put in the spec but absolutely needed. It thinks through all the edge cases and gotchas. And I love the way modern AI UIs stop in their tracks and make you answer a bunch of questions about all the ambiguities you left in the spec.

      They still do dumb shit from time-to-time, but it's getting rarer.

  • Yeah I've been able to get great Python functions out of everything since the ChatGPT 4 API in early-to-mid 2023.

    It takes far less manual prompting to make it produce consistent output, work well with other languages, etc. But if you watch the "thinking" logs, it looks an awful lot like the "prompt engineering" you'd do by hand back then. And the output for tricky cases still sometimes goes sideways in obviously naive ways. The most telling thing in my experience is all the grepping, looping, and refining: it's not "I loaded all twenty of these files into context and have such a perfect understanding of every line's place in the big picture that I can suggest a perfect-the-first-time, maximally elegant modification." It's targeted and tactical. Getting really good at its tactics for that stuff, though!

    I can get more done now than a year ago because taking me out of the annoying part of that loop is very helpful.

    But there's still a very curious gap: a tool that can quickly and easily recognize certain types of bugs when you ask it directly will also happily spit out those same bugs while writing code. "Making up fake functions" doesn't make it to the user much anymore, but "not going to be robust in production but technically satisfies the prompt" still does, despite the model "knowing better" when you ask it about the code five seconds later.

One year before 1969 we had never been to the moon. In the '70s, credible scientists and physicists predicted that large Martian colonies would exist before the year 2000.

If a metric goes from 0 to 2 it doesn't mean it's on a long-lived exponential trajectory.

> One year ago models could barely write a working function.

This is a false claim.

Claude Code was released over a year ago.

Models have improved a lot recently, but if you think 12 months ago they could barely write a working function you are mistaken.

This comment is getting punished for the incorrect timeline (I would know, I've been harping on about AI getting good at coding for ~2 years now!), but I do think it is directionally correct. Just over 3 years ago, publicly available AI could not write code at all. Today it can write whole modules, project scaffoldings, and even entire apps, not to mention all the other stuff agents can do today. Considering I didn't think I'd see this kind of thing in my lifetime, this is a blink of an eye.

Even if a lot of the improvements we see today are due to things outside the models themselves -- tools, harnesses, agents, skills, availability of compute, better understanding of how to use AI, etc. -- things are changing very quickly overall. It would be a mistake to just focus on one or two things, like models or benchmarks, and ignore everything else that is changing in the ecosystem.

  • I agree it's directionally correct, but only in the ways that don't matter to this discussion. If 2026->2029 AI is as much of an improvement as 2023->2026 AI, is anything we learn about how to leverage it in 2026 going to stay relevant?

Sigmoids look a lot like exponentials early on.

We can’t say for sure yet which trajectory we are on.
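To make the sigmoid point concrete, here is a toy sketch (the midpoint and growth rate are arbitrary, purely for illustration): a logistic curve and a pure exponential matched to its early growth rate are nearly indistinguishable well before the inflection point, then diverge sharply after it.

```python
import math

def logistic(x, midpoint=6.0):
    """Logistic (sigmoid) curve saturating at 1, inflection at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-(x - midpoint)))

def exponential(x, midpoint=6.0):
    """Pure exponential matched to the logistic's early-phase growth rate."""
    return math.exp(x - midpoint)

# Well below the midpoint, the ratio is ~1 (curves overlap);
# past the midpoint, the exponential races ahead while the
# logistic saturates.
for x in [0, 3, 6, 9]:
    s, e = logistic(x), exponential(x)
    print(f"x={x}: sigmoid={s:.4f}  exponential={e:.4f}  ratio={s / e:.3f}")
```

So from measurements taken only on the early part of the curve, the two trajectories look the same, which is exactly why we can't yet tell which one we're on.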

Seems extremely disingenuous to say that one year ago models could barely write a working function. In fact, there were plenty of models capable of writing a working function with the right context fed in, exactly as today.