
Comment by bluGill

2 days ago

I figure it takes me a week to turn the output of AI into acceptable code. Sure, there's a lot of code in 30 seconds, but it shouldn't pass code review (even the AI's own review).

For now, Claude is worse than we are at programming, but it's improving much faster than I am. Opus 4.6 is incredible compared to previous models.

How long before those lines cross? Intuitively, it feels like we have about 2-3 years before Claude is better at writing code than most - or all - humans.

  • It's certainly already better than most humans, even better than most people who occasionally code. The bar is already quite high, I'd say. You have to be decent in your niche to outcompete frontier LLM agents in a meaningful way.

    • I'm only allowed 4.5 at work, where I do this (likely to change soon, but bureaucracy...). Still, the resulting code is not at the level I expect.

      I told my boss (not entirely seriously) that we should ban anyone with less than five years of experience from using the AI, so they learn to write and recognize good code.

    • The key difference here is that humans can progress. They can learn reasoning skills and develop novel methods.

      The LLM is a stochastic parrot. It will never be anything else unless we develop entirely new theories.


  • I keep seeing this: the "for now" comments, and how much better it's getting with each model.

    I don't see it in practice though.

    The fundamental problem hasn't changed: these things are not reasoning. They aren't problem solving.

    They're pattern matching. That gives the illusion of usefulness for coding when your problem is very similar to ones the model has already seen, but it falls apart as soon as you need any sort of depth or novelty.

    I haven't seen any research or theories on how to address this fundamental limitation.

    Pattern matching turns out to be very useful for many classes of problems - translating speech to a structured JSON format, OCR, and so on - but it isn't particularly useful for reasoning problems like math or coding (non-trivial problems, of course).

    I'm pretty excited about the applications of AI overall and its potential to reduce human drudgery across many fields; I just think generating code in response to prompts is a poor choice of an LLM application.

    • > I don't see it in practice though.

      Have you actually tried the latest agentic coding models?

      Yesterday I asked Claude to implement a working web-based email client from scratch in Rust that can talk to a JMAP mail server. It did. It took about 20 minutes. The first version had a few bugs - for example, it was polling for mail instead of streaming new messages in as they arrive (there's a rough sketch of the difference at the end of this comment). But after prompting it to fix some obvious bugs, I now have a working email client.

      It's missing lots of important features - it doesn't render HTML emails correctly, for example - and the UI looks incredibly basic. But it wrote the whole thing, 2.5k lines of Rust, from scratch, and it works.

      This wasn't possible at all a couple of years ago. Back then I couldn't get ChatGPT to port a single source file from Rust to TypeScript without it running out of context space and introducing subtle bugs in my code. And it was rubbish at Rust - it would introduce borrow checker errors and then get stuck, trying and failing to get the code to compile. Now Claude can write a whole web-based email client in Rust from scratch, no worries. I did need to manually point out some bugs in the program - Claude didn't test its email client on its own. There's room for improvement, for sure. But the progress is shocking.

      I don't know how anyone who's actually pushed these models can claim they haven't improved much. They're light-years ahead of where they were a few years ago. Have you actually tried them?
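
      For anyone curious about the polling-vs-streaming bug mentioned above, here's roughly what the two approaches look like. This is only a sketch, not the code Claude produced: the URLs, account id, and token are made-up placeholders, and it assumes reqwest (with the json and stream features), tokio, serde_json, and futures-util.

          use std::time::Duration;

          use futures_util::StreamExt;

          // Placeholder endpoints - a real client would read these from the
          // JMAP session resource at /.well-known/jmap.
          const API_URL: &str = "https://mail.example.com/jmap/api";
          const EVENT_SOURCE_URL: &str = "https://mail.example.com/jmap/eventsource";
          const TOKEN: &str = "app-password";

          // Polling (what the first version did): ask for the mailbox state on a
          // timer and refetch whenever it changes.
          async fn poll_for_mail(client: &reqwest::Client) -> reqwest::Result<()> {
              let mut last_state = String::new();
              loop {
                  let body = serde_json::json!({
                      "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
                      "methodCalls": [["Mailbox/get", { "accountId": "u1" }, "0"]]
                  });
                  let resp: serde_json::Value = client
                      .post(API_URL)
                      .bearer_auth(TOKEN)
                      .json(&body)
                      .send()
                      .await?
                      .json()
                      .await?;
                  let state = resp["methodResponses"][0][1]["state"]
                      .as_str()
                      .unwrap_or_default()
                      .to_string();
                  if state != last_state {
                      last_state = state;
                      // ...fetch the changed emails with Email/changes + Email/get...
                  }
                  tokio::time::sleep(Duration::from_secs(30)).await;
              }
          }

          // Streaming (the fix): hold the JMAP EventSource connection open and
          // react to each StateChange the server pushes, instead of asking on a timer.
          async fn stream_mail(client: &reqwest::Client) -> reqwest::Result<()> {
              let resp = client
                  .get(EVENT_SOURCE_URL)
                  .bearer_auth(TOKEN)
                  .header("Accept", "text/event-stream")
                  .send()
                  .await?;
              let mut chunks = resp.bytes_stream();
              while let Some(chunk) = chunks.next().await {
                  // Each "data:" line in the stream carries a JMAP StateChange object;
                  // parse it and refetch only the objects whose state changed.
                  println!("push event: {}", String::from_utf8_lossy(&chunk?));
              }
              Ok(())
          }

          #[tokio::main]
          async fn main() -> reqwest::Result<()> {
              let client = reqwest::Client::new();
              // Unused in this sketch; kept to show the approach being replaced.
              let _ = poll_for_mail;
              stream_mail(&client).await
          }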
