
Comment by simonw

5 days ago

If it does the wrong thing you tell it what the right thing is and have it try again.

With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try.

There are several rubs with that operating protocol, beyond the "you're holding it wrong" claim.

1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Human developers, by contrast, can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. Having the prompter always document thoroughly enough to bridge this provenance gap runs straight into rub #1.

3) Gradually building prompt dependency, where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity oneself.

4) My development costs increasingly being determined by the AI labs and the hardware vendors they partner with, particularly when the former will need to raise prices dramatically over the coming years just to break even on 2025's economics.

  • I've said this multiple times:

    This is why you should use this AI bubble (it IS a bubble) to take the VC-funded AI models at dirt-cheap prices and CREATE tools for yourself.

    Need a very specific linter? AI can do it (a minimal sketch of what I mean is at the end of this comment). Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine? AI.

    None of that will go away or suddenly stop working when the bubble bursts.

    Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) that it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores.

    Even better, start getting a feel for local models. Current-gen home hardware is getting good enough, and the local models smart enough, that with the correct tooling you can use them for surprisingly many things.
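
    As a concrete example of the sort of throwaway tool I mean, here's a minimal sketch of a "very specific linter" in Python; the rule (flagging bare print() calls) is just a placeholder for whatever house rule you actually care about:

      import ast
      import sys

      # Toy rule: flag bare print() calls. Swap in whatever
      # project-specific check an off-the-shelf linter won't do.
      def lint(path):
          tree = ast.parse(open(path).read(), filename=path)
          return [
              f"{path}:{node.lineno}: bare print() call"
              for node in ast.walk(tree)
              if isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == "print"
          ]

      if __name__ == "__main__":
          # Usage: python lint.py file1.py file2.py ...
          issues = [i for f in sys.argv[1:] for i in lint(f)]
          print("\n".join(issues))
          sys.exit(1 if issues else 0)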

    • > Even better, start getting a feel for local models. Current-gen home hardware is getting good enough, and the local models smart enough, that with the correct tooling you can use them for surprisingly many things.

      Are there any local models that are at least somewhat comparable to the latest-and-greatest (e.g. Opus 4.5, Gemini 3), especially in terms of coding?

    • A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have/be able to afford access to LLMs to do it for you.


  • > 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

    I can run multiple agents at once, across multiple code bases (or the same codebase on multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe on the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water.

    > 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1.

    Tell the LLM to document in comments why it did things. Human developers often leave, and then nobody with knowledge of their codebase or their "whys" is even around to give details. Devs are notoriously terrible about documentation.

    > 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves.

    You can't develop at the same velocity, so drop that assumption now. There are all kinds of lower abstractions you build on top of that you probably can't explain currently, either.

    > 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.

    You aren't keeping up with the actual economics. This shit is technically profitable; the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner-takes-all, so they're all trying to win.

> With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try

It's great that this is your experience, but it's not a lot of people's. There are projects where the model is just not going to know what to do.

I'm working in a web framework that is a Frankenstein-ing of Laravel and October CMS. It's so easy for the agent to get confused because, even when I tell it this is a different framework, it sees things that look like Laravel or October CMS and suggests solutions that only apply to those frameworks. So there are constant made-up methods and loops it gets stuck in.

The documentation is terrible; you just have to read the code. Which, despite what people say, Cursor is terrible at, because embeddings are not a real way to read a codebase.

  • I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast.

    One trick I use that might work for you as well:

      Clone GitHub.com/simonw/datasette to /tmp
      then look at /tmp/datasette/docs for
      documentation and search the code
      if you need to
    

    Try that with your own custom framework and it might unblock things.

    If your framework is missing documentation, tell Claude Code to write itself some documentation based on what it learns from reading the code!
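
    A hypothetical version of that prompt, with the paths as placeholders for wherever your framework actually lives:

      Read the framework code under lib/ and write
      docs/architecture.md explaining the request
      lifecycle, the hooks, and any conventions
      you can infer from the code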

    • > I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast

      Potentially because there's no baggage from similar frameworks. I'm sure it would have an easier time with mine if it weren't spun off from other frameworks.

      > If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code!

      If Claude cannot read the code well enough to begin with, and needs supplemental documentation, I certainly don't want it generating the docs from the code. That's just compounding hallucinations on top of each other.


In a circuitous way, you can rather successfully have one agent write a specification and another one execute the code changes. Claude Code has a planning mode that lets you work with the model to create a robust specification that can then be executed, asking the sort of leading questions where it already seems to know it could otherwise make an incorrect assumption. I say 'agent' but I'm really just talking about separate model contexts, nothing fancy.
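
To make "nothing fancy" concrete, here's a minimal sketch of the two-context pattern using the anthropic Python SDK; the model name, prompts, and example task are all illustrative placeholders, not a recommendation:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask(system, prompt):
        # Each call is its own context: no history is shared between them.
        response = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative model name
            max_tokens=4096,
            system=system,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    # Context 1: the "planner" turns a vague request into a specification.
    spec = ask(
        "You are a planner. Produce a precise implementation spec and "
        "surface every assumption you are making as an open question.",
        "Add CSV export to the /reports endpoint.",  # illustrative task
    )

    # (In practice you review and edit the spec before handing it over.)

    # Context 2: the "executor" sees only the finished spec, not the
    # conversation that produced it.
    implementation = ask(
        "You are an implementer. Follow the spec exactly; do not improvise.",
        spec,
    )
    print(implementation)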

  • Cursor's planning functionality is very similar, and I have found that I can even use "cheap" models like their Composer-1 and get great results in the planning phase, then turn on Sonnet or Opus to actually execute the plan. 90% of the stuff I need to argue about happens during the planning phase, so I save a ton of tokens and rework just by making a really good spec.

    It turns out that Waterfall was always the correct method, it's just really slow ;)

    • Did you know that software specifications used to be almost entirely flow charts? There is something to be said for that and waterfall.

Even better, have it write tests that describe the right thing, then run its code against those, taking yourself out of that loop.
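
In other words, the description of the right thing becomes an executable test file. A hypothetical pytest sketch, where slugify is just a stand-in for whatever you actually asked the model to build:

    # test_slugify.py, written and reviewed BEFORE the implementation.
    # slugify is hypothetical: a stand-in for whatever you asked for.
    from slugify import slugify

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("What's up?") == "whats-up"

    def test_collapses_repeated_whitespace():
        assert slugify("a   b") == "a-b"

    # The agent then loops on `pytest` until everything passes,
    # revising slugify.py itself, with you out of the correction loop.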

And if you've told it too many times to fix something, tell it someone has a gun to your head; for some reason it almost always gets it right the very next time.

  • If you're a developer at the dawn of the AI revolution, there is absolutely a gun to your head.

    • Yeah, if anyone can truly afford the AI empire. Remember, all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying its true cost. We would need an insane technological breakthrough, something like unlimited memory and power, before I start to worry, and at that point I'll just look for a new career.