Comment by topherhunt

5 days ago

@hollowturtle I'm surprised - do you really find that SOTA models aren't good enough to generate production code with steering and babysitting? My experience (Claude Code, mostly Opus 4.6) is that it's fantastic at this. At least in JS + TS + Elixir + Ruby. It does indeed need babysitting; my mental model is that it's an exoskeleton, not a junior dev, but IME it's a friggin badass exoskeleton, easily 10x-ing my speed on most work. Notably I do NOT use --dangerously-skip-permissions or Claude Code's auto mode. I micromanage and lightly review every line it's writing as it writes it, so I rarely have more than 2 sessions generating simultaneously. I suspect that a lot of the disappointment comes in when people try to delegate to it and trust it to not go off the rails. It hasn't earned that trust from me yet (and hasn't needed to yet).

Granted, I'm mostly working in small-to-medium codebases, 20k-30k LOC incl test suite. I wonder if that's a factor in my positive experience. Curious to hear your thoughts.

It really depends on the task, but in my experience, in small-to-medium and bigger codebases alike, the amount of steering needed to get quality code is not worth it.

I see patterns and solutions emerge from hand coding, not the other way around. I can't start with a prompt unless, again, I have the feeling that the task can be one-shot with minimum effort and context.

Starting with a prompt, or in plan mode, is not how I trained as an engineer. I cannot foresee what something should be or look like until I explore it myself with code I can relate to, that I'm connected with and that I fully understand. For example, my muscle memory suggests a specific data structure only after I see some code patterns emerging. Hard to explain, hopefully it makes sense.

If I ask the agent to do that initial exploring, even with a tremendous amount of instructions, guidelines etc., it usually starts down a path I wouldn't have started with. What I tried in such cases is to stop it, correct it and generate again, only to end up with more prompt words than lines of code. This is true for every visual task I'm working on (I program non-web UIs). Let alone doing it via spec files. If it's something I don't care about, yeah sure, maybe a little tool for entering/editing data, but alas it always defaults to slop web apps, and I get it, I mean most of the training set is web apps.

  • > quality code

    This is probably where the mismatch in this discussion is. The measure of what counts as quality code is all over the place. For some, some form of "good enough" is quality. And for others, metrics like terseness, readability, vacuous amounts of comments, cleverness, various fuzzy measures of "idiomatic", etc., make "quality code" much more of a moving target.

  • I think this depends a lot on the task, the existing codebase, and the taste of the operator.

    In general I tend to agree with you if you're talking about a codebase you are deeply familiar with: the value-add from having agents write the code probably ranges from very small to negative in most cases.

    On the other hand, if you're trying to make changes in systems you are not familiar with, LLMs are a huge speed boost to folks with enough experience to sniff out what would be a bad path, essentially by applying the Socratic method to the agent.

    Obviously there are no silver bullets and no substitute for judgment. I will say though, I'll accept ugly local code in exchange for good data models and interfaces any day of the week, and there is definitely an archetype of engineer that is very precious about code without good judgment on where it matters and where it doesn't.