Comment by ok_dad

10 hours ago

Why are you letting the LLM drive? Don't turn on auto-approve, approve every command the agent runs. Don't let it make design or architecture decisions, you choose how it is built and you TELL that clanker what's what! No joke, if you treat the AI like a tool then you'll get more mileage out of it. You won't get 10x gains, but you will still understand the code.

Personally I've found "carefully review every move it makes" to be an extremely unpleasant and difficult workflow. The effort needed to parse every action is immense, but there's a complete absence of creative engagement - no chance of flow state. Just the worst kind of work which I've been unable to sustain, unfortunately. At this point I mostly still do work by hand.

  • It's unpleasant for me at normal speed settings, but on fast mode it works really well: the AI does changes quickly enough for me to stay focused.

    Of course this requires being fortunate enough that you have one of those AI positive employers where you can spend lots of money on clankers.

    I don't review every move it makes, I rather have a workflow where I first ask it questions about the code, and it looks around and explores various design choices. then i nudge it towards the design choice I think is best, etc. That asking around about the code also loads up the context in the appropriate manner so that the AI knows how to do the change well.

    It's a me in the loop workflow but that prevents a lot of bugs, makes me aware of the design choices, and thanks to fast mode, it is more pleasant and much faster than me manually doing it.

  • This is my biggest problem with the promises of agentic coding (well, there are an awful lot of problems, but this is the biggest one from an immediate practical perspective).

    One the one hand, reviewing and micromaning everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it if you finds something that comes up often enough it's worth writing a skill for. And this only gets you, at best, a slight speedup over writing it yourself, as you have to stay engaged and think about everything that's going on.

    Or you can just let it grind away agentically and only test the final output. This allows you to get those huge gains at first, but it can easily just start accumulating more and more cruft and bad design decisions and hacks on top of hacks. And you increasingly don't know what it's doing or why, you're losing the skill of even being able to because you're not exercising it.

    You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth and so someone can just set a username of an admin in a cookie to log in. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 and if you get a 1 you lose everything?

  • I agree, but I also think that giving the LLM free rein is also extremely unpleasant and difficult. And you still need to review the resulting code.

    • I don't think there's anything difficult or unpleasant about the process of letting the LLM run free, that's the whole point, it's nearly frictionless. Which includes not reviewing the code carefully. You say "need" but you mean "ought".

      1 reply →

  • Reviewing isn't hard when the diff is what you asked for. It's when you asked for a one-line fix and get back 40 changed lines across four files. At that point you're not even reviewing your change anymore, you're auditing theirs.

That's the trap though. The moment you approve every step, you're no longer getting the product that was sold to you. You're doing code review on a stochastic intern. The whole 10x story depends on you eventually looking away.

  • Just don’t buy the tools for 10x improvements, buy them for the 1.1x improvement and the help it gives with the annoying stuff like refactoring arguments to a function that’s used all over, writing tests, etc. They can also help reduce cognitive load in certain ways when you just use them to ask about your large code base.

Because the degree to which the LLM prompts you back to the terminal is too frequent for the human to engage in parallel work.

  • I’m basically saying don’t do parallel work, use it as a tool. Just sit there and watch it do stuff, make sure it’s doing what you want, and stop it if it’s doing too much or not what you want to do.

    Maybe I’m just weird (actually that’s a given) but I don’t mind babysitting the clanker while it works.

I define tools that perform individual tasks, like build the application, run the tests, access project management tools with task context, web search, edit files in the workspace, read only vs write access source control, etc.

The agent only has access to exactly what it needs, be it an implementation agent, analysis agent, or review agent.

Makes it very easy to stay in command without having to sit and approve tons of random things the agent wants to do.

I do not allow bash or any kind of shell. I don't want to have to figure out what some random python script it's made up is supposed to do all the time.

  • This is a cool idea, can you write more about how your tools work or maybe short descriptions of a few of them? I’m interested in more rails for my bots.

    • I just made MCP servers that wrap the tools I need the agents to use, and give no-ask permissions to the specific tools the agents need in the agent definition.

      Both OpenCode and VsCode support this. I think in ClaudeCode you can do it with skills now.

      The other benefit is the MCP tool can mediate e.g. noisy build tool output, and reduce token usage by only showing errors or test failures, nothing else, or simply an ok response with the build run or test count.

      So far, I have not needed to give them access to more than build tools, git, and a project/knowledge system (e.g. Obsidian) for the work I have them doing. Well and file read/write and web search.

      1 reply →

Because its SO much faster not to have to do all that. I think 10x is no joke, and if you're doing MVP, its just not worth the mental effort.

  • POC, sure (although 10x-ing a POC doesn't actually get you 10x velocity). MVP, though? No way. Today's frontier models are nowhere near smart enough to write a non-trivial product (i.e. something that others are meant to use), minimal or otherwise, without careful supervision. Anthropic weren't able to get agents to write even a usable C compiler (not a huge deal to begin with), even with a practically infeasible amount of preparatory work (write a full spec and a reference implementation, train the model on them as well as on relevant textbooks, write thousands of tests). The agents just make too many critical architectural mistakes that pretty much guarantee you won't be able to evolve the product for long, with or without their help. The software they write has an evolution horizon between zero days and about a year, after which the codebase is effectively bricked.

    • There is a million things in between a C compiler and a non-trivial product. They do make a ton of horrible architectural decisions, but I only need to review the output/ask questions to guide that, not review every diff.

      2 replies →

For that kind of flow, I prefer to work without AI.

  • The agent mostly helps me reduce cognitive load and avoid the fiddly bits. I still review and understand all of the code but I don’t have to think about writing all of it. I also still hand write tons of code when I want to be very specific about behavior.

I agree with this too. I decided on constraints for myself around these tools and I give my complete focus & attention to every prompt, often stopping for minutes to figure things through and make decisions myself. Reviewing every line they produce. I'm a senior dev with a lot of experience with pair programming and code review, and I treat its output just as I would those tasks.

It has about doubled my development pace. An absolutely incredible gain in a vacuum, though tiny compared to what people seem to manage without these self-constraints. But in exchange, my understanding of the code is as comprehensive as if I had paired on it, or merged a direct report's branch into a project I was responsible for. A reasonable enough tradeoff, for me.

This is significantly slower than just writing the code yourself.

  • I don’t find it slower overall, personally, but YMMV depending on how you like to tackle problems. Also the problem space and the project details can dictate that these tools aren’t helpful. Luckily the code I write tends to be perfect for a coding agent to clank away for me.

I have never found any utility in that. After all, you can still just review the diffs and ask it for explanation for sections instead.

  • > After all, you can still just review the diffs

    anonu has explicitly said that they've wiped a database twice as a result of agents doing stuff. What sort of diff would help against an agent running commands, without your approval?

    • Agent does not have to run in your user context. It is easy mistake to make in yolo mode but after that it's easy to fix. e.g. this is what I use now so I can release agent from my machine and also constrain its access:

          $ main-app git:(main) kubectl get pods | grep agent | head -n 1 | sed -E 's/[a-z]+-agent(.*)/app-agent\1/'
          app-agent-656c6ff85d-p86t8                          1/1     Running     0             13d
      

      Agent is fully capable of making PR etc. if you provide appropriate tooling. It wipes DB but DB is just separate ephemeral pod. One day perhaps it will find 0-day and break out, but so far it has not done it.

    • Hah I run my agent inside a docker with just the code. Anything clever it tries to do just goes nowhere.

  • > After all, you can still just review the diffs

    The diff: +8000 -4000

    • You can ask it to make the changes in appropriate PRs. SOTA model + harness can do it. I find it useful to separate refactors and implementations, just like with humans, but I admittedly rely heavily on multi-provider review.

It’s terribly slow

  • I get it, but if tomorrow every inference provider doubled costs I still understand my applications code and can continue to work on it myself.

    • I hear this a lot but I don't think decades of experience atrophies irretrievably so quickly as to make it worth it (alone) to abstain from making full use of these tools. I still read and direct enough of the architecture to not be lost in the code it generates. Maybe you haven't tried using agents to reorganize/refactor as much - I have cleaner code than I did before when it was done by hand, because I can afford to tackle debts.

      I also don't find the permissions it prompts for very meaningful. Permission to use a file search tool? Permission to make a web request? It's a clumsy way to slow it down enough for me to catch up.

  • You can push thousands of LOC every day while approving manually. If you went any faster you would not be able to read the code.