
Comment by benrutter

17 hours ago

I use AI in my workflow mostly for simple boilerplate, or to troubleshoot issues/docs.

I've dipped into agentic work now and again, but never been very impressed with the output (well, that there is any functioning output is insanely impressive, but it isn't code I want to be on the hook for).

I hear a lot of people saying the same, but similarly a bunch of people I respect saying they barely write code anymore. It feels a little tricky to square these up sometimes.

Anyway, really looking forward to trying some of these patterns as the book develops to see if that makes a difference. Understanding how other people really use these tools is a big gap for me.

One thing I rarely see mentioned is that often creating code by hand is simply faster (at least for me) than using AI. Creating a plan for AI, waiting for execution, verifying, prompting again etc. can take more time than just doing it on my own with a plan in my head (and maybe some notes). Creating something from scratch or doing advanced refactoring is almost always faster with AI, but most of my daily tasks are bugs or features that are 10% coding and 90% knowing how to do it.

  • > 10% coding and 90% knowing how to do it

    I think this is the main point where many people’s work differs. Most of my work I know roughly what needs changing and how things are structured but I jump between codebases often enough that I can’t always remember the exact classes/functions where changes are needed. But I can vaguely gesture at those specific changes that need to be made and have the AI find the places that need changing and then I can review the result.

    I rarely get the luxury of working in a single codebase for a long enough period of time to get so familiar with it that I can jump to particular functions without much thought. That means AI is usually a better starting point than me fumbling around trying to find what I think exists but I don’t know where it is.

  • I've heard people say that these coding agents are just tools and don't replace the thinking. That's fine but the problem for me is that the act of coding is when I do my thinking!

    I'm thinking about how to solve the problem and how to express it in the programming language such that it is easy to maintain. Getting someone/something else to do that doesn't help me.

    But different strokes for different folks, I suppose.

  • Yes, it's often faster if you sit around waiting. What I will do instead is prompt the AI to create various plans, do other stuff while they do, review and approve the plans, do other stuff while multiple plans are being implemented, and then review and revise the output.

    And I have the AI deal with "knowing how to do it" as well. Often it's slower to have it do enough research to know how to do it, but my time is more expensive than Claude's time, and so as long as I'm not sitting around waiting it's a net win.

    • I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:

      Gemini running a benchmark: everything ran smoothly for an hour, but on verification it had hallucinated the model used for judging, invalidating the whole run.

      Another task used Opus and I manually specified the model to use. It still used the wrong model.

      This type of hallucination has happened to me at least 4-5 times in the past fortnight using opus 4.6 and gemini-3.1-pro. GLM-5 does not seem to hallucinate so much.

      So if you are not actively monitoring your agent and making the corrections, you need something else that is.

      5 replies →

    • This sounds like one recipe for burnout, much like Adderall was making everyone code faster until their brain couldn’t keep up with its own backlog.

    • > And I have the AI deal with "knowing how to do it" as well. Often it's slower to have it do enough research to know how to do it

      This is exactly the sort of future I'm afraid of, where the people who are ostensibly hired to know how stuff works outsource that understanding to their LLMs. If you don't know how the system works while building it, what are you going to do when it breaks? Continue to throw your LLM at it? At what point do you just outsource your entire brain?

  • For me it _can_ be faster to code than to instruct, but it takes me significantly less effort to write the prompt than the actual code. So a few hours of concentrated coding leave me completely drained of energy, while after a few hours with the agents I still have a lot of mental energy. That's the huge difference for me, and I don't want to go back.

    • That's interesting. While I do get mentally tired after a session of focused coding, I feel like I have accomplished something. Using AI for coding feels similar to spending hours doomscrolling reels: less engaging, but I'm drained as hell at the end.

      2 replies →

  • My way of phrasing this: I need to activate my personal transformers on my inner embedding space to really figure out what it is that I truly want to write.

  • I delegate to agents what I hate doing. E.g. when creating a SaaS web app, the last thing I want to waste my time on is the landing page with about/pricing/login and the Stripe integration frontend/backend. I'll just tell Claude Code (with Qwen3-Coder-Next-Q8 running locally on an RTX Pro 6000) to make all this basic stuff for me so that I can focus on the actual core of the app. It then churns for half an hour and spews out a first version, I spend another half an hour fixing bugs by pointing errors out to Claude Code, and in an hour it's all done. I can also tell it to avoid all the node.js garbage and do it all in plain HTML/JS/CSS.

  • The rebuttal to this would be that you can do many such tasks in parallel.

    I’m not sure it’s really true in practice yet, but that would certainly be the claim.

    • But can you mentally "keep hold" (for lack of a better term) of those tasks that are getting executed in parallel? Honestly asking.

      Because, after they're done executing, I guess you still have to "check" their output, integrate their results into the bigger project they're (supposedly) part of, etc., and for me the context-switching required to do all that is mentally taxing. But maybe this only happens because my brain is not young enough; that's why I'm asking.

      3 replies →

When was the last time you tried?

I think trying agents to do larger tasks was always very hit or miss, up to about the end of last year.

In the past couple of months I have found them to have gotten a lot better (and I'm not the only one).

My experience with what coding assistants are good for shifted from:

smart autocomplete -> targeted changes/additions -> full engineering

  • I’m not OP but every time I post a comment with this sentiment I get told “the latest models are what you need”. If every 3 months you are saying “it’s ready as long as you use the latest model”, then it wasn’t ready 3 months ago and it’s not likely to be ready now.

    To answer your question, I’ve tried both Claude code and Antigravity in the last 2 weeks and I’m still finding them struggling. AG with Gemini regularly gets stuck on simple issues and loops until I run out of requests, and Claude still just regularly goes on wild tangents not actually solving the problem.

    • I don’t think that’s true. Claude Opus 4.5/4.6 in Cursor have marked the big shift for me. Before that, agentic development mostly made me want to just do it myself, because it was getting stuck or going on tangents.

      I think it can (and is) shifting very rapidly. Everyone is different, and I’m sure models are better at different types of work (or styles of working), but it doesn’t take much to make it too frustrating to use. Which also means it doesn’t take much to make it super useful.

      2 replies →

    • It depends on what you're handling. Frontend (not CSS), Swagger, and mundane CRUD are where it shines. Something more complex that needs a bit harder calculation usually makes the agents struggle.

      It's especially good for navigating code you're unfamiliar with. If you know the code well, you'll find it's usually faster to debug and code by yourself.

      (Opus 4.6 with the Claude Code VS Code extension.)

    • Have you tried it with something like OpenSpec? Strangely, taking the time to lay out the steps in a large task helps immensely. It's the difference between the behavior you describe and just letting it run productively for segments of ten or fifteen minutes.

      1 reply →

    • I thought this too, and then I discovered plan mode. If you just prompt agent mode it will be terrible, but coming up with a plan first has really made a big difference, and I rarely write code at all now.

      1 reply →

    • Agree, it's strange. I will just assume that the people who say this are building React apps. I still get so much "certainly, I should not do this in a completely insane way, let me fix that" … -400+2. It's not always, and it is better than it was, but that's it.

      1 reply →

    • At this point though, after Claude C Compiler, you've got to give us more details to better understand the dichotomy. What do you consider simple issues?

      2 replies →

  • > When was the last time you tried?

    Pretty recently (a couple weeks ago). I give agentic workflows a go every couple of weeks or so.

    I should say, I don't find them abysmal, but I tend to work in codebases where I understand the code and the patterns really well. The use cases I've tried so far do sort of work, just not (yet, at least) faster than I'm able to actually write the code myself.

  • > My experience with what coding assistants are good for shifted from:

    > smart autocomplete -> targeted changes/additions -> full engineering

    Define "full engineering". Because if you say "full engineering" I would expect the agent to get some expected product output details as input and produce all by itself the right implementation for the context (i.e. company) it lives in.

    • I agree that "full engineering" was a bit broad. I should probably have said something like "agent-only coding"?

      I.e. the point where the agent writes all the code and you just verify.

      1 reply →

> I've dipped into agentic work now and again, but never been very impressed with the output (well, that there is any functioning output is insanely impressive, but it isn't code I want to be on the hook for).

> I hear a lot of people saying the same, but similarly a bunch of people I respect saying they barely write code anymore. It feels a little tricky to square these up sometimes.

It squares up just fine.

You ever read a blog post or comment and think "Yeah, this is definitely AI generated"? If you can recognise it, would you accept a blog post, reviewed by you, for your own blog/site?

I won't; I'll think "eww" and rewrite.

The developers with good AI experiences don't get the same "eww" feeling when reading AI-generated code. The developers with poor AI experiences get that "eww" feeling all the time when reviewing AI code and decide not to accept the code.

Well, that's my theory anyway.

  • I also will rewrite both text and code created by Gen AI. I've found the best workflow for me is not to refine what I've written, but instead to use it to help me get over humps and/or crank through some of the drudgery. Then I go back and edit, fixing any issues I spot and reshaping it to be in my own voice.

    I do this with code too.

> It feels a little tricky to square these up sometimes.

In my experience, this heavily depends on the task, and there's a massive chasm between tasks where it's a good and bad fit. I can definitely imagine people working only on one side of this chasm and being perplexed by the other side.

My experience is that the first iteration output from a single agent is not what I want to be on the hook for. What squares it for me with "not writing code anymore" is the iterative process to improve outputs:

1) Having review loops between agents (spawning separate "reviewer" agents) with clear tests/eval criteria improved results quite a bit for me.

2) Reviewing manually and giving instructions for improvements is necessary to have code I can own.
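The review-loop idea above can be sketched as a small driver. This is only an illustration: `propose` and `review` here are made-up stand-ins for real model calls, and the two example criteria are invented, but the control flow (generate a candidate, review it against explicit criteria, feed the objections back in) is the pattern being described.

```python
# Sketch of a generate/review loop between two "agents".
# propose() and review() are stubs standing in for model calls;
# in practice each would prompt a worker or reviewer model.

def propose(task: str, feedback: list[str]) -> str:
    """Worker stub: returns a candidate, pretending to fix flagged issues."""
    fixed = set(feedback)
    issues = {"missing tests", "no error handling"} - fixed
    return f"patch for {task!r}, unresolved: {sorted(issues)}"

def review(candidate: str) -> list[str]:
    """Reviewer stub: returns objections; empty means criteria are met."""
    return [i for i in ("missing tests", "no error handling") if i in candidate]

def review_loop(task: str, max_rounds: int = 3) -> tuple[str, int]:
    """Alternate propose/review, feeding objections back, until clean."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(task, feedback)
        objections = review(candidate)
        if not objections:
            return candidate, round_no
        feedback.extend(objections)  # next round must address these
    return candidate, max_rounds  # gave up: hand back for manual review

patch, rounds = review_loop("add retry logic")
```

The point of the structure is the second comment's manual step: when the loop hits `max_rounds` without a clean review, the output goes to a human rather than being accepted.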

  • Is that… actually faster than just doing it yourself, tho? Like, “I could write the right thing, or I could have this robot write the wrong thing and then nag it til it corrects itself” seems to suggest a fairly obvious choice.

    I’ve yet to see these things do well on anything but trivial boilerplate.

    • In my experience, sometimes. Not that often, depends on the task.

      The benefit is I can keep some things ticking over while I’m in meetings, to be honest.

    • Think of it like installing Linux. The first time it's absolutely not worth it from a time perspective. But after you've installed it once, you can reuse that installation, and eventually it becomes second nature. That time investment pays dividends. Just like Linux, though, no one's going to force you to install it, and you'll probably go on to have a fine career without ever having touched the stuff.

I was in the same boat as you until I saw DHH post about how he’s changed his use of agents. In his talk with Lex Fridman his approach was similar to mine, and it really felt like a kernel of sanity amongst the hype. So when he said he’d changed his approach, I had another look.

I’m using agents (Claude Code) every day now. I still write code every day too. (So does Dax Raad from OpenCode, to throw a bit more weight behind this stance.) I’m not convinced the models can own a production codebase, and therefore engineers need to maintain their skills sufficiently to be responsible. I find agents helpful for a lot of stuff, usually heavily patterned code with a lot of prior art. I find CC consistently sucks at writing polars code.

I honestly don’t enjoy using agents at all, and I don’t think anyone can honestly claim they know how this is going to shake out. But I feel that by using the tools myself I have a much stronger sense of reality amongst the hype.

  • I strongly agree with that last statement. I hate using agents because their code smells awful even if it works. But I have to use them now, because otherwise I’m going to wake up one day and be 100% obsolete and never even notice how it happened.

I still write code and do not push everything off to the agent. I try my best to write small tasks; ~20% of the time I have to get in there myself. If someone says they're absolutely not writing a line of code, they must have amazing guardrails.

>It feels a little tricky to square these up sometimes.

I don’t think you have to square them, because those sentiments are coming from different people. They are also coming from people at different points along the adoption curve. If you are struggling, and you see other people struggling at the beginning of the adoption curve, it can be quite difficult to understand someone who is further along and does not appear to be struggling.

I think a lot of folks who have struggled with these tools do so because both critics and boosters create unrealistic expectations.

What I recommend is that you keep trying. This is a new skill set, and a different one. Which of the skills that mattered in the past will remain necessary is not yet known.