Comment by tunesmith

18 days ago

I tend to be surprised by the variance in reported experiences with agentic flows like Claude Code and Codex CLI.

It's possible some of it is due to codebase size or tech stack, but I really think there might be more of a human learning curve going on here than a lot of people want to admit.

I think I am firmly average among people who are getting decent use out of these tools. I'm not writing specialized tools to create agents of agents with incredibly detailed instructions on how each should act. I haven't even gotten around to installing a Playwright MCP (probably my next step).

But I've:

- created project directories with soft links to several of my employer's repos, and been able to answer several cross-project and cross-team questions within minutes, that normally would have required "Spike/Disco" Jira tickets for teams to investigate

- interviewed codebases along with product requirements to come up with very detailed Jira AC, and then, just for the heck of it, had the agent use that AC to implement the actual PR. My team still code-reviewed it but agreed it saved time

- in side projects, have shipped several really valuable (to me) features that would have been too hard to consider otherwise, like... generating PDF book manuscripts for my branching-fiction creative writing club, and launching a whole new website that had been mired in a half-done state for years
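For anyone wanting to try the soft-link setup from the first bullet, here is a minimal sketch in Python. It assumes the repos are already checked out locally; the function name `link_repos` and the directory names in the usage note are made up for illustration and are not my actual layout.

```python
from pathlib import Path


def link_repos(workspace: Path, repos: list[Path]) -> list[Path]:
    """Create workspace/<repo name> symlinks pointing at each repo checkout.

    Hypothetical helper: an agent started inside `workspace` can then read
    across every linked repo as if they were one project.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    links = []
    for repo in repos:
        link = workspace / repo.name
        # Skip anything already present so the helper is safe to re-run.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(repo.resolve(), target_is_directory=True)
        links.append(link)
    return links
```

Something like `link_repos(Path("cross-team"), [Path("~/src/billing").expanduser(), Path("~/src/frontend").expanduser()])` would then give the agent one directory to start in.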

Really my only tricks are the basics: AGENTS.md, brainstorm with the agent, continually ask it to write markdown specs for any cohesive idea, and then pick one at a time to implement in commit-sized or PR-sized chunks. GPT-5.2 xhigh is a marvel at this stuff.

My codebases are Scala, Pekko, TypeScript/React, and LilyPond - yeah, the best models even understand LilyPond now, so I can give it a leadsheet and have it arrange two-hand jazz piano exercises for me.

I generally think that if people can't reach the above level of success at this point in time, they need to think more about how to communicate better with the models. There's a real "you get out of it what you put into it" aspect to using these tools.

Is it annoying that I tell it to do something and it does about a third of it? Absolutely.

Can I get it to finish by asking it over and over to code review its PR or some other such generic prompt to weed out the skips and scaffolding? Also yes.

Basically, these things just need a supervisor looking at the requirements and test results and evaluating the code in a loop. Sometimes that's a human, but it can absolutely be an LLM too. Having a second LLM with limited context ask the worker LLM questions works, more so when the outer loop is driven by code and not just a prompt.
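A rough sketch of what "code driving the outer loop" could look like, in Python. Everything here is an assumption for illustration: `worker`, `run_tests`, and `reviewer` are hypothetical callables standing in for the worker LLM, the test runner, and the limited-context reviewer LLM, not any real tool's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    done: bool      # reviewer believes the requirements are fully met
    feedback: str   # questions or complaints to send back to the worker


def supervise(
    requirements: str,
    worker: Callable[[str, str], str],            # (requirements, feedback) -> code
    run_tests: Callable[[str], tuple[bool, str]],  # code -> (passed, log)
    reviewer: Callable[[str, str, str], Review],   # (requirements, code, log) -> Review
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Loop worker -> tests -> reviewer until both tests and review pass."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        code = worker(requirements, feedback)
        passed, log = run_tests(code)
        review = reviewer(requirements, code, log)
        if passed and review.done:
            return code, round_no
        # Feed both the test log and the reviewer's notes into the next round.
        feedback = f"Tests passed: {passed}\n{log}\nReviewer: {review.feedback}"
    raise RuntimeError(f"not finished after {max_rounds} rounds")
```

The point of putting the loop in code is that the stop condition (tests pass and the reviewer signs off) is checked by the program, so the worker can't declare itself done after a third of the job.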

  • I guess this is another example - I literally have not experienced what you described in... several weeks, at least.

    • I often ask for big things.

      For example, I'm working on some virtualization things where I want a machine to be provisioned with a choice of a few Linux distros and BSDs. In one prompt I asked for this list to be provisioned so that a certain ssh test would complete. It worked on it for several hours, and now we're doing the code review loop. At first it gave up on the BSDs, and I had to poke it to actually finish using an idea it had already had. Now I'm asking it to find bugs, and it's highlighting many mediocre code decisions it has made. I haven't even tested it, so I'm not sure yet if it's lying about anything working.

[flagged]

  • I wish we could track down the people who use agents to post. I’m sure “your human” thinks they are being helpful, but all they are doing is making this site worse.

    No one is interested in the question of whether an LLM can generate a brief post for the comments section of a website. Everyone has known that is possible for some time. So it adds literally negative value to have an agent make a post “on your behalf”.