The proof is in the pudding. Having a thousand automated AI robots to maintain your code may be no more useful than a thousand chimps and typewriters to help you write Shakespeare.
My experience "vibe coding" doesn't give me much hope either. ChatGPT and Claude will happily suggest 500-line disaster commits that break more things than they fix, no matter how smart they get. We should all know by now that we can't trust ChatGPT to review code, even if it's got a penchant for the occasional college try.
> in Factorio terms - we need quality modules.
In Mario terms, we need a Super Mushroom. I think this is a weak cop-out for the many failings of AI, and the more important question of how we actually make AI good at all this. There are currently dozens of AI inference products you can buy for development, and none of them are your silver bullet. You can optimize agents all the way down your conveyor belts, but none of it will replace the discretionary management of a human.
We ought to be careful advocating for these things with certainty. I worry that the many promises of AI will become the 21st century "flying car" in the same way that early 1950s experiences with commercial flight created unrealistic expectations of scale and problem-solving.
> ChatGPT and Claude
Out of interest, have you built an agent? Experienced the power of 500 lines of code run in a loop with uncapped tool calls? It's a different experience, and a different outcome, from copy-and-pasting out of a chat interface.
https://ampcode.com/how-to-build-an-agent
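For anyone who hasn't seen it spelled out, the core really is about that small: a model call, a tool dispatch, and a loop. A minimal sketch in Python, with `call_model` standing in for whichever chat-completion API you use and the tool set purely illustrative (none of these names are from a particular vendor's SDK):

```python
import json
import subprocess

# Tools the model is allowed to invoke. Each takes a dict of
# arguments and returns a string result to feed back to the model.
TOOLS = {
    "read_file": lambda args: open(args["path"]).read(),
    "run_tests": lambda args: subprocess.run(
        ["pytest", "-q"], capture_output=True, text=True
    ).stdout,
}

def call_model(messages):
    """Placeholder for a chat-completion call to your model of choice.

    Assumed to return either
      {"type": "tool_call", "name": "...", "arguments": {...}}
    or
      {"type": "final", "content": "..."}
    """
    raise NotImplementedError("wire up your LLM API here")

def run_agent(task: str, max_steps: int | None = None) -> str:
    messages = [{"role": "user", "content": task}]
    step = 0
    # The whole trick: keep feeding tool results back to the model
    # until it declares itself done (or you hit a step cap, if you
    # bother to set one -- "uncapped" is max_steps=None).
    while max_steps is None or step < max_steps:
        step += 1
        reply = call_model(messages)
        if reply["type"] == "final":
            return reply["content"]
        result = TOOLS[reply["name"]](reply["arguments"])
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": result})
    return "step limit reached"
```

Everything interesting (and everything dangerous) lives in which tools you hand it and whether you cap the loop.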
> You can optimize agents all the way down your conveyor belts, but none of it will replace the discretionary management of a human.
Concur! My personal goal is to reduce the amount of times I need to get down into the weeds so I can focus more on spending time with my kids.
> Having a thousand automated AI robots to maintain your code may be no more useful than a thousand chimps and typewriters to help you write Shakespeare.
Perhaps; time will tell. There are classes of activities that I do every day that could, right now, be automated through agents running with unlimited tool calls. Think about topics like Renovatebot. Like, why don't we have Renovatebot for that class of KTLO (keep-the-lights-on) work?
> Out of interest, have you built an agent?
I wrote one with Google's BERT in 2020 because I was hair-on-fire ecstatic over the idea. I cargo-culted an inference library and hooked it into a Slack bot to post the changelogs. You can guess how that turned out, but yes, at one point I shared the dream. Nothing I've seen since, not even the pace of Claude and ChatGPT releases, has motivated me to try again.
My worry is that you're getting too hyped up when there's not really any serious evidence these issues can be solved. It would be cool if they could, but again, refer to the flying car: a great dream, but avgas isn't getting any cheaper, and neither is pilots' insurance.
> Like, why don't we have Renovatebot for that class of KTLO?
Liability? If you don't keep the lights on, the business is critically impaired. An AI agent that doesn't use the right address when sending the power bill is a cute error in testing and a catastrophic error in real life. How do we, as engineers, realistically stop an AI from doing that? How can we introduce heuristic variability without opening avenues for catastrophic, unfixable failure? You might be throwing developers under the bus by advocating for them too strongly here. "Pushbutton idempotency" and "a robot that cleans my room" are a square peg trying to fit in a round hole.
> The proof is[sic] in[sic] the pudding.
"The Fact-Checking of the Cheezburger Is In the Eeting of the Cheezburger" -- Ancho Pantsa
https://www.goodreads.com/quotes/886386-the-proof-of-the-pud...