Comment by sally_glance

2 months ago

They absolutely can do that if you give them the tools. Seeing Claude (I use it with opencode agents) run curl and playwright to verify and then fix its implementation was a real 'wow' moment for me.

10 comments


Q6T46nT668w6i3m 2 months ago

We have different experiences. Often I’ll see Claude et al. find creative ways to fulfill the task without satisfying my intent, e.g., changing the implementation plan I specifically asked for, changing tolerances or even tests, and frequently disabling tests.

sally_glance 2 months ago

Yeah, I feel that. If it happens, your only way out is to write down a more extensive implementation plan first. For me, that is the point where I start regretting having tried to implement something using AI... But admittedly, most of the time revising the implementation plan and running the agent again is still faster than I could have done it on my own (I try to make implementation tasks explicit in the form of a markdown file, which has worked pretty well so far).
Fr0styMatt88 2 months ago
I see these “you had a different experience than me” comments around AI coding agents a lot and can concur. My experience with Copilot varies even from day to day: sometimes it’s great, and other days it’s so bad that I give up on using it at all.
Makes me honestly wonder: will AGI just give us agents that get into bad moods and don’t want to work for the day because they’re tired or just don’t feel like it?
- ssl-3 2 months ago
  
  If part of the goal is to emulate a person's abilities, then surely that includes a person's ability to fuck things up.
DANmode 2 months ago
Are you a customer?
- DANmode 2 months ago
  
  Don’t downvote because you don’t like the question.
  It obviously adds to the discussion: paid and non-paid accounts are being conflated daily in threads like these!
  They’re not the same tier account!
  Free users, especially ones deemed less interesting to learn from for the future, are given table scraps whenever the provider deems it necessary for load reasons.
  