← Back to context

Comment by sally_glance

1 day ago

They absolutely can do that if you give them the tools. Seeing Claude (I use it with opencode agents) run curl and playwright to verify and then fix it's implementation was a real 'wow' moment for me.

We have different experiences. Often I’ll see Claude, et. al. find creative ways to fulfill the task without satisfying my intent, e.g., changing the implementation plan I specifically asked for, changing tolerances or even tests, and frequently disabling tests.

  • Yeah I feel that, if it happens your only way out is to write down a more extensive implementation plan first. For me that is the point where I start regretting to have tried implementing something using AI,.. But admittedly most of the time redacting the implementation plan and running the agent again is still faster than I could have done on my own (I try to make implementation tasks explicit in the form of a markdown file, worked pretty well so far).

  • I see these “you had a different experience than me” comments around AI coding agents a lot and can concur; I’ll have a different experience with Copilot from day-to-day even, sometimes it’s great and other days I give up on using it at all it’s being so bad.

    Makes me honestly wonder — will AGI just give us agents that get into bad moods and not want to work for the day because they’re tired or just don’t feel like it!

    • If part of the goal is to emulate a person's abilities, then surely that includes a person's ability to fuck things up.

  • Are you a customer?

    • Don’t downvote because you don’t like the question.

      It obviously adds to the discussion: paid and non paid accounts are being conflated daily in threads like these!

      They’re not the same tier account!

      Free users, especially ones deemed less interesting to learn from for the future, are given table-scraps when they feel it’s necessary for load reasons.

      4 replies →