Comment by thih9

3 months ago

> Half an hour and fifty dollars later, I realized I had spent fifty dollars on this, and that this was not sustainable because, if anything, the code was getting more and more buggy the more Claude fixed it.

Off topic, but this has been my experience with AI so often that it keeps me from exploring AI uses further.

I liked Cursor’s “auto” plan, but that now seems gone. I’d happily switch to a provider that offers similarly “unlimited” usage.

It's difficult to offer unlimited usage of something that's so expensive to run. OP could have used a $20/month Claude or Cursor plan for "unlimited" usage within their quota, had they been willing to use a model other than the $75-per-million-token Opus 4.
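(For rough scale, taking that list price at face value and ignoring the cheaper input-token rate: $50 ÷ $75 per million output tokens ≈ 0.67 million output tokens burned in that half hour.)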

It is possible to radically increase your chances of success. You have to speak the LLM’s language, just like you write Java or Rust. But it doesn’t come with a language spec, so you get to figure it out by trial and error. And a model change means revisiting what works.

Lots of tips on how to do this out there, but one thing I do is have it try, throw away everything it did, and try again with a completely restated question based on the good bits of what it was able to produce.

E.g., if you ask for a web app that does X and it produces a working web app that doesn’t do X, throw that away and just ask for the web app scaffolding. You’ve still come out ahead even if you take over fully.
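Concretely, one cheap way to make the throw-away step painless is to give each attempt a scratch branch, so discarding it is a one-liner. A minimal sketch, assuming a git repo (the branch names are made up):

    git checkout -b attempt-1   # scratch branch for the first try
    # ...let the model work; note which bits came out well...
    git checkout master         # back to the clean base
    git branch -D attempt-1     # throw away everything it did
    git checkout -b attempt-2   # restate the question, try again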

  • > Lots of tips on how to do this out there, but one thing I do is have it try, throw away everything it did, and try again with a completely restated question

    This is the thing that worries me about AI/LLMs and the people who profess that they're "actually really useful when you use them right": the cliff to figuring out whether they're useful is vertical.

    "You’ve still come out ahead even if you take over fully."

    I just finished a weeklong saga of redoing a bunch of Claude's work because, instead of figuring out how to properly codegen some files, it just manually updated them and ignored a bunch of failing tests.

    With another human I can ask, "Hey, wtf were you thinking when you did [x]?" and peer into their mind-state. With Claude, it's time to stir the linear algebra again. How can I tell when I'm near a local or global maximum when all the prevailing advice is "I dunno man, just `git reset --hard origin/master` and start again but like, with different words I guess."

    We have studies showing that people feel more productive using AI while actually getting less done [1]. When "throw away everything it did and try again" based on :sparkle: vibes :sparkle: is the state of the art for how to "actually" use this stuff, I just feel more and more skeptical.

    [1]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

As an update: Codex has been massively better for me; it has been much better at not getting stuck like this. I should probably give it another go, especially now with Codex Max. Give it a try if you get the chance, though with the general caveat that people have massively different experiences with LLMs, for some reason.

I think it may have something to do with prompting style; my hypothesis is that some people’s prompting styles fit certain LLMs better. I don't know how else to explain the fact that my very experienced friends prefer Sonnet to Codex, for example, whereas I had the opposite experience.