← Back to context

Comment by simonw

7 days ago

How hard have you tried?

I've been finding that the Opus 4.5/4.6 and GPT-5.2/5.3 models really have represented a step-change in how good they are at running long tasks.

I can one-shot prompt all sorts of useful coding challenges now that previously I would have expected to need multiple follow-ups to fix mistakes the agents made.

I got all of this from a single prompt, for example: https://github.com/simonw/research/tree/main/cysqlite-wasm-w... - including this demo page: https://simonw.github.io/research/cysqlite-wasm-wheel/demo.h... - using this single prompt: https://github.com/simonw/research/pull/79

What do you mean? The generated script just downloads the sources and runs pyodide: https://github.com/simonw/research/blob/main/cysqlite-wasm-w...

There is maybe 5 relevant lines in the script and nothing complex at all that would require to run for days.

  • Maybe so, but I did once spend 12 hours straight debugging an Emscripten C++ compiler bug! (After spending the first day of the jam setting up Emscripten, and the second day getting Raylib to compile in it. Had like an hour left to make the actual game, hahah.)

    I am a bit thick with such things, but just wanted to provide the context that Emscripten can be a fickle beast :)

    I sure am glad I can now deploy Infinite Mechanized Autistic Persistence to such soul-crushing tasks, and go make a sandwich or something.

    (The bug turned out to be that if I included a boolean in a class member, the whole game crashed, but only the Emscripten version. Sad. Ended up switching back to JS, which you basically need anyway for most serious web game dev.)

How do you deal with the cost associated with a long running opus session? I asked it to validate some JSON configs against the spec yesterday and it burned $10 worth of tokens for what would have been a 1 millisecond linter task.

  • I'm on the $200/month Claude Max plan and I rarely run out of my token allowance.

    I'm also paying $20/month for OpenAI Codex and again it's rare I hit the rate limits there.

Can you share any examples of these one-shot prompts? I've not gotten to the point where I can get those kind of results yet.

  • If you look through the commit logs on simonw/research and simonw/tools on GitHub most commits should either list the prompt, link to a PR with the prompt or link to a session transcript.