Comment by falcor84

1 month ago

But that's the thing: Claude Plays Pokemon is an experiment in having Claude work fully independently, so there's no "you" to improve its onboarding docs or anything else; it has to do that on its own. And as long as it can't do so reliably, it effectively has anterograde amnesia.

And just to be clear, I'm mentioning this because I think Claude Plays Pokemon is a playground for any agentic AI doing long-term independent work; I believe the solution needed here will bring us closer to a fully independent agent in coding and other domains. It reminds me of the codeclash.ai benchmark, where similar issues show up across multiple "rounds" of an AI working on the same codebase.



vidarh 1 month ago

No, but it can produce the onboarding docs itself with some "bootstrap" prompting. E.g. give it a scratchpad to write its own notes in, and direct it to use it liberally. Give it a persistent todo list, and direct it to use it liberally. Tell it to keep a work log. Tell it to commit early and often - you can squash things later, and Claude is very good at navigating git logs.
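For illustration, here is a minimal sketch of that kind of file-backed memory: a scratchpad, a persistent todo list, and an append-only work log that a fresh session reads before acting. Everything here is hypothetical and not tied to any real agent framework. The `AgentMemory` class, the file names `NOTES.md`, `TODO.md` and `WORKLOG.md`, and the `bootstrap_prompt()` helper are assumptions made up for the example. The git side ("commit early and often") is left out.

```python
import datetime
import pathlib
import tempfile
from typing import Optional


class AgentMemory:
    """Hypothetical file-backed memory that outlives a single agent session."""

    def __init__(self, root: pathlib.Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.notes = root / "NOTES.md"    # free-form scratchpad (assumed name)
        self.todo = root / "TODO.md"      # persistent todo list (assumed name)
        self.log = root / "WORKLOG.md"    # append-only work log (assumed name)

    def _read(self, path: pathlib.Path) -> str:
        # A missing file just means nothing has been written yet.
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def _append(self, path: pathlib.Path, line: str) -> None:
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")

    def note(self, text: str) -> None:
        self._append(self.notes, f"- {text}")

    def add_todo(self, item: str) -> None:
        self._append(self.todo, f"- [ ] {item}")

    def complete_todo(self, item: str) -> None:
        # Mark the first matching open item as done; other lines are untouched.
        lines = self._read(self.todo).splitlines()
        for i, line in enumerate(lines):
            if line == f"- [ ] {item}":
                lines[i] = f"- [x] {item}"
                break
        self.todo.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")

    def log_work(self, text: str, when: Optional[datetime.datetime] = None) -> None:
        # Timestamp each entry so a later session can reconstruct the sequence.
        stamp = (when or datetime.datetime.now()).isoformat(timespec="seconds")
        self._append(self.log, f"{stamp} {text}")

    def bootstrap_prompt(self) -> str:
        """Text to prepend to a fresh session so it can resume where the last one stopped."""
        return (
            "Read these notes before acting, and update them liberally.\n\n"
            f"## Notes\n{self._read(self.notes)}\n"
            f"## Todo\n{self._read(self.todo)}\n"
            f"## Work log\n{self._read(self.log)}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp) / "memory"
        first = AgentMemory(root)
        first.note("tests live in tests/ and run with pytest")
        first.add_todo("fix flaky integration test")
        first.log_work("investigated flaky test; suspect a timing issue")

        # A later session starts with no in-context memory, only the files.
        second = AgentMemory(root)
        second.complete_todo("fix flaky integration test")
        print(second.bootstrap_prompt())
```

The design choice is simply that nothing important lives only in the context window. Each session starts by reading the files and ends by writing to them, so the "onboarding docs" accumulate as a side effect of the work itself.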

skybrian 1 month ago

Sure, it's not close to fully independent. But I was interpreting "much, much less employable" as not very useful for programming in its current state, and I think it is quite useful.