Comment by pflenker

16 hours ago

For a game like Anchorhead, which is famous in its niche, shouldn’t Claude already know it well enough to just solve it right away? I would expect that its training data contained multiple discussions and walkthroughs of the game.

I expect it's somewhere in the training data, but it's very unlikely to be salient. A few text files here and there in the ocean of the Internet is nothing. If Claude had memorized the walkthrough, it would have performed better.

I would think so. I'd be far more interested in a comparison of LLMs (no internet search allowed) playing against IF games released in the past month.

It's very likely the model didn't stop to question whether the game it was playing was something it already knew, and just assumed it was a puzzle created for it.

  • You can see Claude's responses in the repo. The first one is:

    Ah, Anchorhead! One of the most celebrated pieces of interactive fiction ever written