Comment by pflenker

19 days ago

For a game like Anchorhead, which is famous in its niche, shouldn’t Claude already know it well enough to just solve it right away? I would expect that its training data contained multiple discussions and walkthroughs of the game.

7 comments

zetalyrae 19 days ago

I expect it's somewhere in the training data, but it's very unlikely to be salient. A few text files here and there in the ocean of the Internet are nothing. If Claude had memorized the walkthrough, it would have performed better.

vunderba 19 days ago

I would think so. I'd be far more interested in a comparison of LLMs (no internet search allowed) playing against IF games released in the past month.

ratg13 19 days ago

It's very likely the model didn't stop to question whether the game it was playing was something it already knew, and just assumed it was a puzzle created for it.

sfjailbird 19 days ago

You can see Claude's responses in the repo. The first one is:
Ah, Anchorhead! One of the most celebrated pieces of interactive fiction ever written

brianjeong 18 days ago

You could say the same about Pokemon - the models still struggle quite a bit.

Jweb_Guru 19 days ago

Yeah, I do not find performances like this very impressive.

IgorPartola 19 days ago

Honestly I am curious how it would do if it did have a walkthrough.