Comment by meroes
10 hours ago
What is this supposed to show exactly? Those books have been feed into LLMs for years and there's even likely specific RLHF's on extracting spells from HP.
10 hours ago
What is this supposed to show exactly? Those books have been feed into LLMs for years and there's even likely specific RLHF's on extracting spells from HP.
There was a time when I put the EA-Nasir text into base64 and asked AI to convert it. Remarkably it identified the correct text but pulled the most popular translation of the text than the one I gave it.
Sucks that you got a really shitty response to your prompt. If I were you, the model provider would be receiving my complaint via clay tablet right away.
> What is this supposed to show exactly?
Nothing.
You can be sure that this was already known in the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did is meaningless.
Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.
[dead]