Comment by meroes

10 hours ago

What is this supposed to show exactly? Those books have been feed into LLMs for years and there's even likely specific RLHF's on extracting spells from HP.

4 comments

meroes

muzani 10 hours ago

There was a time when I put the EA-Nasir text into base64 and asked AI to convert it. Remarkably it identified the correct text but pulled the most popular translation of the text than the one I gave it.

majewsky 7 hours ago

Sucks that you got a really shitty response to your prompt. If I were you, the model provider would be receiving my complaint via clay tablet right away.

rvz 10 hours ago

> What is this supposed to show exactly?

Nothing.

You can be sure that this was already known in the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did is meaningless.

Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.

jaco6 1 hour ago

[dead]