← Back to context

Comment by rvz

4 hours ago

> What is this supposed to show exactly?

Nothing.

You can be sure that this was already known in the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did is meaningless.

Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.