Comment by atleastoptimal

7 months ago

In OpenAI's own released papers, they show Anthropic's models performing better than their own. They tend to be pretty transparent and honest in their benchmarks.

The thing is, only leading AI companies and big tech have the money to fund these big benchmarks and run inference on them. As long as the benchmarks are somewhat publicly available and vetted by reputable scientists/mathematicians, it seems reasonable to believe they're trustworthy.