
Comment by achierius

5 days ago

No, they've done testing against samples from the general population.

The ARC-AGI-2 paper (https://arxiv.org/pdf/2505.11831#figure.4) uses a non-representative sample, and success rates differ widely across participants: "final ARC-AGI-2 test pairs were solved, on average, by 75% of people who attempted them. The average test-taker solved 66% of tasks they attempted. 100% of ARC-AGI-2 tasks were solved by at least two people (many were solved by more) in two attempts or less."

Certainly those non-representative humans are much better than current models, but they're also far from scoring 100%.