The ARC-AGI-2 paper (https://arxiv.org/pdf/2505.11831#figure.4) uses a non-representative sample, and success rates differ widely across participants. It reports that "final ARC-AGI-2 test pairs were solved, on average, by 75% of people who attempted them. The average test-taker solved 66% of tasks they attempted. 100% of ARC-AGI-2 tasks were solved by at least two people (many were solved by more) in two attempts or less."
Certainly those non-representative humans are much better than current models, but they're also far from scoring 100%.
The original ARC-AGI test was much easier than the recent v2.