Comment by emp17344
19 hours ago
Remember when ARC 1 was basically solved, and then ARC 2 (which is even easier for humans) came out, and all of a sudden the same models that were doing well on ARC 1 couldn’t even get 5% on ARC 2? Not convinced these benchmark improvements aren’t data leakage.
Look at the ARC site. The scores of these models are plotted against their "cost per task". All of these huge jumps come along with massive increases in cost per task, including Gemini 3.1 Pro, which increased by 4.2x.
ARC 2 was made specifically to artificially lower contemporary LLM scores, so any kind of model improvement will have outsized effects.
Also, people use "saturated" too liberally. The top left corner, at 1 cent per task, is what counts as saturated IMO, since there are billions of people who would prefer to solve ARC 1 tasks at 52 cents per task. On ARC 2, a human would make thousands of dollars a day with 99.99% accuracy.
How much do I get if I solve this? :D
https://arcprize.org/play
You are saying something interesting but too esoteric. Can you explain for beginners?
You could get rich by solving ARC 2 tasks yourself instead of forwarding the work to an LLM, given a client willing to pay the LLM rate.