Comment by emp17344

19 hours ago

Remember when ARC 1 was basically solved, and then ARC 2 (which is even easier for humans) came out, and all of a sudden the same models that were doing well on ARC 1 couldn’t even get 5% on ARC 2? I'm not convinced these benchmark improvements aren’t data leakage.


culi 15 hours ago

Look at the ARC site. The scores of these models are plotted against their "cost per task". All of these huge jumps come with massive increases in cost per task, including Gemini 3.1 Pro, whose cost per task increased by 4.2x.

casey2 16 hours ago

ARC 2 was made specifically to artificially lower contemporary LLM scores, so any kind of model improvement will have an outsized effect.

Also, people use "saturated" too liberally. IMO, saturated means the top-left corner of the plot, at 1 cent per task, since there are billions of people who would prefer to solve ARC 1 tasks at 52 cents per task. On ARC 2, a human would make thousands of dollars a day with 99.99% accuracy.

z3t4 15 hours ago

How much do I get if I solve this? :D
https://arcprize.org/play

alisonkisk 15 hours ago

You are saying something interesting but too esoteric. Can you explain for beginners?

louiskottmann 6 hours ago

You could get rich by solving ARC 2 tasks yourself instead of forwarding the work to an LLM, given a client willing to pay LLM rate.
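
To put rough numbers on that, here is a minimal sketch of the arithmetic. Only the 52 cents per task for ARC 1 comes from the thread. The ARC 2 rate, the minutes per task, and the 8-hour day are hypothetical values picked for illustration, and `daily_earnings` is a made-up helper name.

```python
def daily_earnings(rate_per_task_usd: float,
                   minutes_per_task: float,
                   hours_worked: float = 8.0) -> float:
    """Return what a human would earn in a day if paid per solved task."""
    tasks_per_day = (hours_worked * 60.0) / minutes_per_task
    return tasks_per_day * rate_per_task_usd


# ARC 1: 52 cents per task is the figure quoted upthread.
# 2 minutes per task is an assumption, not a measured value.
arc1 = daily_earnings(0.52, minutes_per_task=2.0)

# ARC 2: $10 per task and 3 minutes per task are both hypothetical.
arc2 = daily_earnings(10.0, minutes_per_task=3.0)

print(f"ARC 1: ${arc1:,.2f} per day")
print(f"ARC 2: ${arc2:,.2f} per day")
```

Under those assumed numbers the ARC 2 case lands in the "thousands of dollars a day" range, while the ARC 1 case is closer to an ordinary day's wage. Plug in the actual cost per task from the ARC site to check the claim against real rates.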