Comment by singron

7 months ago

"Avg. Mturker" scores 77% on ARC1 and costs $3/task. "Stem Grad" scores 98% on ARC1 and costs $10/task. I would love to see a segment like "typical US office employee" or something else in between, since I don't think you need a STEM degree to do better than 77%.

It's also worth noting that the "Human Panel" gets 100% on ARC2 at $17/task. All the "Human" models are on the score/cost frontier and exceptional in their score range, although obviously too expensive to win the prize.

I think the real argument is that the ARC problems are too abstract and obscure to be relevant to useful AGI, but I think we need a little flexibility in that area so we can have tests that can be objectively and mechanically graded. For example, "write a NYT bestseller" is an impractical test in many ways, even if it's closer to what AGI should be.

tbrownaw 7 months ago

> I think the real argument is that the ARC problems are too abstract and obscure to be relevant to useful AGI

I think it's meant to work like how getting things off the top shelf at the supermarket isn't relevant to playing basketball.