Comment by lukan

14 hours ago

"It is a bit arbitrary, but I think this is what they're tracking."

I don't know if they can get their numbers right this way, but this seems like a way more useful metric than theoretical capabilities.

4 comments

lukan

OK, but aren't you just measuring efficiency, and not improvements in the big I in AGI?

Leynos 12 hours ago

It also measures task coherence: the ability to plan, form contingencies, recover from errors, mitigate the accumulation of errors, and reconcile findings across a long context window.

jsnell 13 hours ago

No? I think you're misunderstanding what is being measured.
It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).

lukan 14 hours ago

Yes, but this study was not about that, and "just efficiency" is actually what most people are after.
At least I want AI to solve my problems, not score high on an academic leaderboard.