Comment by cyanydeez 15 hours ago ok, but aren't you just measuring efficiency and not the big I in AGI improvements?
Leynos 13 hours ago It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.
jsnell 15 hours ago No? I think you're misunderstanding what is being measured. It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).
lukan 15 hours ago Yes, but this study was not about that and "just efficiency" is actually what most people are after. At least I want AI to solve my problems, not score high on an academic leaderboard.