Comment by nicebyte
1 year ago
How did you draw that conclusion from reading the contents of the link? This is a benchmark.
> We evaluate model performance and find that frontier models are still unable to solve the majority of tasks.