Comment by kittikitti
10 months ago
These benchmarks are not really representative of what agents are capable of. The slow process of checking the weather through UI elements, which this non-peer-reviewed paper showcases, is not a good use case.