Comment by anupj
19 hours ago
AI agent benchmarks are starting to feel like the self-driving car demos of 2016: impressive until you realize the test track has speed bumps labeled "success."