Comment by falcor84
16 hours ago
I found Terminal-Bench [0] to be the most relevant benchmark for me, even for tasks that go far outside the terminal. It's been very interesting to see tools climb up there, and it matches my own experimentation that they generally get the most out of Sonnet (even those that use a mix of models, like Warp, typically default to Sonnet).