Comment by Palmik
6 hours ago
All evals on Terminal Bench require some harness. :) Or "Agent", as Terminal Bench calls it. Presumably the Gemini 3 runs are using Gemini CLI.
What do you mean by "standard eval harness"?