Comment by Palmik

10 hours ago

All evals on Terminal Bench require some harness. :) Or "Agent", as Terminal Bench calls it. Presumably the Gemini 3 results are using Gemini CLI.

What do you mean by "standard eval harness"?

I think the point is that it looks like Gemini 3 was only tested with the generic "Terminus 2", whereas Codex was tested with the Codex CLI.