esafak 6 days ago
What do you mean? It tests whether the model knows the tools and uses them.

YetAnotherNick 6 days ago
Yeah, it's a knowledge benchmark, not an agentic benchmark.

esafak 6 days ago
That's like saying coding benchmarks are about memorizing the language syntax. You have to know what to call, when, and how. If you get the job done, you win.

YetAnotherNick 6 days ago
I am saying the opposite. If a coding benchmark just tests the syntax of an esoteric language, it shouldn't be called a coding benchmark. For a benchmark named terminal bench, I would assume it requires some terminal "interaction", not just giving the code and command.