Comment by segmondy

3 days ago

Garbage benchmark: an inconsistent mix of "agent tools" and models. If you wanted to present a meaningful benchmark, the agent tools would stay the same, and then we could really compare the models.

There are plenty of other benchmarks that disagree with these. That said, in my experience most of these benchmarks are trash. Use the model yourself, apply your own set of problems, and see how well it fares.

Which benchmarks are not garbage?

I don't consider myself super special. I think it should be doable to create a benchmark that beats me having to test every single new model.