Comment by throwaway0123_5

5 months ago

I'm curious why there are no results for the "Claude 3.7 Extended Thinking" on SWE-Bench and Agentic tool use.

Are you finding that extended thinking helps a lot when the whole problem can be posed in the prompt, but that it isn't a major benefit for agentic tasks?

It would be a bit surprising, but it would also mirror my experiences, and the benchmarks which show Claude 3.5 being better at agentic tasks and SWE tasks than all other models, despite not being a reasoning model.