Comment by bermudi
19 hours ago
Source? On the most trusted benchmark right now (deepSWE), models do at least as well on their minimal harness as they do with CC or Codex
deepSWE clearly doesn't need complex tool calling?