Comment by bermudi
18 hours ago
Source? On the most trusted benchmark right now (deepSWE), models score as well or better with its minimal harness than with CC or Codex.
deepSWE clearly doesn't need complex tool calling?