Comment by pedrozieg

2 months ago

The interesting bit in the blog post isn’t the 72.2% SWE-Bench Verified number but their own human eval: Devstral 2 beats DeepSeek V3.2 in Cline-style workflows yet still clearly loses to Claude Sonnet 4.5. That’s a nice reminder that “open SOTA” on a single benchmark doesn’t mean “best tool for the job” once you’re doing multi-step edits across a messy real repo.

What is a big deal here is the combination of licensing and packaging. A 123B dense code model under a permissive license plus an open-source CLI agent (Vibe) that already speaks ACP is basically a reference stack for “bring your own infra + agents” instead of renting someone else’s SaaS IDE. If that ecosystem hardens (Cline, Kilo, Vibe, etc.), the moat shifts from “we have the only good code model” to “we own the best workflows and integrations”, and that’s a game open models can realistically win.
