Comment by syntaxing
12 hours ago
Devstral is purely nonthinking too it’s very possible it uses less models (I don’t know how DS 3.2 nonthinking compares). It’s interesting because Qwen pretty much proved hybrid models work worse than fully separate models.
No comments yet
Contribute on Hacker News ↗