Comment by syntaxing
13 hours ago
Devstral is purely nonthinking too it’s very possible it uses less models (I don’t know how DS 3.2 nonthinking compares). It’s interesting because Qwen pretty much proved hybrid models work worse than fully separate models.
No comments yet
Contribute on Hacker News ↗