Comment by syntaxing
2 months ago
Devstral is purely nonthinking too it’s very possible it uses less models (I don’t know how DS 3.2 nonthinking compares). It’s interesting because Qwen pretty much proved hybrid models work worse than fully separate models.
No comments yet
Contribute on Hacker News ↗