Comment by tgtweak
3 days ago
I think frontier models can do more with fewer tokens (and do the wrong thing far less often) than a "really fast" small model.
There are use cases for fast/ultrafast inference models — classifying text, scoring things, extracting information — but for coding and other knowledge tasks, you're not going to reach your solution faster at 16,000 tokens/s if the solution never comes (or is the wrong one).