Comment by dalemhurley
3 days ago
1000 tokens/sec from a highly specialised model is what we are going to see agents requiring.
Dedicated knowledge, fast output, rapid iteration.
I have been trying out SMOL models, as coding models don't need the full corpus of human history.
My most recent build was good but too small.
I am thinking of a model that is highly tuned to coding and agentic loops.