Comment by jasonjmcghee
14 days ago
Scout outperforms Llama 3.1 405B and Gemini 2.0 Flash-Lite, and because it's an MoE it runs about as fast as a 17B model. That's pretty crazy.
It means you can run it on high-RAM Apple silicon, and it's going to be insanely fast on Groq (thousands of tokens per second). Time to first token will be the bottleneck for generation.
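A rough back-of-envelope sketch of both points, in case it helps. The 17B figure is the active parameter count mentioned above. Everything else is an assumption for illustration: a total of 109B parameters, 4-bit quantized weights, a 0.3 s time to first token, 2,000 tokens per second of decode throughput, and a 500-token reply. The function names are made up for this example.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GB (1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def generation_time_s(ttft_s: float, n_tokens: int, tokens_per_s: float) -> float:
    """Total wall-clock time: time to first token plus steady-state decoding."""
    return ttft_s + n_tokens / tokens_per_s


# Assumed figures, not measurements.
TOTAL_PARAMS = 109e9    # assumption: all experts must sit in memory
ACTIVE_PARAMS = 17e9    # per-token compute is closer to a 17B dense model
BITS = 4                # assumption: 4-bit quantization

TTFT_S = 0.3            # assumption: time to first token
TOKENS_PER_S = 2000.0   # assumption: "thousands of tokens per second"
N_TOKENS = 500          # assumption: length of the reply

resident_gb = weight_memory_gb(TOTAL_PARAMS, BITS)
active_gb = weight_memory_gb(ACTIVE_PARAMS, BITS)

total_s = generation_time_s(TTFT_S, N_TOKENS, TOKENS_PER_S)
ttft_share = TTFT_S / total_s

print(f"weights resident in RAM: ~{resident_gb:.1f} GB")
print(f"weights touched per token: ~{active_gb:.1f} GB")
print(f"total time: {total_s:.2f} s, TTFT share: {ttft_share:.0%}")
```

Under these assumptions the whole model still has to fit in memory (hence the high-RAM machine), while each token only touches the active slice of the weights. And once decoding is that fast, the wait for the first token is more than half of the total time for a reply of this length.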