Comment by jasonjmcghee

1 year ago

Scout outperforms Llama 3.1 405B and Gemini 2.0 Flash-Lite, and since it's MoE with 17B active parameters, it should run about as fast as a 17B dense model. That's pretty crazy.

It means you can run it on a high-RAM Apple Silicon machine, and it's going to be insanely fast on Groq (thousands of tokens per second). At that speed, time to first token becomes the bottleneck rather than generation itself.
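
Rough back-of-envelope for the two claims above, as a sketch rather than a benchmark. It assumes decode is memory-bandwidth bound (each generated token reads every active weight once), 4-bit weights, about 800 GB/s of memory bandwidth, 109B total parameters for Scout, a 0.3 s time to first token, and a 500-token reply. Every one of those numbers is an illustrative assumption, not a measurement.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (billions of params * bits / 8)."""
    return params_billion * bits_per_param / 8


def decode_tokens_per_s(active_params_billion: float,
                        bits_per_param: float,
                        bandwidth_gb_s: float) -> float:
    """Rough ceiling on decode speed if every active weight is read once per token."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bits_per_param)


def ttft_share(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Fraction of total response time spent waiting for the first token."""
    total_s = ttft_s + output_tokens / tokens_per_s
    return ttft_s / total_s


# Assumed figures, for illustration only.
ACTIVE_B = 17     # active parameters per token, in billions (from the comment)
TOTAL_B = 109     # total parameters, in billions (assumed)
BITS = 4          # assumed 4-bit quantization
BANDWIDTH = 800   # assumed memory bandwidth in GB/s

# All experts must fit in RAM, but only the active ones are read per token.
footprint = weights_gb(TOTAL_B, BITS)
ceiling = decode_tokens_per_s(ACTIVE_B, BITS, BANDWIDTH)

# Same assumed TTFT and reply length at two assumed decode speeds.
slow = ttft_share(ttft_s=0.3, output_tokens=500, tokens_per_s=50)
fast = ttft_share(ttft_s=0.3, output_tokens=500, tokens_per_s=2000)

print(f"weights in RAM: ~{footprint:.1f} GB")
print(f"decode ceiling: ~{ceiling:.0f} tok/s")
print(f"TTFT share at 50 tok/s: {slow:.0%}")
print(f"TTFT share at 2000 tok/s: {fast:.0%}")
```

Under these assumptions the RAM requirement follows the total parameter count while the speed follows the active count, which is why the "high-RAM" part matters. Once decode runs at thousands of tokens per second, more than half the wait is the first token.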