Scout outperforms Llama 3.1 405B and Gemini 2.0 Flash-Lite, and because it's MoE it runs about as fast as a 17B model, since only the active parameters are computed for each token. That's pretty crazy.
It means you can run it on a high-RAM Apple Silicon machine, and it's going to be insanely fast on Groq (thousands of tokens per second). At that speed, time to first token becomes the bottleneck rather than generation itself.
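
To put rough numbers on the time-to-first-token point, here is a back-of-the-envelope sketch. The function names and the figures (0.3 s to first token, a 500-token reply, 2000 vs 50 tokens per second) are illustrative assumptions, not measurements of Scout or any provider.

```python
def total_latency_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """End-to-end time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


def ttft_share(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Fraction of the end-to-end time spent waiting for the first token."""
    return ttft_s / total_latency_s(ttft_s, output_tokens, tokens_per_s)


if __name__ == "__main__":
    # Assumed values for illustration only.
    ttft_s = 0.3
    output_tokens = 500
    for tokens_per_s in (50, 2000):
        total = total_latency_s(ttft_s, output_tokens, tokens_per_s)
        share = ttft_share(ttft_s, output_tokens, tokens_per_s)
        print(f"{tokens_per_s} tok/s: total {total:.2f}s, TTFT share {share:.0%}")
```

Under these assumed numbers, the wait for the first token is a small slice of the total at 50 tokens per second but more than half of it at 2000, which is the sense in which it becomes the bottleneck.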