Comment by djsh
2 years ago
Since we are talking about the throughput of API hosting providers, I wanted to mention the work we have done at Groq. I understand that the team is getting in touch with the ArtificialAnalysis folks to get benchmarked.
Mixtral running at >500 tokens/s on Groq: https://www.youtube.com/watch?v=5fJyOVtOk4Y
Experience the speed for yourself with Llama 2 70B at https://chat.groq.com/