Comment by Alifatisk
6 months ago
I think it's because of a combination of the MoE model architecture and inference being done in large batches and run in parallel.
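To put rough numbers on that, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are hypothetical, and the figures (64 experts, top-2 routing, a batch of 256) are made up rather than taken from any particular model. It only shows the two effects I mean: a token touches a small share of the expert weights, and one read of the weights can be shared across a whole batch.

```python
def active_fraction(num_experts: int, top_k: int) -> float:
    """Share of expert weights one token uses when routed to top_k of num_experts."""
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    return top_k / num_experts


def weight_bytes_per_token(weight_bytes: float, batch_size: int) -> float:
    """Weight traffic per token if a single weight read serves the whole batch.

    Simplifying assumption: every token in the batch reuses the same read.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return weight_bytes / batch_size


if __name__ == "__main__":
    # Hypothetical setup: 64 experts, each token routed to 2 of them.
    print(active_fraction(num_experts=64, top_k=2))
    # Hypothetical setup: 1 GB of weights read once for a batch of 256 tokens.
    print(weight_bytes_per_token(weight_bytes=1e9, batch_size=256))
```

Real serving is messier than this (tokens in a batch get routed to different experts, so the sharing is per expert), but it gives the general idea.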