Comment by empiko
2 months ago
In some mixture-of-experts approaches, samples or tokens are distributed among the experts. A router picks the experts by predicting which expert is a good match for each sample. In setups where each expert can only take a limited number of samples, the expert you are assigned can depend on your neighbors in the batch.
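To make the batch dependence concrete, here is a minimal toy sketch, not any particular model's router. It assumes a greedy scheme with a per-expert capacity limit; the function name `route_with_capacity`, the scores, and the capacity value are all made up for illustration. The same token, with the same scores, lands on a different expert depending on who else is in the batch.

```python
def route_with_capacity(scores, capacity):
    """Assign each token to one expert, with at most `capacity` tokens per expert.

    scores[t][e] is a hypothetical router score for token t and expert e.
    Tokens with a higher top score choose first; a token whose preferred
    expert is already full falls back to its next-best expert. A token stays
    None (dropped) if every expert is full.
    """
    num_experts = len(scores[0])
    load = [0] * num_experts
    assignment = [None] * len(scores)
    # Most confident tokens choose first (an assumed tie-breaking policy).
    order = sorted(range(len(scores)), key=lambda t: max(scores[t]), reverse=True)
    for t in order:
        # Rank experts by this token's own preference.
        ranked = sorted(range(num_experts), key=lambda e: scores[t][e], reverse=True)
        for e in ranked:
            if load[e] < capacity:
                assignment[t] = e
                load[e] += 1
                break
    return assignment


token = [0.6, 0.4]  # mildly prefers expert 0

# Same token, two different sets of neighbors.
batch_a = [token, [0.1, 0.9], [0.2, 0.8]]  # neighbors want expert 1
batch_b = [token, [0.9, 0.1], [0.8, 0.2]]  # neighbors want expert 0

assign_a = route_with_capacity(batch_a, capacity=2)
assign_b = route_with_capacity(batch_b, capacity=2)

print(assign_a[0], assign_b[0])  # prints: 0 1
```

In batch_a the token gets its preferred expert 0. In batch_b the two more confident neighbors fill expert 0 first, so the token is pushed to expert 1.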