Comment by bugglebeetle
5 hours ago
Unfortunately, this looks like it only covers the larger MoE models. I imagine the smaller models are what most people would target. The 9B just dropped two days ago, so I'm not surprised it's not explicitly documented, but it does use a hybrid Mamba architecture that I expect needs some special consideration.