Comment by cubefox
1 month ago
> This model is based on the TurboS fast-thinking base, the world's first ultra-large-scale Hybrid-Transformer-Mamba MoE large model released by us at the beginning of March.
It's interesting that their foundation model is a hybrid of Mamba and Transformer layers rather than a pure Mamba model. I suppose the Mamba architecture does have its weaknesses, which might explain why it hasn't replaced transformers.