Comment by cubefox

1 month ago

> This model is based on the TurboS fast-thinking base, the world's first ultra-large-scale Hybrid-Transformer-Mamba MoE large model released by us at the beginning of March.

It's interesting that their foundation model is a combination of Mamba and Transformer layers rather than a pure Mamba model. That suggests the Mamba architecture has real limitations (pure state-space models are known to be weaker at tasks like in-context recall and copying), which might explain why it hasn't replaced Transformers outright.
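For intuition, hybrid designs (e.g. Jamba) typically interleave a small number of attention blocks into a stack that is mostly SSM/Mamba blocks, keeping near-linear-time inference while retaining some attention capacity. Hunyuan's exact layout isn't public here, so this is a purely hypothetical sketch of such a schedule; the function name and ratio are illustrative assumptions, not the actual TurboS architecture:

```python
def hybrid_layer_plan(n_layers: int, attention_every: int = 6) -> list[str]:
    """Hypothetical hybrid schedule: mostly Mamba (SSM) blocks,
    with one attention block every `attention_every` layers.
    The 1-in-6 ratio is an illustrative assumption, not TurboS's."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(n_layers)
    ]

plan = hybrid_layer_plan(12, attention_every=6)
# 12 layers -> 10 mamba blocks, with attention at positions 6 and 12
```

The appeal of this pattern is that the few attention layers provide exact token-to-token lookup (helping with recall), while the Mamba layers keep memory and compute from growing quadratically with context length.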