Comment by imtringued
7 days ago
Mamba isn't really a competitor to transformers. Quadratic attention exists for a reason.
Mamba's strengths lie in being a better RNN, as you said. Mamba is probably better than transformers at things like maintaining object permanence over a sequence of inputs where each input is an image, for example.
However, it would still make sense for a transformer to actually process each image by cutting it up into patches and performing quadratic attention over them, and then to feed the transformer's output into Mamba to produce the actual output, e.g. a robot action, while maintaining object permanence.
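A minimal PyTorch sketch of that hybrid layout, with hypothetical dimensions and an off-the-shelf nn.GRU standing in for the Mamba block (a real Mamba/SSM layer would slot in at the same place): per frame, patchify plus quadratic self-attention; across frames, a stateful sequence pass that ends in an action head.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """ViT-style encoder: cut an image into patches, run quadratic attention."""

    def __init__(self, img_size=64, patch=8, dim=128, heads=4, layers=2):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.attn = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, x):                                      # x: (B, 3, H, W)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.attn(tokens + self.pos)
        return tokens.mean(dim=1)                              # one embedding per frame


class HybridPolicy(nn.Module):
    """Transformer per frame, then a stateful sequence model over frames."""

    def __init__(self, dim=128, action_dim=7):
        super().__init__()
        self.encoder = FrameEncoder(dim=dim)
        # Stand-in for the Mamba/SSM block: any stateful sequence layer works here.
        self.sequence = nn.GRU(dim, dim, batch_first=True)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, frames, state=None):                     # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        embs = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, state = self.sequence(embs, state)                # state carries the memory
        return self.action_head(out[:, -1]), state


if __name__ == "__main__":
    policy = HybridPolicy()
    video = torch.randn(2, 5, 3, 64, 64)   # batch of two 5-frame clips
    action, state = policy(video)
    print(action.shape)                    # torch.Size([2, 7])
```

The quadratic cost is confined to the patches within a single frame, while the cross-frame memory is handled by the recurrent/SSM part, which is the division of labor described above.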