
Comment by TheDudeMan

2 months ago

Mamba is O(n). But I guess it has other drawbacks.

Actually, it's a little more nuanced:

| Operation Type | Mamba Complexity | Transformer Complexity |
|---|---|---|
| Training (per iteration) | O(L) | O(L^2) |
| Autoregressive Inference (per step) | O(T) | O(L) |
| Memory Requirements | O(C) | O(L) |

Where: L is the sequence length; T is a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference; and C is the fixed size of Mamba's SSM (state space model) latent state.

Per: https://github.com/state-spaces/mamba/issues/196

Yes, Mamba's accuracy seems to go down and it has trouble with exact token recall. But I would say it might be a good fit for very power-efficient edge deployment and ultra-long contexts.
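
To make the inference and memory rows concrete, here's a minimal NumPy sketch. It is not the real Mamba or transformer kernels: the state sizes, random matrices, and toy softmax attention are made-up illustration. It just shows that recurrent decoding carries a fixed-size state (the O(C) row) while attention-style decoding drags a KV cache that grows with every generated token (the O(L) row).

```python
# Toy comparison of per-step decoding state: fixed-size SSM state vs. growing KV cache.
# All names, sizes, and matrices are illustrative assumptions, not the real kernels.
import numpy as np

d_state, d_model = 16, 64           # fixed SSM state size (the "C" above), model width

# Recurrent/SSM-style decoding: memory is a fixed-size state, O(C).
def ssm_step(state, u, A, B, Cmat):
    """One decode step: new_state = A @ state + B @ u, output = Cmat @ new_state."""
    new_state = A @ state + B @ u
    return new_state, Cmat @ new_state

# Transformer-style decoding: the KV cache grows with every token, O(L).
def attn_step(cache, q, k, v):
    cache.append((k, v))            # cache length == tokens generated so far
    keys = np.stack([kk for kk, _ in cache])
    vals = np.stack([vv for _, vv in cache])
    w = np.exp(keys @ q)
    w /= w.sum()
    return w @ vals                 # cost of this step scales with len(cache)

A = 0.9 * np.eye(d_state)
B = np.random.randn(d_state, d_model) * 0.01
Cmat = np.random.randn(d_model, d_state) * 0.01

ssm_state = np.zeros(d_state)
kv_cache = []
for t in range(1000):               # 1000 decode steps
    u = np.random.randn(d_model)
    ssm_state, _ = ssm_step(ssm_state, u, A, B, Cmat)        # state stays (16,)
    _ = attn_step(kv_cache, u, np.random.randn(d_model), u)  # cache is now t+1 entries

print(ssm_state.shape, len(kv_cache))   # (16,) 1000
```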

  • Mamba is solving a different problem than transformers.

    What Mamba does is take an initial state s_0 and an input u_0 to produce a new state s_1 and an output o_1. It's basically modeling a very complicated state machine. I can easily think of half a dozen applications where this is exactly what you want and it is better than transformers, but LLMs are not among them. Essentially, most control problems boil down to what Mamba does. In fact, I would say that Mamba as an architecture is probably the ne plus ultra for modeling mechanical system dynamics (see the sketch below).
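
To make that state-machine framing concrete, here's a minimal sketch under assumed toy parameters: a plain discretized linear state-space model of a damped spring-mass system, exposing exactly the step(s_t, u_t) -> (s_{t+1}, o_{t+1}) interface described above. Mamba itself makes A, B, C input-dependent (the "selective" part), so this shows only the interface, not the architecture.

```python
# State-machine view: step(state, input) -> (new_state, output).
# Plain discretized linear SSM of a damped spring-mass system (illustrative, not Mamba).
import numpy as np

dt, k, c, m = 0.01, 4.0, 0.5, 1.0     # time step, spring constant, damping, mass

# State x = [position, velocity]; input u = external force (forward-Euler discretization).
A = np.array([[1.0, dt],
              [-k / m * dt, 1.0 - c / m * dt]])
B = np.array([[0.0], [dt / m]])
C = np.array([[1.0, 0.0]])            # observe position only

def step(s, u):
    """s_{t+1} = A s_t + B u_t,  o_{t+1} = C s_{t+1}"""
    s_next = A @ s + B @ u
    return s_next, C @ s_next

s = np.array([[1.0], [0.0]])          # start displaced, at rest
for t in range(500):
    s, o = step(s, np.array([[0.0]])) # no external force: a decaying oscillation
print(o.item())                       # position after 5 simulated seconds
```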