Comment by kla-s

2 months ago

Actually its a little more nuanced:

Operation Type|Mamba Complexity|Transformer Complexity

Training(per iteration)|O(L)|O(L^2)

Autoregressive Inference(per step)|O(T)|O(L)

Memory Requirements|O(C)|O(L)

Where: L stands for the sequence length. T denotes a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference. C reflects the fixed size of the SSM (State Space Model) latent state in Mamba

Per: https://github.com/state-spaces/mamba/issues/196