Comment by kla-s
2 months ago
Actually, it's a little more nuanced:
Operation Type | Mamba Complexity | Transformer Complexity
Training (per iteration) | O(L) | O(L^2)
Autoregressive Inference (per step) | O(T) | O(L)
Memory Requirements | O(C) | O(L)
Where L is the sequence length, T is a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference (so the per-step cost does not grow with L), and C is the fixed size of Mamba's SSM (State Space Model) latent state.
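To make the inference and memory rows concrete, here is a toy sketch in Python. It is not Mamba's selective scan or a real attention layer: `ssm_step`, `attention_step`, `memory_after`, the scalar input, and the constants `a` and `b` are all made up for illustration. The only point is that one side updates a fixed-size state while the other appends to a cache that grows with every token.

```python
import math


def ssm_step(state, x, a=0.9, b=0.1):
    """One recurrent update on a fixed-size state.

    The work is proportional to len(state) (the C in the table),
    no matter how many tokens have already been processed.
    """
    return [a * s + b * x for s in state]


def attention_step(cache, x):
    """One decoding step against a growing cache.

    The new token is appended, then scored against every cached entry,
    so both the work and the memory grow with the number of tokens so far.
    """
    cache.append(x)
    scores = [x * k for k in cache]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, cache)) / total


def memory_after(num_steps, state_size=4):
    """Run both toy models and report how much each has to remember."""
    state = [0.0] * state_size
    cache = []
    for t in range(num_steps):
        x = float(t % 3)  # arbitrary toy input
        state = ssm_step(state, x)
        attention_step(cache, x)
    return len(state), len(cache)
```

Calling `memory_after(1000)` leaves the recurrent state at 4 entries while the cache holds 1000, which is the O(C) vs O(L) memory row in miniature.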