Comment by bob1029
8 days ago
I was just about to post this. There was an MLST podcast about it a few days ago:
https://www.youtube.com/watch?v=8u2pW2zZLCs
Lots of related papers referenced in the description.
One claim from that podcast was that xLSTM's sequence-mixing mechanism (which replaces attention) is, in practical implementations, more efficient than transformer flash attention, and therefore promises to significantly reduce the time/cost of test-time compute.
Test it out here:
https://github.com/NX-AI/mlstm_kernels
https://huggingface.co/NX-AI/xLSTM-7b
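If you want to poke at the 7B checkpoint yourself, something along these lines should work. This is a minimal, untested sketch that assumes the model loads through the standard transformers AutoModelForCausalLM/AutoTokenizer API; you may need a recent transformers release plus the xlstm and mlstm_kernels packages installed, per the model card.

    # Hedged sketch: assumes NX-AI/xLSTM-7b works with the standard
    # transformers Auto* loading path (may require a recent transformers
    # version and the xlstm / mlstm_kernels packages).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "NX-AI/xLSTM-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # 7B weights; bf16 keeps memory manageable
        device_map="auto",
    )

    # Plain greedy generation, just to confirm the kernels run end to end.
    inputs = tokenizer("The xLSTM architecture", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(out[0], skip_special_tokens=True))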