Comment by manbitesdog
1 month ago
With such high throughput thanks to sparsity, I'm particularly interested in distilling it into other architectures. I'd like to try a recurrent transformer when I have the time.