0xdeadbeefbabe 1 month ago Is anyone excited to do ablative testing on it?
manbitesdog 1 month ago With throughput this high thanks to sparsity, I'm particularly interested in distilling it into other architectures. I'd like to try a recurrent transformer when I have the time.