Comment by aspenmayer
5 days ago
The page that the person you’re replying to linked does have this, so it may not have been updated, or they were looking in the wrong place originally, or both:
> Industry Track Awards
> Best Paper
> Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications
> Daniel Zagyva, Emmanouil Stergiadis, Laurens van der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, Moran Beladev
Per TFA, the paper we’re looking for is this one:
> Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
> Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng
I’m not finding it by author on the page you linked but I think it’s this reference by title:
> DeepSeek × PKU × UW — Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
I did find it on this page: