Slacker News

Comment by mkw5053

4 months ago

I'm interested in how this would work for generative models. It's not obvious how you'd implement causal masking in the frequency domain. And the modReLU activation seems critical but adds implementation complexity. Would love to see how this scales on truly massive context lengths where the theoretical advantages should really shine.
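
For reference on the modReLU point, here is a minimal NumPy sketch of the activation as it is usually defined for complex inputs: apply a ReLU to the magnitude plus a bias and leave the phase untouched. The function name `modrelu`, the `eps` guard, and the scalar bias are assumptions for illustration; the work under discussion may implement it differently (for example with a learned per-channel bias).

```python
import numpy as np

def modrelu(z, bias, eps=1e-8):
    """Illustrative modReLU: ReLU(|z| + bias) * z / |z|.

    `bias` is typically a learned real parameter; here it is passed in
    directly (an assumption made to keep the sketch self-contained).
    """
    z = np.asarray(z, dtype=np.complex128)
    magnitude = np.abs(z)
    # Rescale the magnitude; eps avoids division by zero at z == 0.
    scale = np.maximum(magnitude + bias, 0.0) / (magnitude + eps)
    # Multiplying by a non-negative real scale preserves the phase.
    return scale * z

# Usage: a coefficient with magnitude 5 shrinks to magnitude 4 with bias -1,
# while one with magnitude 0.5 is zeroed out.
coeffs = np.array([3 + 4j, 0.3 + 0.4j])
activated = modrelu(coeffs, bias=-1.0)
```

The extra complexity is mostly the complex dtype handling and the guard at zero magnitude; the arithmetic itself is a single elementwise rescale.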
