Slacker News

Comment by boroboro4

3 months ago

DeepSeek introduced a novel expert-training technique that increased expert specialization. For a given domain, their implementation tends to activate the same experts across different tokens, which is kind of what you're asking for!
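To make "activating the same experts across tokens" a bit more concrete, here is a minimal sketch of generic top-k mixture-of-experts routing, plus a hypothetical `expert_concentration` metric that measures how much a batch of tokens reuses the same experts. This is plain softmax gating with made-up weights, not DeepSeek's actual training technique; the function names, the metric, and the toy vectors are all illustrative assumptions.

```python
import math
from collections import Counter


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_experts(token, gate_weights, k):
    """Generic top-k gating (hypothetical helper): score each expert with a
    dot product, turn scores into probabilities, keep the k most likely."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def expert_concentration(tokens, gate_weights, k):
    """Hypothetical metric: share of all routing slots taken by the k most
    frequently chosen experts. 1.0 means every token hit the same k experts."""
    counts = Counter()
    for t in tokens:
        counts.update(top_k_experts(t, gate_weights, k))
    top = sum(c for _, c in counts.most_common(k))
    return top / (len(tokens) * k)


if __name__ == "__main__":
    # Toy router: 4 experts over 2-dimensional token embeddings (made up).
    gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    same_domain = [[1.0, 0.5]] * 3           # tokens that look alike
    mixed = [[1.0, 0.5], [-1.0, -0.5]]       # tokens from different "domains"
    print(expert_concentration(same_domain, gate, 2))  # 1.0
    print(expert_concentration(mixed, gate, 2))        # 0.5
```

A router with specialized experts would push the score for in-domain batches toward 1.0, which is the property that would let you load only a small subset of experts for a narrow domain.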
