Comment by boroboro4
14 days ago
DeepSeek introduced a novel expert training technique that increases expert specialization. For a given domain, their implementation tends to activate the same experts across different tokens, which is kind of what you’re asking for!
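For anyone who wants to check this kind of claim on their own router outputs, here is a rough sketch of how one might measure it. It is not DeepSeek's code: the function names, the plain top-k selection, and the score values are all made up for illustration. The idea is to count which experts get picked across the tokens of one domain and see how much of the traffic the top few experts absorb.

```python
from collections import Counter

def top_k_experts(scores, k):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def expert_usage(token_scores, k):
    """Count how often each expert is selected across a sequence of tokens."""
    usage = Counter()
    for scores in token_scores:
        usage.update(top_k_experts(scores, k))
    return usage

def concentration(usage, k):
    """Fraction of all activations that land on the k most-used experts."""
    total = sum(usage.values())
    top = sum(count for _, count in usage.most_common(k))
    return top / total if total else 0.0

# Made-up router scores (3 tokens, 4 experts each), for illustration only.
specialized = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.9, 0.2, 0.1], [0.8, 0.6, 0.0, 0.3]]
scattered = [[0.9, 0.8, 0.1, 0.0], [0.1, 0.0, 0.9, 0.8], [0.9, 0.1, 0.0, 0.8]]

print(concentration(expert_usage(specialized, 2), 2))
print(concentration(expert_usage(scattered, 2), 2))
```

A value near 1.0 means the same k experts handle nearly every token in that domain; lower values mean routing is spread across more experts.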