Comment by boroboro4
3 months ago
DeepSeek introduced a novel expert training technique that increases expert specialization. For a given domain, their implementation tends to activate the same experts across different tokens, which is kinda what you're asking for!
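To make "activates the same experts across tokens" concrete, here's a toy sketch of how you could measure it. This is not DeepSeek's router or training code; the router scores, the top-2 choice, and the helper names (`top_k_experts`, `expert_usage`, `concentration`) are all made up for illustration.

```python
from collections import Counter

def top_k_experts(scores, k):
    """Return the indices of the k highest router scores for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def expert_usage(token_scores, k):
    """Count how often each expert is selected across a sequence of tokens."""
    usage = Counter()
    for scores in token_scores:
        usage.update(top_k_experts(scores, k))
    return usage

def concentration(usage, k):
    """Fraction of all activations that land on the k most-used experts."""
    total = sum(usage.values())
    top = sum(count for _, count in usage.most_common(k))
    return top / total

# Hypothetical router scores (4 tokens x 4 experts), invented for this example.
# "specialized": every token from the domain prefers experts 0 and 1.
specialized = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.8, 0.6, 0.3, 0.1],
    [0.9, 0.7, 0.1, 0.3],
]
# "scattered": each token prefers a different pair of experts.
scattered = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.9, 0.8, 0.2],
    [0.2, 0.1, 0.9, 0.8],
    [0.8, 0.2, 0.1, 0.9],
]

k = 2
print(concentration(expert_usage(specialized, k), k))
print(concentration(expert_usage(scattered, k), k))
```

A value near 1 means the domain's tokens keep hitting the same k experts; a value near k divided by the number of experts means routing is spread roughly evenly.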