Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library
← Back to context

Comment by mcrutcher

19 hours ago

MoE models are pretty poorly named since all the "experts" are "the same". They're probably better described as "sparse activation" models. MoE implies some sort of "heterogenous experts" that a "thalamus router" is trained to use, but that's not how they work.

0 comments

mcrutcher

Reply

No comments yet

Contribute on Hacker News ↗

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities