Comment by oliveiracwb
16 hours ago
With the advent of MoEs, efficiency gains became possible. However, MoEs still fall well short of the balance and stability of dense models. My view is that most of the progress comes from tuning the router on good and bad outcomes, with only marginal gains in real intelligence.