Comment by onlyrealcuzzo

21 hours ago

1. Context is all you need. They are investing heavily in getting better context (especially for coding tasks). This will disproportionately help smaller models (and benefit everyone).

Today, a smaller model given good context can outperform a model with 100x more parameters that is given bad or diluted context.

2. MoE (mixture of experts; already abundant) + MLA (multi-head latent attention; mostly memory efficiency, not quality) + Medusa (extra decoding heads; speed, not quality) + GRAM (5,000-10,000x better reasoning in an extremely small model) + 1.58b (ternary, 1.58-bit weights; unclear whether it will have the impact Microsoft first claimed, but possibly 5x).
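For anyone wondering where the "1.58b" name comes from: each weight is restricted to one of three values {-1, 0, +1}, and log2(3) is about 1.58 bits. Below is a rough sketch of an absmean-style quantizer in the spirit of the BitNet b1.58 paper. It is not Microsoft's code. The function names are hypothetical, it handles only the weight rounding (no training and no activation quantization), and it uses Python's built-in `round`, which rounds exact halves to even.

```python
import math

# Three possible values per weight carry log2(3) bits of information.
BITS_PER_WEIGHT = math.log2(3)  # about 1.585, hence "1.58-bit"


def absmean_ternary(weights, eps=1e-5):
    """Hypothetical helper: quantize a 2-D weight matrix (list of lists) to {-1, 0, +1}.

    Scale every weight by gamma, the mean absolute value of the matrix, then
    round and clip to [-1, 1]. Returns (ternary matrix, gamma), so that
    weights are approximately gamma * ternary.
    """
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat)
    quantized = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return quantized, gamma


def dequantize(quantized, gamma):
    """Hypothetical helper: map ternary weights back to approximate real values."""
    return [[gamma * q for q in row] for row in quantized]


if __name__ == "__main__":
    w = [[0.9, -0.05, 0.4], [-1.2, 0.02, -0.3]]
    q, gamma = absmean_ternary(w)
    print(q)  # [[1, 0, 1], [-1, 0, -1]]
    print(dequantize(q, gamma))
    print(BITS_PER_WEIGHT)
```

Small weights collapse to 0 and everything else snaps to plus or minus gamma, which is where the memory and compute savings come from. Whether quality holds up at scale is exactly the open question about Microsoft's original claims.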