Comment by 7777777phil

7 days ago

This is amazing. I just spent some time scrolling through these, and most of the evolution is about inference cost, not capability: GQA, MoE routing, and sliding window attention all trade theoretical capacity for practical efficiency.

Tbh this might be the last generation of architectures designed entirely by humans. I dug into that (1) and might add another paragraph based on this if I find the time. The Big LLM Architecture Comparison (2) by Sebastian Raschka already inspired the OG image for my blog. Thanks again!

(1) https://philippdubach.com/posts/the-last-architecture-design...

(2) https://magazine.sebastianraschka.com/p/the-big-llm-architec...