Comment by pertymcpert
1 year ago
Indeed. I wonder what the architecture for Claude and Grok 3 is. If they're still dense models, then maybe the MoE excitement around R1 was a tad premature...