Comment by andreyf
3 days ago
Rumor has it that they weren't trained "from scratch" the way US models would be, i.e. Chinese labs benefitted from government "procured" IP (the US $B models) in order to train their $M models. I also understand there to be real innovation in the many-MoE architecture on top of that. Would love to hear a more technical understanding from someone who does more than repeat rumors, though.