Comment by nh43215rgb

10 days ago

270M is a nice (and rare) addition. Is there a reason why this isn't categorized as a gemma3n model? I thought small models go under the gemma3n category.

Not at Google (anymore), but Gemma3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many of the model's parameters you use during inference. The 2B model they released is just the sub-model embedded in the original 4B model. You can also fiddle with the model and pull out a 2.5B or 3B version too!

This is a more traditional LLM architecture (like the original Gemma 3 4B, but smaller), trained on an insane (for its size) number of tokens.
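
(Not Google's code, just my rough sketch of the idea: in a MatFormer-style FFN, the smaller model's weights live as a prefix slice of the bigger model's weight matrices, so you can pick how many hidden neurons to run at inference time. All names here, like MatFFN and d_hidden, are made up for illustration.)

    import torch
    import torch.nn as nn

    class MatFFN(nn.Module):
        """Toy nested FFN: the small model is a prefix slice of the big one."""
        def __init__(self, d_model=512, d_hidden_full=4096):
            super().__init__()
            self.w_in = nn.Linear(d_model, d_hidden_full)
            self.w_out = nn.Linear(d_hidden_full, d_model)

        def forward(self, x, d_hidden=None):
            # Run only the first d_hidden neurons: the embedded sub-model.
            d = d_hidden or self.w_in.out_features
            h = torch.relu(x @ self.w_in.weight[:d].T + self.w_in.bias[:d])
            return h @ self.w_out.weight[:, :d].T + self.w_out.bias

    ffn = MatFFN()
    x = torch.randn(1, 512)
    y_full = ffn(x)                   # full-capacity model
    y_small = ffn(x, d_hidden=2048)   # the embedded smaller model
    y_mid = ffn(x, d_hidden=3072)     # an in-between size you can pull out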

  • Oh ok, thank you. So something like MoE? That might not be quite right, but at the least, models need a different architecture (MatFormer) to be classified under gemma3n.

    • It's not an MoE; it's what's referred to as a dense architecture, same as the Gemma 3 models (but not 3n, as noted).
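
      To make the distinction concrete, here's an illustrative-only sketch (hypothetical code, nothing from Gemma): a dense FFN runs every parameter for every token, while an MoE routes each token to a few experts, so only a fraction of the total parameter count is active per token.

          import torch
          import torch.nn as nn

          class DenseFFN(nn.Module):
              def __init__(self, d=256, h=1024):
                  super().__init__()
                  self.net = nn.Sequential(
                      nn.Linear(d, h), nn.ReLU(), nn.Linear(h, d))

              def forward(self, x):
                  return self.net(x)  # every weight touches every token

          class MoEFFN(nn.Module):
              def __init__(self, d=256, h=1024, n_experts=8, top_k=2):
                  super().__init__()
                  self.experts = nn.ModuleList(
                      DenseFFN(d, h) for _ in range(n_experts))
                  self.router = nn.Linear(d, n_experts)
                  self.top_k = top_k

              def forward(self, x):
                  # Pick the top_k experts per token; only those run.
                  scores = self.router(x).softmax(-1)
                  weights, idx = scores.topk(self.top_k, dim=-1)
                  out = torch.zeros_like(x)
                  for t in range(x.shape[0]):  # route each token separately
                      for w, e in zip(weights[t], idx[t]):
                          out[t] += w * self.experts[int(e)](x[t:t+1])[0]
                  return out

          x = torch.randn(4, 256)
          DenseFFN()(x); MoEFFN()(x)  # same interface, different fraction
                                      # of parameters active per token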