Comment by wrxd

14 hours ago

I'm not sure I understand how this works: https://huggingface.co/google/gemma-4-E4B-it-assistant has 78.8M parameters, while the standard variant https://huggingface.co/google/gemma-4-E4B-it has 8B parameters.

Is gemma-4-E4B-it-assistant a model I can use stand-alone or a model I need to use in combination with gemma-4-E4B-it?

2 comments

gunalx 14 hours ago

You need the regular Gemma model as well. You can think of this as a really small distillation of the original. It is useless on its own because it is often wrong, but it is right more often than not. And because the big model can check a batch of drafted tokens in one parallel pass faster than it can generate those tokens one at a time, we can effectively speed things up by letting this draft model propose tokens and only redoing the compute where it was wrong.

This is an oversimplification, but tl;dr: yes, you need both.
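
For anyone who wants to see the draft-then-verify loop spelled out, here is a rough sketch of the greedy variant in Python. Everything in it is made up for illustration: `toy_target` and `toy_draft` are stand-in functions, not Gemma, and `speculative_step`, `generate`, and `greedy` are hypothetical names rather than any library's API. Real implementations verify all drafted tokens in a single batched forward pass and usually accept or reject based on probabilities, so treat this as a toy under those assumptions.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> next token (greedy)


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the big model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the small draft model: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[Token], k: int) -> List[Token]:
    """One draft-then-verify round; returns the tokens to append to prefix."""
    # 1. The cheap draft model proposes k tokens, one after another.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target checks each drafted position. A real transformer would
    #    score all k positions in one parallel pass; this loop is for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted, so the target's next token comes for free.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_tokens: int, k: int = 4) -> List[Token]:
    """Generate n_tokens after prompt using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + n_tokens]


def greedy(target: NextToken, prompt: List[Token], n_tokens: int) -> List[Token]:
    """Plain one-token-at-a-time decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out
```

The point of the toy is that `generate(toy_target, toy_draft, [0], 12)` produces exactly what `greedy(toy_target, [0], 12)` does: the draft model only changes how many tokens get settled per round, not what the output is. That is also why the small model is no use alone, since nothing would catch its mistakes.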

wrxd 14 hours ago

Thank you!
I already played with Gemma4 on oMLX a while ago. When I have some time, I'll check if it supports running MTP models and play a bit more.