Comment by lxgr
6 months ago
> It’s supported by your favorite tools including Hugging Face Transformers, llama.cpp, Google AI Edge, Ollama, MLX, and many others
Does anybody know how to actually run these using MLX? mlx-lm does not currently seem to support them, so I wonder what Google means exactly by "MLX support".
I had success using the models from lmstudio-community here.
https://huggingface.co/lmstudio-community
Thank you!
Do you know if these actually preserve the structure of Gemma 3n that makes these models more memory efficient on consumer devices? I feel like the modified inference architecture described in the article is what makes this possible, but it probably needs additional software support.
But given that they were uploaded a day ago (together with the blog post), maybe these are actually the real deal? In that case, I wish Google could just link to these instead of to https://huggingface.co/mlx-community/gemma-3n-E4B-it-bf16.
Edit: Ah, these are just non-MLX models. I might give them a try, but they're not what I was looking for. Still, thank you!
That's a great question that is beyond my technical competency in this area, unfortunately. I fired up LM Studio when I saw this HN post and saw that it had updated its MLX runtime [0] for gemma3n support. I then went looking for an MLX version of the model and found that one.
[0]: https://github.com/lmstudio-ai/mlx-engine