Comment by danielhanchen
1 day ago
Made some GGUFs if anyone wants to run them!
./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
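If you'd rather script it than use the CLI, here's a minimal sketch with llama-cpp-python that mirrors the flags above (my own illustration, not part of the original commands; it assumes the package is installed, that Llama.from_pretrained can fetch this repo from Hugging Face, and that the filename glob matches the UD-Q4_K_XL quant):

from llama_cpp import Llama

# Download the UD-Q4_K_XL quant from the Hugging Face repo and offload all layers
# to the GPU, mirroring `-hf ... -ngl 99` from the llama-cli command above.
llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3n-E4B-it-GGUF",
    filename="*UD-Q4_K_XL*",  # glob for the quant; assumed to match a single file
    n_gpu_layers=99,
)

# create_chat_completion applies the chat template stored in the GGUF metadata,
# roughly what `--jinja` does for llama-cli; temperature 0.0 matches `--temp 0.0`.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, Gemma 3n!"}],
    temperature=0.0,
)
print(out["choices"][0]["message"]["content"])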
I'm also working on an inference + finetuning Colab demo! I'm very impressed since Gemma 3n has audio, text, and vision! https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
Tried the E4B model in Ollama and it's totally broken when interpreting images. The output depends only on the text and is consistent in that way, but otherwise completely wrong.
Works fine with regular Gemma 3 4B, so I'll assume it's something on Ollama's side. edit: yep, text-only for now[1]; would be nice if that were a bit more prominent than buried in a ticket...
Don't feel like compiling llama.cpp myself, so I'll have to wait to try your GGUFs there.
[1]: https://github.com/ollama/ollama/issues/10792#issuecomment-3...
Oh I don't think multimodal works yet - it's text only for now!
Literally was typing out "Unsloth, do your thing!!" but you are way ahead of me. You rock <3 <3 <3
Thank you!
:) Thanks!
Thanks! What kind of rig do I need?
Likely nothing crazy. My RTX 2080 is pumping out 45 tok/s.
What is `jinja` in this context?
The chat template is stored as a Jinja template.
https://jinja.palletsprojects.com/en/stable/
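For illustration, here's roughly what that means in practice: the GGUF metadata embeds a Jinja chat template that turns a message list into the model's prompt format, and --jinja tells llama-cli to use it. This sketch renders a made-up Gemma-style template with the jinja2 package; it is not the actual template shipped in the GGUF:

from jinja2 import Template

# Illustrative Gemma-style chat template (not the exact one stored in the GGUF).
chat_template = Template(
    "{% for m in messages %}"
    "<start_of_turn>{{ m.role }}\n{{ m.content }}<end_of_turn>\n"
    "{% endfor %}"
    "<start_of_turn>model\n"
)

messages = [{"role": "user", "content": "What is a GGUF file?"}]

# Passing `--jinja` makes llama-cli render the model's stored template like this,
# instead of falling back to a built-in prompt format.
print(chat_template.render(messages=messages))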