Comment by c0brac0bra
10 months ago
What tasks have you found the 0.6B model useful for? The hallucination that's apparent during its thinking process raised a big red flag for me.
Conversely, the 4B model actually seemed to work really well and gave results comparable to Gemini 2.0 Flash (at least in my simple tests).
You can use 0.6B for speculative decoding on the larger models. It speeds up the dense 32B, but slows down 30B-A3B dramatically, presumably because the MoE only activates ~3B parameters per token, so decoding is already fast and the draft-model overhead dominates.
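Something like this with vLLM, as an untested sketch: the `speculative_model` / `num_speculative_tokens` kwargs shown here are the older keyword-style API (newer vLLM releases moved this into a `speculative_config` dict), and the Qwen model IDs are just the obvious Hugging Face names.

```python
# Sketch: speculative decoding with a 0.6B draft model in vLLM.
# Assumes vLLM's older keyword-style API; newer releases moved these
# settings into a `speculative_config` dict, so check your version's docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B",               # target model being accelerated
    speculative_model="Qwen/Qwen3-0.6B",  # small draft model proposes tokens
    num_speculative_tokens=5,             # draft tokens verified per step
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
print(outputs[0].outputs[0].text)
```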
Beyond that, the 0.6B model is okay for extracting simple things like addresses, or for formatting text given some input data, like a more advanced form of mail merge.
I haven't evaled these tasks so YMMV. I'm exploring other possibilities as well. I suspect it might be decent at autocomplete, and it's small enough that one could consider fine-tuning it on a codebase.
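For the extraction use case, a rough sketch with Hugging Face transformers; the model ID, prompt, and sample text are illustrative, and (as said above) output quality hasn't been evaluated:

```python
# Sketch: using a small instruct model to pull an address out of free text.
# Prompt and example text are placeholders. Note that Qwen3 may emit a
# <think> block by default unless thinking is disabled via the chat template.
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen3-0.6B")

messages = [
    {"role": "user", "content": (
        "Extract the mailing address from the text below. "
        "Reply with the address only, on one line.\n\n"
        "Text: Thanks for the order! Ship it to John Doe, "
        "742 Evergreen Terrace, Springfield, IL 62704."
    )},
]

# Chat-format input: the pipeline returns the conversation with the
# assistant's reply appended as the last message.
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```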