Comment by c0brac0bra
10 months ago
What tasks have you found the 0.6B model useful for? The hallucination that's apparent during its thinking process raised a big red flag for me.
Conversely, the 4B model actually seemed to work really well and gave results comparable to Gemini 2.0 Flash (at least in my simple tests).
You can use 0.6B for speculative decoding on the larger models. It speeds up the dense 32B, but slows down 30B-A3B dramatically, presumably because the MoE only activates ~3B parameters per token, so decoding is already fast and the draft-model overhead dominates.
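Something like this with vLLM, as an untested sketch: the `speculative_model` / `num_speculative_tokens` kwargs shown here are the older keyword-style API (newer vLLM releases moved this into a `speculative_config` dict), and the Qwen model IDs are just the obvious Hugging Face names.

```python
# Sketch: speculative decoding with a 0.6B draft model in vLLM.
# Assumes vLLM's older keyword-style API; newer releases moved these
# settings into a `speculative_config` dict, so check your version's docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B",               # target model being accelerated
    speculative_model="Qwen/Qwen3-0.6B",  # small draft model proposes tokens
    num_speculative_tokens=5,             # draft tokens verified per step
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
print(outputs[0].outputs[0].text)
```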
Beyond that, the 0.6B model is okay for extracting simple things like addresses, or for formatting text given some input data, like a more advanced form of mail merge.
I haven't evaled these tasks so YMMV. I'm exploring other possibilities as well. I suspect it might be decent at autocomplete, and it's small enough that one could consider fine-tuning it on a codebase.
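For the extraction use case, a rough sketch with Hugging Face transformers; the model ID, prompt, and sample text are illustrative, and (as said above) output quality hasn't been evaluated:

```python
# Sketch: using a small instruct model to pull an address out of free text.
# Prompt and example text are placeholders. Note that Qwen3 may emit a
# <think> block by default unless thinking is disabled via the chat template.
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen3-0.6B")

messages = [
    {"role": "user", "content": (
        "Extract the mailing address from the text below. "
        "Reply with the address only, on one line.\n\n"
        "Text: Thanks for the order! Ship it to John Doe, "
        "742 Evergreen Terrace, Springfield, IL 62704."
    )},
]

# Chat-format input: the pipeline returns the conversation with the
# assistant's reply appended as the last message.
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```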