
Comment by michaelt

6 months ago

Open-weights LLMs provide a dizzying array of options.

You'd have Llama, Mistral, Gemma, Phi, Yi.

You'd have Llama, Llama 2, Llama 3, Llama 3.2...

And those come in 8B, 13B or 70B parameter sizes...

And you can get them quantised in GGUF, AWQ, exl2 formats...

And quantised to 2, 3, 4, 6 or 8 bits.

And that 4-bit quant is available as Q4_0, Q4_K_S, Q4_K_M...

And on top of that there are a load of fine-tunes that score better on some benchmarks.

Sometimes a model is split into 30 files and you need all 30; other times there are 15 different quants in the same release and you only need a single one. And you have to download from Hugging Face and put the files in the right place yourself.

ollama takes a lot of that complexity and hides it. You run "ollama run llama3.1" and model selection and download are all taken care of.
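To sketch what that looks like in practice: ollama resolves a bare model name to a default tag, but you can still pin a specific size or quant via tags if you want the control back. The exact tag names below are illustrative; which tags exist depends on the model's page in the ollama library.

```shell
# Default: ollama resolves "llama3.1" to its default tag
# (typically an instruct model with a 4-bit quant) and downloads it
ollama run llama3.1

# Or pin an explicit size/quant tag yourself
# (tag availability varies; check the model's page in the ollama library)
ollama run llama3.1:8b-instruct-q4_K_M

# Pull a model without starting an interactive session
ollama pull llama3.1:70b
```

Either way, ollama handles fetching the right files and putting them in the right place, which is exactly the chore the comment describes.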