
Comment by jwr

18 hours ago

I'd genuinely like to know how these small models are useful to anyone. I've done a lot of experimenting, and anything smaller than 27B is basically unusable, except as a toy. The best I can say for the smaller models is that they sometimes produce good answers, which is not enough for anything except monkeying around.

I solved my spam problem with gemma3:27b-it-qat, and my benchmarks show that this is the size at which the current models start becoming useful.

Qwen2.5-VL 7B is pretty impressive at turning printed or handwritten maths lecture notes into LaTeX code, and is small enough to run (slowly) on a laptop without enough VRAM. Gemma3 4B was useless at this, though: it got stuck in loops or tried to solve the maths problems instead of just converting the working to LaTeX (but it was much faster, since it fit into VRAM).
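If you want to try this yourself, here is a minimal sketch assuming a local Ollama server and its Python client (the qwen2.5vl:7b model tag and the notes.png path are placeholders, not a recommendation):

    import ollama  # pip install ollama; assumes a local Ollama server is running

    # Hypothetical example: model tag and image path are placeholders.
    response = ollama.chat(
        model="qwen2.5vl:7b",
        messages=[{
            "role": "user",
            "content": "Transcribe these handwritten maths notes into LaTeX. "
                       "Do not solve anything; just convert the working.",
            "images": ["notes.png"],
        }],
    )
    print(response["message"]["content"])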

It sounds like you're trying to use them like ChatGPT, but I think that's not what they're for.

I am sure these can work fine as ideation devices. I treat this more like basic infra. I would absolutely love a future where most phones have some small LLM built in, kind of like a base layer of infra.

I use the gemma3:1b model (well, gemma3n:e2b since today) to summarize articles in my RSS reader. It works extremely well for such a simple task and runs on CPU on my Hetzner server, so I don't have to pay the electricity bill for running it on a GPU at home.
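The whole pipeline is basically one request; a minimal sketch against Ollama's REST API (the prompt wording and placeholder article text are illustrative, not the actual setup):

    import requests  # talks to a local Ollama server

    # Hypothetical example: in a real pipeline the text comes from the RSS feed.
    article = "...full article text..."

    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma3:1b",
            "prompt": "Summarize this article in 2-3 sentences:\n\n" + article,
            "stream": False,
        },
    )
    print(r.json()["response"])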

There are use cases where even low accuracy could be useful. I can't predict future products, but here are two that are already in place today:

- On the iPhone keyboard, some sort of tiny language model suggests what it thinks are the most likely follow-up words as you write. You only have to pick a suggested next word when it matches what you were planning on typing.

- Speculative decoding is a technique that uses a smaller model to speed up inference for a bigger model (see the sketch after this list).
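The idea behind speculative decoding: the small model cheaply drafts a few tokens, and the big model only verifies them, so the output is identical to what the big model alone would produce. A toy sketch of the greedy variant, with deterministic stand-in functions instead of real models (every name here is illustrative):

    import random

    VOCAB = list("abcdefgh")

    def target_next(tokens):
        # Deterministic toy stand-in for the big model's greedy next token.
        return VOCAB[hash(tuple(tokens)) % len(VOCAB)]

    def draft_next(tokens):
        # Toy stand-in for the small model: agrees with the target ~80%
        # of the time, which is what makes drafting worthwhile.
        return target_next(tokens) if random.random() < 0.8 else random.choice(VOCAB)

    def speculative_decode(prompt, n_tokens, k=4):
        tokens = list(prompt)
        while len(tokens) < n_tokens:
            # 1. The small model cheaply drafts k tokens autoregressively.
            draft, ctx = [], list(tokens)
            for _ in range(k):
                ctx.append(draft_next(ctx))
                draft.append(ctx[-1])
            # 2. The big model checks the drafted positions (in practice one
            #    batched forward pass, which is where the speedup comes from).
            accepted = 0
            for i in range(k):
                if target_next(tokens + draft[:i]) != draft[i]:
                    break
                accepted += 1
            tokens += draft[:accepted]
            # 3. Append one token from the big model itself, so the result
            #    matches plain greedy decoding with the big model alone.
            tokens.append(target_next(tokens))
        return "".join(tokens[:n_tokens])

    print(speculative_decode("ab", 20))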

I'm sure smart people will invent other future use cases too.

The best use case I've found for tiny models (<5B params) is as a reference tool for when I don't have WiFi. I've been using Qwen on my MacBook Air as a replacement for Google while writing code on flights. It works great for basic questions about syntax and documentation.

Tiny models (4B params or less) are designed to be finetuned for narrow tasks; that way they can outperform large commercial models at a tiny fraction of the price. They're also great for code autocomplete.
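The usual lightweight route for that finetuning is a LoRA adapter; a minimal sketch with Hugging Face transformers and peft (the base model name and target modules are plausible defaults, not a tested recipe):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Hypothetical base model; any small causal LM works the same way.
    name = "Qwen/Qwen2.5-0.5B"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # LoRA trains a few million adapter weights instead of the whole model,
    # which is what makes narrow-task finetuning of tiny models so cheap.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the weights

    # From here, a normal training loop (e.g. transformers.Trainer) over
    # the narrow-task dataset finishes the job.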

7B-8B models are great coding assistants if all you need is dumb, fast refactoring that cannot quite be done with macros and standard editor functionality but is still primitive, such as "rename all methods having at least one argument of type SomeType by prefixing their names with ST_".
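Wiring that up is little more than a prompt wrapped around the file contents; a minimal sketch against a local Ollama server (the model tag, file name, and prompt wording are all placeholders):

    import requests

    # Hypothetical example: read a source file, ask a local 7B coding model
    # for the mechanical rewrite, and print the result for review.
    source = open("service.py").read()

    prompt = (
        "Rename all methods having at least one argument of type SomeType "
        "by prefixing their names with ST_. Return only the rewritten code.\n\n"
        + source
    )

    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen2.5-coder:7b", "prompt": prompt, "stream": False},
    )
    print(r.json()["response"])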

12B is the threshold at which models start writing coherent prose; Mistral Nemo and Gemma 3 12B are examples.