Comment by ekidd
2 days ago
One easy way to test different models is to purchase $20 worth of tokens from one of the OpenRouter-like sites. This will let you ask tons of questions and try out lots of models.
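For what it's worth, here's a minimal sketch of that workflow in Python, using the official OpenAI client pointed at OpenRouter's OpenAI-compatible endpoint; the model slugs are just illustrative examples, so swap in whatever you want to compare:

```python
# Minimal sketch: compare several models through OpenRouter's
# OpenAI-compatible API. Assumes `pip install openai` and an
# OPENROUTER_API_KEY environment variable; the model slugs
# below are illustrative, not endorsements.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PROMPT = "Explain mixture-of-experts models in two sentences."

for model in ("qwen/qwen3-30b-a3b", "openai/gpt-oss-120b"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```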
Realistically, the biggest models you can run at a reasonable price right now are quantized versions of things like the Qwen3 30B A3B family. A 4-bit quantized version fits in roughly 15GB of RAM. This will run very nicely on something like an NVIDIA 3090, but you can also run it from regular system RAM (though it will be slower).
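If you want to see what that looks like in code, here's a hedged sketch using llama-cpp-python; the GGUF filename is a placeholder for whichever 4-bit quant you actually download:

```python
# Sketch: run a 4-bit quantized GGUF locally with llama-cpp-python
# (pip install llama-cpp-python). The model_path below is a
# placeholder; point it at whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU (e.g. a 3090)
    n_ctx=8192,       # context window; lower it if you run out of memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about VRAM."}],
)
print(out["choices"][0]["message"]["content"])
```

Set n_gpu_layers=0 (or something in between) if you're running from regular RAM instead of a GPU.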
These models aren't competitive with GPT-5 or Opus 4.5! But most of them are noticeably better than GPT-4o, some by quite a bit. Some of the 30B models will even work as basic agentic coders.
There are also some great 4B to 8B models from various organizations that will fit on smaller systems. An 8B model, for example, can be a great translator.
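As a sketch of that translator use case, here's roughly what it looks like through the ollama Python client; the qwen3:8b tag is an assumption on my part, so substitute whatever 8B model you've pulled:

```python
# Sketch: a small local model as a translator, via the ollama
# Python client (pip install ollama). The "qwen3:8b" tag is an
# assumption; any decent 8B model you've pulled should work.
import ollama

def translate(text: str, target: str = "English") -> str:
    resp = ollama.chat(
        model="qwen3:8b",  # assumed tag; substitute your local model
        messages=[{
            "role": "user",
            "content": f"Translate into {target}. Reply with only the translation:\n{text}",
        }],
    )
    return resp["message"]["content"]

print(translate("Der schnelle braune Fuchs springt über den faulen Hund."))
```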
(If you have a bunch of money and patience, you can also run something like GPT OSS 120B or GLM 4.5 Air locally.)
I wrote https://tools.nicklothian.com/llm_comparator.html so you can compare different models.
OpenRouter gives you $10 of credit when you sign up - stick your API key in and compare as many models as you want. Everything stays in browser local storage.
> (If you have a bunch of money and patience, you can also run something like GPT OSS 120B or GLM 4.5 Air locally.)
Don't need patience for these, just money. A single RTX 6000 Pro runs those great and super fast.
Or a single AMD Strix Halo with lots of RAM, which could be had before the RAM crisis for ~1.5k eur.
Oh... 8 thousand eurobucks for the thing.
Or 4 thousand for the NVIDIA RTX A6000, which also runs the 120B just fine (quantized).
> GPT OSS 120B
This one runs at a perfectly serviceable pace locally on a laptop 5090 with 64GB of system RAM, with zero effort required. Just download ollama and select this model from the drop-down.
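And if you'd rather hit it programmatically than through the drop-down, a minimal streaming sketch with the ollama Python client looks like this (gpt-oss:120b is the tag ollama was using for it, but verify against `ollama list`):

```python
# Sketch: stream tokens from a local gpt-oss 120B through the
# ollama Python client. The model tag is what ollama was using
# at the time of writing; check `ollama list` on your machine.
import ollama

stream = ollama.chat(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```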
Or why not just buy a Blackwell rack?
Runs everything today with bleeding-edge performance.
Overall, what's the difference between 8k and 30k?
/s
You jest, but there are a ton of people on /r/localLLaMA who have an RTX 6000 Pro. No one has a Blackwell rack.
As long as you have the money this hardware is easily accessible to normal people, unlike fancy server hardware.
This is the answer. There are half a dozen sites that let you run these models by the token, and actually $20 is excessive. $5 will get you a long, long way.