Comment by mschild
12 hours ago
Running models locally is surprisingly easy and possible even on older hardware.
Obviously not the largest, up-to-date models but for what I expect most people use them for, even on hn, there are some shockingly good models that dont require €4k machines.
I have a desktop with an AMD 6900XT and 5600 with 32GB ram. Obviously no slouch but its several years old at this point. I can comfortably run qwen 3.5 9b and get a speedy 60 token/sec output with decent results.
idk I can barely field a 14b on my desktop, and it’s rough trying to replicate the agentic pair programming experience I’m accustomed to with Claude. And I don’t mean it doesn’t work as well, I mean it doesn’t work.
Is there some secret I’m missing? I’ve tried rolling my own harness, and tried a few of the ones the cool kids use - I think pi was the most recent. Not quite my tempo, I’m afraid.
Depends on your desktop specs and specific model.
The easiest way I have found is to use LM Studio, grab the model you want, and point whatever tooling you're using at the local exposed API.
You will have to configure the model params (temperature, etc) a bit to get the style you're expecting but it works decently well for me.
[dead]