Comment by AYBABTME

11 hours ago

This right now today is making the case for OSS AI and local inference. 200$/m to get rate limited makes a RTX 6000 Pro look cheap.

8 comments

AYBABTME

How well do local OSS models stack up to Claude?

Balinares 10 hours ago

Very well for narrowly scoped purposes.
They decohere much faster as the context grows. Which is fine, or not, depending on whether you consider yourself a software engineer amplifying your output by automating the boilerplate, or an LLM cornac.
wongarsu 8 hours ago
Much better than they did half a year ago, but a single RTX 6000 won't get you there
Models in the 700B+ category (GLM5, Kimi K2.5) are decent, but running those on your own hardware is a six-figure investment. Realistic for a company, for a private person instead pick someone you like from openrouter's list of inference providers.
If you really want local on a realistic budget, Qwen 3.5 35B is ok. But not anywhere near Claude Opus
- Eisenstein 7 hours ago
  
  > but running those on your own hardware is a six-figure investment
  GLM-5 is a 744B MoE with 40B active. You can run a Q4_K_M quant on llama.cpp if you can afford 512GB of RAM. An RTX 6000 will help a lot with the prompt processing, and the generations with be relatively fast if you have decent memory bandwidth. llama.cpp's autofit feature is really good at dividing the layers for MoEs to max speed when offloading.
sunaookami 11 hours ago

They don't, only on meaningless benchmarks.

What’s the depreciation on that RTX 6000 though?

New hardware keeps on coming with large gains in performance.