Comment by reaslonik
2 days ago
You need to leave much more room for context if you want to do useful work beyond entertainment. Luckily, there are _several_ PCIe slots on a motherboard. New Nvidia cards at retail (or above) are not the only choice for building a cluster; I threw a pile of Intel Battlemage cards at it and got away with ~30% of the Nvidia cost for the same capacity (setup was _not_ easy in early 2025, though).
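A rough way to see why context eats memory: the KV cache stores one key and one value vector per layer, per KV head, per token, so it grows linearly with context length. A minimal sketch, assuming a standard transformer KV cache in fp16 with no cache quantization; the model shape below is hypothetical and only illustrative:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Approximate KV-cache size in bytes.

    The leading 2 accounts for storing both K and V for every layer,
    KV head and token. bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch


GIB = 1024 ** 3

# Hypothetical model shape (illustrative only, not any specific model's config)
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)
print(f"{size / GIB:.1f} GiB")  # 4.0 GiB
```

Doubling the context doubles this figure, and it comes on top of the weights themselves, which is why VRAM sized only for the model leaves little room for long prompts.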
You can gain a lot of performance by using the optimal quantization technique for your setup (ix, AWQ, etc.). Different llama.cpp builds perform differently from one another, and very differently from something like vLLM.
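One part of what quantization buys you is footprint: weight memory scales with bits per weight. A back-of-the-envelope sketch, assuming a hypothetical 7B-parameter model and nominal bit widths; real formats (AWQ, llama.cpp quants) also store per-group scales, so their effective bits per weight is somewhat above the nominal value:

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory: parameters x bits per weight / 8 bits per byte.

    Ignores per-group scale/zero-point overhead of real quantized formats.
    """
    return n_params * bits_per_weight / 8


N_PARAMS = 7e9  # hypothetical model size, illustrative only
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_bytes(N_PARAMS, bits) / 1e9:.1f} GB")
```

Whatever the weights don't use is left over for the KV cache, so picking the right quantization for your cards also changes how much context you can afford.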