Comment by tgtweak

2 months ago

Dual 3090s (24GB each) on 8x+8x PCIe has been a really reliable setup for me (with an NVLink bridge... even though it's relatively low bandwidth compared to Tesla NVLink, it's better than going over PCIe!)

48GB of VRAM and lots of CUDA cores - hard to beat this value atm.
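
For concreteness, here's roughly how a split like that gets driven - a minimal sketch assuming vLLM, with a placeholder model name, not a recipe:

```python
# Minimal sketch: shard one model across both 3090s with tensor parallelism.
# vLLM is an assumption here, and the model name is just a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-48gb-friendly-model",  # placeholder: anything that fits in 2x24GB
    tensor_parallel_size=2,                     # split the weights across both cards
    gpu_memory_utilization=0.90,                # leave a little headroom for activations
)

outputs = llm.generate(
    ["Explain why NVLink helps tensor parallelism, in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```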

If you want to go even further, you can get an 8x V100 32GB server complete with 512GB RAM and NVLink switching for $7000 USD from unixsurplus (ebay.com/itm/146589457908), which can run even bigger models with healthy throughput. You would need 240V power to run that in a home lab environment, though.

The V100 is outdated (no bf16, dropped in CUDA 13) and power hungry (8 cards running continuously for 3 years comes to about $12k of electricity).
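
Rough math behind a number like that (the electricity rate is an assumption, and it ignores cooling and the rest of the chassis):

```python
# Back-of-the-envelope electricity cost for 8x V100 running flat out for 3 years.
# The $/kWh rate is an assumption; adjust for your local tariff.
cards = 8
watts_per_card = 300                          # V100 SXM2 TDP
hours = 3 * 365 * 24                          # three years, continuous
kwh = cards * watts_per_card / 1000 * hours   # ~63,000 kWh
rate = 0.19                                   # assumed $/kWh
print(f"{kwh:,.0f} kWh -> ~${kwh * rate:,.0f}")  # roughly $12k
```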

  • Depends where you are plugging them in - but yes, they are older gen - despite this, 8x V100 will outperform most of what you can buy for that price simply by way of memory capacity and NVLink bandwidth. If you want to practically run a local model that takes 200GB of memory (Devstral-2-123B-Instruct-2512, for example, or GPT-OSS-120B with a long context window) without resorting to aggressive GGUF quants or memory swapping, you don't have many cheaper options. You can also parallelize several models on one node to get some additional throughput for bulk jobs (see the sketch below).
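
On the "several models on one node" point, the trick is just pinning each serving process to its own subset of the 8 GPUs - a rough sketch, assuming vLLM's OpenAI-compatible server; the ports and model path are placeholders:

```python
# Rough sketch: run two independent 4-GPU model instances on one 8x V100 node by
# pinning each server process to its own GPUs. vLLM, the ports and the model path
# are placeholders/assumptions, not necessarily what's running in the setup above.
import os
import subprocess

instances = [
    {"gpus": "0,1,2,3", "port": 8000},
    {"gpus": "4,5,6,7", "port": 8001},
]

procs = []
for inst in instances:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = inst["gpus"]    # this process only sees its 4 GPUs
    procs.append(subprocess.Popen(
        ["python", "-m", "vllm.entrypoints.openai.api_server",
         "--model", "your-org/your-large-model",  # placeholder model path
         "--tensor-parallel-size", "4",           # shard across the 4 visible GPUs
         "--dtype", "float16",                    # V100 has no bf16, so fp16
         "--port", str(inst["port"])],
        env=env,
    ))

for p in procs:
    p.wait()
```

Bulk jobs can then round-robin requests across the two endpoints for extra aggregate throughput.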