Comment by NitpickLawyer
14 days ago
For a single user, maybe. But for small teams, GPUs are still the only available option when considering t/s and concurrency. Nvidia's latest 6000 Pro series is actually reasonably priced for the amount of VRAM and wattage you get. An 8x box starts at 75k EUR and can host up to DS3 / R1 / Llama 4 in 8-bit with decent speeds, context and concurrency.
What teams bother to do that, though? It's easier to call an API or spin up a cloud cluster.