Comment by mechagodzilla
2 months ago
I use a dual-socket Xeon (18 cores per socket, so 36 total) with 768GB of DDR4, and get about 1.5-2 tokens/sec with a 4-bit quantized version of the full DeepSeek models. It really is wild to be able to run a model like that at home.
Dumb question: would something like this have a graphics card too? I assume not
Yeah, it was just a giant HP workstation - I currently have 3 graphics cards in it (but only 40GB total of VRAM, so not very useful for DeepSeek models).
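
For a rough sense of why 768GB of system RAM works here while 40GB of VRAM does not, the weight footprint can be estimated with some back-of-the-envelope arithmetic. This is only a sketch: the thread doesn't say which model is meant, so the 671-billion parameter count below is an assumption (the published total for DeepSeek-V3/R1), and real quantized files are usually somewhat larger than a flat 4 bits per weight, before counting the KV cache and runtime overhead.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption (not stated in the thread): 671e9 total parameters.
ASSUMED_PARAMS = 671e9

# Figures taken from the comments above.
RAM_GB = 768
VRAM_GB = 40

footprint = weight_footprint_gb(ASSUMED_PARAMS, bits_per_weight=4)

print(f"Estimated 4-bit weights: {footprint:.1f} GB")
print(f"Fits in {RAM_GB} GB of system RAM: {footprint < RAM_GB}")
print(f"Fits in {VRAM_GB} GB of VRAM: {footprint < VRAM_GB}")
```

Under that assumption the weights alone come to a few hundred GB, which is well within the system RAM but far more than the three cards can hold between them.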