← Back to context

Comment by mechagodzilla

2 months ago

I use a dual-socket 18-core (so 36 total) xeon with 768GB of DDR4, and get about 1.5-2 tokens/sec with a 4-bit quantized version of the full deepseek models. It really is wild to be able to run a model like that at home.

Dumb question: would something like this have a graphics card too? I assume not

  • Yeah, it was just a giant HP workstation - I currently have 3 graphics cards in it (but only 40GB total of VRAM, so not very useful for deepseek models).