Comment by mechagodzilla

5 months ago

If you don't need to pay for the model development costs, I think the cost of running inference will just be driven down to the underlying cloud-compute costs. The actual requirement to run DeepSeek V3/R1 passably at home (at ~4-bit quantization) is really just having 512GB or so of RAM. I bought a used dual-socket Xeon with 768GB of RAM for $2k, and it can run DeepSeek R1 at 1-1.5 tokens/sec, which is perfectly usable for "ask a complicated question, come back an hour or so later and check on the result".
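
For anyone who wants to sanity-check the 512GB figure, here's a rough back-of-the-envelope sketch. The ~671B total parameter count is DeepSeek V3/R1's published size; the effective bits-per-parameter and the overhead allowance are my assumptions, since real quantization formats average somewhat above 4 bits and KV-cache size depends on context length:

```python
# Rough memory estimate for running DeepSeek V3/R1 at ~4-bit quantization.
# Assumptions (not exact figures): effective bits/param and overhead vary
# by quant format, context length, and runtime.

TOTAL_PARAMS = 671e9      # DeepSeek V3/R1 total parameter count
BITS_PER_PARAM = 4.5      # "~4-bit" quants typically average a bit above 4

weight_gib = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 2**30   # ~351 GiB of weights
overhead_gib = 50          # rough allowance for KV cache, buffers, and the OS
total_gib = weight_gib + overhead_gib

print(f"weights: ~{weight_gib:.0f} GiB")
print(f"total:   ~{total_gib:.0f} GiB")   # ~401 GiB: fits in 512GB, easy in 768GB
```

That lands around 400 GiB, which is why 512GB is about the floor and 768GB leaves comfortable headroom.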