Comment by rahimnathwani

1 year ago

Wow! Only $2k with no quantization.

  hit between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b full model

2 comments

rahimnathwani

I think it is quantised. What they actually said was no distillation.

rahimnathwani 1 year ago
I think you're right. The instructions say:
ollama pull deepseek-r1:671b
This will pull down 400GB: https://ollama.com/library/deepseek-r1:671b
But the Hugging Face repo has 163 files of ~4.3GB each, so around 700GB: https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main
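A rough sanity check on those sizes, as a sketch: it assumes "671b" means about 671 billion parameters and treats GB as decimal gigabytes. The function name is just for illustration. It ignores any non-weight data in the files, so the bits-per-parameter figures are only approximate.

```python
# Back-of-the-envelope check of the two download sizes quoted above.
# Assumption: "671b" means roughly 671e9 parameters; GB = 1e9 bytes.
PARAMS = 671e9


def bits_per_param(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a download of size_gb."""
    return size_gb * 1e9 * 8 / params


hf_size_gb = 163 * 4.3   # 163 files of ~4.3GB each
ollama_size_gb = 400     # size quoted for deepseek-r1:671b

print(f"Hugging Face repo: ~{hf_size_gb:.0f} GB, "
      f"~{bits_per_param(hf_size_gb):.1f} bits/param")
print(f"Ollama pull:       ~{ollama_size_gb} GB, "
      f"~{bits_per_param(ollama_size_gb):.1f} bits/param")
```

That works out to a bit over 8 bits per parameter for the Hugging Face files and a bit under 5 for the Ollama download. The second figure is in the range you'd expect from a 4-bit quantization with some overhead, which fits the "Q4" in the quoted line.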