Comment by rahimnathwani
1 year ago
Wow! Only $2k with no quantization.
hit between 3.5 and 4.25 TPS (tokens per second) on the Q4 671b full model
I think it is quantised; what they actually said was no distillation.
I think you're right. The instructions say:
This will pull down 400GB: https://ollama.com/library/deepseek-r1:671b
But the Hugging Face repo has 163 files of ~4.3GB each, so around 700GB: https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main
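A quick back-of-the-envelope check of those two sizes, as a rough sketch only. It assumes decimal gigabytes (1 GB = 1e9 bytes) and 671 billion parameters (taken from the "671b" tag), and it ignores any non-weight files in either download:

```python
# Back-of-the-envelope check; every input is a rough figure quoted above.
PARAMS = 671e9      # assumed parameter count, from the "671b" tag
OLLAMA_GB = 400     # approximate Ollama download size
HF_FILES = 163      # number of files in the Hugging Face repo
HF_FILE_GB = 4.3    # approximate size of each file


def bits_per_param(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


hf_total_gb = HF_FILES * HF_FILE_GB

print(f"Hugging Face total: ~{hf_total_gb:.0f} GB")
print(f"Ollama:       ~{bits_per_param(OLLAMA_GB):.1f} bits/param")
print(f"Hugging Face: ~{bits_per_param(hf_total_gb):.1f} bits/param")
```

That works out to roughly 4.8 bits per parameter for the Ollama download versus roughly 8.4 for the Hugging Face files, so the 400GB pull looks like a ~4-bit quantisation rather than the full-size weights.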