Comment by 7thpower

1 year ago

The LLMs have the ‘knowledge’ baked into their weights. One of the things you will hear about is quantized models with lower-precision weights (think 16-bit -> 4-bit), which lets them run on a greater variety of hardware and/or with greater performance.
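To make that concrete, here is a minimal and deliberately naive sketch of what 4-bit quantization does to a weight matrix. Real schemes (GPTQ, AWQ, the GGUF k-quants, etc.) are much cleverer about scales and outliers, but the core idea and the core tradeoff are the same:

```python
import numpy as np

# Naive symmetric 4-bit quantization, for illustration only.
def quantize_4bit(weights: np.ndarray):
    scale = np.abs(weights).max() / 7            # signed 4-bit range is roughly [-8, 7]
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale                              # ~4 bits per weight plus one scale factor

def dequantize(q: np.ndarray, scale) -> np.ndarray:
    return q.astype(np.float16) * scale          # reconstructed values are only approximate

w = np.random.randn(4, 4).astype(np.float16)     # pretend these are fp16 model weights
q, s = quantize_4bit(w)
print(np.abs(w - dequantize(q, s)).max())        # the rounding error is the quality you give up
```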

When you quantize, you sacrifice model quality. In addition, a lot of the models favored for local use are already very small (7B, 3B parameters).

What OP is pointing out is that you can actually run the full DeepSeek R1 model, along with all of its ‘knowledge’, on relatively modest hardware.

Not many people want to make that tradeoff when there are cheap, performant APIs around, but for a lot of people who have privacy concerns or just like to tinker, it is a pretty big deal.

I am far removed from having a high-performance computer (although I suppose my MacBook is nothing to sneeze at), but I remember building computers or homelabs back in the day and then being like ‘okay, now what is the most stressful workload I can find?!’ — this is perfect for that.