Comment by antidumbass
3 months ago
> I'm pretty sure it was running locally.
If this family member is experimenting with DeepSeek locally, they are an extremely unusual person and have spent upwards of $10,000 if not $200,000. [0]
> ...partially print the word, then in response to a trigger delete all the tokens generated to date and replace them...
It was not running locally. This is classic bolt-on censorship behavior. OpenAI does this if you ask certain questions too.
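To make "bolt-on" concrete: the filter sits outside the model and acts on the partially streamed output. A toy sketch of the pattern (the trigger list, message, and function name are all made up for illustration, not anyone's actual implementation):

    # Toy sketch of a bolt-on output filter: the model streams tokens normally,
    # while a separate layer watches the accumulating text and, on a trigger,
    # retracts what was already shown and substitutes a canned refusal.
    # TRIGGERS and REPLACEMENT are placeholders, not any vendor's real list.
    TRIGGERS = ("example banned phrase",)
    REPLACEMENT = "Sorry, that's beyond my current scope."

    def stream_with_filter(token_stream):
        shown = ""
        for tok in token_stream:
            shown += tok
            print(tok, end="", flush=True)   # the user sees the answer start to appear
            if any(t in shown.lower() for t in TRIGGERS):
                # crude single-line "erase": overwrite what was printed, then show
                # the replacement -- why text appears and then vanishes mid-word
                print("\r" + " " * len(shown) + "\r" + REPLACEMENT)
                return REPLACEMENT
        print()
        return shown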
If everyone keeps loudly asking these questions about censorship, it seems inevitable that the political machine will realize weights can't be trivially censored. What will they do? Start imprisoning anyone who releases non-lobotomized open models. In the end, the mob will get what it wants.
[0] I am extremely surprised that a 15-year-long HN user has to ask this question, but you know what they say: the future is not fairly distributed.
I ran the 32B parameter model just fine on my rig an hour ago with a 4090 and 64 GB of RAM. It's high end for the consumer scene but still solidly within consumer prices.
I'm confused. According to another comment (https://news.ycombinator.com/item?id=42859645), the <=70B DeepSeek models are just fine-tunings of Llama or Qwen? So we shouldn't take these models as actually being DeepSeek.
I think people are confusing the smaller, non-DeepSeek original models (Qwen/Llama) with the 671B-parameter DeepSeek R1 model being discussed here, which very few people can run locally.
I also run the 32B parameter model just fine on our 4x H100 rig :) It's good enough for embedding, our use case.
I'm not sure if $200k of hardware fits the consumer level
I have also been running the 32b version on my 24GB RTX 3090.
If someone wants to run the real thing (R1) locally, a user posted their hardware specs on X. Total cost: $6,000.
[0] direct link with login https://x.com/carrigmat/status/1884244369907278106
[1] alt link without login https://threadreaderapp.com/thread/1884244369907278106.html
That's not DeepSeek, it's a Qwen or Llama model distilled from DeepSeek. Not the same thing at all.
I am doing the same.
You can run the quantized versions of DeepSeek locally on normal hardware just fine, even with very good performance. I have it running right now. With a decent consumer gaming GPU you can already get quite far.
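For anyone wondering what that looks like in practice, here is a rough sketch using llama-cpp-python; the GGUF filename is a placeholder, and any 4-bit quant of one of the R1 distills that fits your VRAM should behave similarly:

    # Rough sketch: load a 4-bit GGUF quant of an R1 distill with llama-cpp-python
    # and offload as many layers as possible to the GPU. The model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",  # placeholder filename
        n_gpu_layers=-1,   # offload everything that fits onto the GPU
        n_ctx=8192,        # context window; lower it if you run out of VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain quantization in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])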
It is quite interesting that this censorship survives quantization; perhaps the larger versions censor even more. But yes, there is probably an extra step that detects "controversial content" and then overwrites the output.
Since the data feeding DeepSeek is public, you can correct the censorship by building your own model. For that you need considerably more compute power, though. Still, for the "little guy", what they released is quite helpful despite the censorship.
At least you can retrace how the censorship ends up in the model, which isn't true for most other open-weight models, whose training data cannot be released for numerous reasons beyond "they don't want to".
> extremely unusual person and have spent upwards of $10,000
This person doesn't have the budget, but does have the technical chops to the level of "extremely unusual". I'll have to get them to teach me more about AI.
> they are an extremely unusual person and have spent upwards of $10,000
Eh? Doesn't the distilled+quantized version of the model fit on a high-end consumer-grade GPU?
The "distilled+quantized versions" are not the same model at all, they are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.
That is semantics; they are strongly comparable in their inputs and outputs. And distillation is different from fine-tuning.
Sure, you could say that only running the 600B+ model is running "the real thing"...
A distilled version running on another model architecture does not count as using "DeepSeek". It counts as running a Llama:7B model fine-tuned on DeepSeek.
That's splitting hairs. Most people take "running locally" to mean running the model on your own hardware rather than on the providing company's.
Pretty sure this is just layman versus academic-expert usage of the word conflicting.
For everyone who doesn't build LLMs themselves, "running a Llama:7B model fine-tuned on DeepSeek" _is_ using DeepSeek, mostly on account of all the tools and files being named DeepSeek, and because the tutorials aimed at casual users are all titled with equivalents of "How to use DeepSeek locally".