Comment by jscheel
3 months ago
I was using one of the smaller models (7b), but I was able to bypass its internal censorship by poisoning its <think> section a bit with additional thoughts about answering truthfully, regardless of ethical sensitivities. Got it to give me a nice summarization of the various human rights abuses committed by the CPC.
The model you were using was created by Qwen, and then finetuned for reasoning by Deepseek.
- Deepseek didn't design the model architecture
- Deepseek didn't collate most of the training data
- Deepseek isn't hosting the model
Yes, 100%. However, the distilled models are still pretty good at sticking to their approach to censorship. I would assume that the behavior comes from their reasoning patterns and fine tuning data, but I could be wrong. And yes, DeepSeek’s hosted model has additional guardrails evaluating the output. But those aren’t inherent to the model itself.
Poisoning the censorship machine by truth, that is poetic.