Comment by rahimnathwani

1 year ago

The model you were using was created by Qwen, and then fine-tuned for reasoning by DeepSeek.

- DeepSeek didn't design the model architecture

- DeepSeek didn't collate most of the training data

- DeepSeek isn't hosting the model


rahimnathwani

Yes, 100%. However, the distilled models are still pretty good at sticking to their approach to censorship. I would assume that the behavior comes from their reasoning patterns and fine-tuning data, but I could be wrong. And yes, DeepSeek’s hosted model has additional guardrails evaluating the output, but those aren’t inherent to the model itself.