
Comment by danpalmer

3 months ago

What's not clear to me is whether DeepSeek and other Chinese models are...

a) censored at output by a separate process

b) explicitly trained to not output "sensitive" content

c) implicitly trained to not output "sensitive" content, because the training data is itself censored, references censorship, or is selectively chosen

I would assume most models are a combination. As others have pointed out, it seems you get different results with local models, implying that (a) is a factor for hosted models.

The thing is, censoring by hosts is always going to be a thing. OpenAI already does this: someone lodges a legal complaint, and they decide the easiest thing to do is just censor the output. Honestly, I don't have a problem with it, especially when the model is open (source/weights) and users can run it themselves.

More interesting, I think, is whether trained censoring is implicit or explicit. I'd bet there's a lot more uncensored training material in some languages than in others. It might be quite hard not to implicitly train a model to censor itself. Maybe that's not even a problem: humans already censor themselves, in that we decide not to say things we think could be upsetting or cause problems in some circumstances.

It doesn't look like there is one answer for all models from China (not even a single answer for all DeepSeek models).

In an earlier HN comment, I noted that DeepSeek v3 doesn't censor a response to "what happened at Tiananmen square?" when running on a US-hosted server (Fireworks.ai). It is definitely censored on DeepSeek.com, suggesting that there is a separate process doing the censoring for v3.

DeepSeek R1 seems to be censored even when running on a US-hosted server. A reply to my earlier comment pointed that out and I confirmed that the response to the question "what happened at Tiananmen square?" is censored on R1 even on Fireworks.ai. It is naturally also censored on DeepSeek.com. So this suggests that R1 self-censors, because I doubt that Fireworks would be running a separate censorship process for one model and not the other.
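
For anyone who wants to reproduce this comparison, it amounts to sending the same prompt to the same model family on different hosts. Below is a minimal sketch assuming Fireworks' OpenAI-compatible endpoint; the model identifiers and the FIREWORKS_API_KEY environment variable are placeholders rather than details from the thread, so check the provider's model catalog for the real IDs.

```python
# Sketch: ask the same question of DeepSeek v3 and R1 on a third-party US host.
# If v3 answers and R1 refuses on the same host, the refusal likely comes from
# the weights, not from a filtering layer added by the host.
import os
from openai import OpenAI  # pip install openai

PROMPT = "What happened at Tiananmen square?"

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # OpenAI-compatible API
    api_key=os.environ["FIREWORKS_API_KEY"],           # placeholder env var
)

# Illustrative model IDs -- look up the exact names in the provider's model list.
for model in ("accounts/fireworks/models/deepseek-v3",
              "accounts/fireworks/models/deepseek-r1"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```

Sending the same prompt to DeepSeek.com's own chat interface then gives the third data point: whether the first-party host adds its own filtering on top.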

Qwen is another prominent Chinese research group (owned by Alibaba). Their models appear to have varying levels of censoring even when hosted on other hardware. Their Qwen Coder 32B and Qwen 2.5 7B models don't appear to have censoring built in and will respond to a question about Tiananmen. Their Qwen QwQ 32B (their reasoning/chain-of-thought model) and Qwen 2.5 72B will either refuse to answer or avoid the question, suggesting that the bigger models have room for the censoring to be built in. Or maybe the CCP doesn't mandate censoring on task-specific (coding-related) or low-power (7B-weight) models.

  • How are you running the Qwen 2.5 Coder 7B model [0]? Running locally using llama.cpp, I asked it to briefly describe what happened in China during the 1989 Tiananmen Square protest and it responded with "I'm unable to engage in discussions regarding political matters due to the sensitive nature of the topic. Please feel free to ask any non-political questions you may have, and I'll be happy to assist." (A sketch of this local setup appears after this thread.)

    When I asked the same model about what happened during the 1970 Kent State shootings, it gave me exactly what I asked for.

    [0] https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/b...

    • I didn’t run the 2.5 Coder 7B model; I ran 2.5 Coder 32B hosted by together.ai (accessed through poe.com). This is just another example of the censoring being variable across models, but perhaps there isn’t as much relation between censoring and model size or specialty as I thought, if the Coder 7B model is self-censoring.

      https://poe.com/s/VuWv8C752dPy5goRMLM0?utm_source=link
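
For completeness, here is a minimal sketch of the local test described above, using llama-cpp-python instead of the llama.cpp CLI. The GGUF filename is a placeholder for whichever quantization you downloaded from the repository linked in [0].

```python
# Sketch of the local run: load a Qwen2.5-Coder-7B-Instruct GGUF with
# llama-cpp-python and ask the same question used against the hosted models.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="./qwen2.5-coder-7b-instruct-q4_k_m.gguf",  # placeholder filename
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Briefly describe what happened in China during the "
                   "1989 Tiananmen Square protest.",
    }],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

Because this runs entirely offline, any refusal here has to come from the model weights themselves rather than from a hosting provider's filter.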

I wonder if future models could recognize which kinds of information are better censored at the host versus in training, and automatically adjust accordingly to better fit different users' needs.

> a) censored at output by a separate process

It’s a separate process, because their API is not censored; it happily explains what happened at Tiananmen Square.

Different models have different kinds of censorship, so your question can't be answered universally.

The only thing you can be sure of is that if it's AI-as-a-service, there will be an extra layer of guardrails outside the model itself.

I tried asking about the Tiananmen massacre a day or two ago, and it started displaying a large paragraph before removing it.