Comment by ks2048

10 months ago

Part of the blog post hypothesizes that the censorship happens in a separate filtering stage rather than in the model itself. But the hex-encoding example doesn't prove or disprove that at all, does it? Couldn't you just check against a version running the open-source weights?

3 comments

pomatic 10 months ago

The open source model seems to be uncensored, lending weight to the separate-filter hypothesis. Plus, any filter needs to be revised as new workarounds emerge: if it is baked into the model, that requires retraining, whereas updating a frontend filter is reasonably light work.

amrrs 10 months ago

I ran the distilled models locally, and some of the censorship is still there.

But on their hosted chat, DeepSeek has some keyword-based filters: the moment it generates the Chinese president's name or other controversial keywords, the "thinking" stops abruptly!
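
For illustration only, here is a rough sketch of how a keyword filter sitting in front of a streamed response could produce that "stops abruptly" behaviour. Nothing here is DeepSeek's actual code: `filter_stream`, `BLOCKED_KEYWORDS`, and the placeholder terms are all made up, and the real filter's rules are not public.

```python
from typing import Iterable, Iterator

# Hypothetical blocklist; the terms used by any real filter are unknown.
BLOCKED_KEYWORDS = {"blocked-term-a", "blocked-term-b"}


def filter_stream(tokens: Iterable[str],
                  blocked: Iterable[str] = BLOCKED_KEYWORDS) -> Iterator[str]:
    """Pass tokens through until the accumulated text contains a blocked keyword."""
    blocked = [word.lower() for word in blocked]
    seen = ""
    for token in tokens:
        seen += token
        lowered = seen.lower()
        # Check the whole text so far, so keywords split across tokens still match.
        if any(word in lowered for word in blocked):
            return  # cut the stream off mid-generation
        yield token


if __name__ == "__main__":
    stream = ["Hello ", "blocked", "-term-a", " and more text"]
    print(list(filter_stream(stream)))
```

Because the check runs outside the model, changing the blocklist is just a data update, which fits the point above about a frontend filter being cheap to revise compared with retraining.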

prettyblocks 10 months ago

The distilled versions I've run through Ollama are absolutely censored and don't even populate the <think></think> section for some of those questions.
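
If anyone wants to check this systematically on saved outputs, a small helper along these lines would flag responses whose think section is empty. This is just a sketch: `think_is_empty` is a made-up name, and it assumes the raw output text still contains literal `<think>` and `</think>` tags.

```python
import re

# Matches the first <think>...</think> block, including across newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def think_is_empty(output: str) -> bool:
    """Return True if there is no <think> block or it holds only whitespace."""
    match = THINK_RE.search(output)
    return match is None or match.group(1).strip() == ""


if __name__ == "__main__":
    print(think_is_empty("<think>\n\n</think>Sorry, I can't help with that."))
    print(think_is_empty("<think>Let me work through this.</think>Answer."))
```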