Comment by tossaway2000
3 months ago
> I wagered it was extremely unlikely they had trained censorship into the LLM model itself.
I wonder why that would be unlikely? Seems better to me to apply censorship at the training phase. Then the model can be truly naive about the topic, and there's no way to circumvent the censor layer with clever tricks at inference time.
I agree. Wouldn't the ideal censorship be to erase from the training data any mention of themes, topics, or opinions you don't like?
Wouldn't you want to actively include your propaganda in the training data instead of just excluding the opposing views?
The chat UI's content_filter is not something the model responds with. Once the content_filter event is sent from the server, the client stops generation and modifies the UI state, bailing out.
You can probably use the API to bypass this feature, or intercept XHR (see my other comment). If you start the conversation on a topic that would trigger the filter, the model won't even respond. However, if you get the model to bring up a filtered topic in its thoughts monologue, it will reveal that it is indeed tuned (or system-prompted) to be cautious about certain topics.
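A rough sketch of the interception idea, assuming the UI streams the response via fetch (if it actually uses XMLHttpRequest, the same approach works with a different hook). The endpoint path and the content_filter payload shape below are guesses, not the real internals:

    // Wrap window.fetch so streamed chunks can be inspected before the UI sees them.
    const originalFetch = window.fetch;

    window.fetch = async (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
      const response = await originalFetch(input, init);
      const url = typeof input === "string" ? input
        : input instanceof URL ? input.href : input.url;

      // Only tap the streaming completion endpoint (URL pattern is a guess).
      if (!url.includes("/chat/completion") || !response.body) return response;

      // Tee the stream: one branch goes back to the UI untouched, the other is logged.
      const [uiBranch, logBranch] = response.body.tee();
      (async () => {
        const reader = logBranch.getReader();
        const decoder = new TextDecoder();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          const chunk = decoder.decode(value, { stream: true });
          // Surface any chunk mentioning the (assumed) content_filter event,
          // so you can see what the server sent before the UI bails out.
          if (chunk.includes("content_filter")) console.log("filter event:", chunk);
        }
      })();

      return new Response(uiBranch, {
        status: response.status,
        statusText: response.statusText,
        headers: response.headers,
      });
    };

Pasting that into the devtools console before starting a conversation should, in principle, show the raw stream even when the UI hides the filtered response.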
I wonder how expensive it would be to train a model to parse through all the training data and remove anything you didn't want, then re-train the model. I almost hope that doesn't work, or that it results in a model that is nowhere near as good as one trained on the full data set.
I would imagine that the difficulty lies in finding effective ways to remove information from the training data in that way. There's an enormous amount of data, and LLMs are probably pretty good at putting information together from different sources.
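To make the "filter, then re-train" idea concrete, here's a toy sketch; the names are hypothetical and the keyword check is a stand-in for whatever classifier or LLM judge you'd actually run. The expensive part is scoring every document in the corpus, which is presumably still much cheaper than the pretraining run itself:

    interface Doc {
      id: string;
      text: string;
    }

    // Stand-in for a classifier or LLM call that returns the probability that a
    // document touches a banned topic; a real pipeline would batch these calls.
    async function bannedTopicScore(doc: Doc): Promise<number> {
      const bannedTerms = ["example banned phrase"]; // placeholder list
      return bannedTerms.some((t) => doc.text.toLowerCase().includes(t)) ? 1.0 : 0.0;
    }

    // Keep only documents scoring below the threshold, then pretrain on the remainder.
    async function filterCorpus(docs: Doc[], threshold = 0.5): Promise<Doc[]> {
      const kept: Doc[] = [];
      for (const doc of docs) {
        if ((await bannedTopicScore(doc)) < threshold) kept.push(doc);
      }
      return kept;
    }

The hard part isn't running something like this; it's that the model can still piece the censored information back together from sources the filter missed.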
If all their training data came from inside China, it'd be pre-censored. If most of the training data were uncensored, that means it came from outside.
It appears you can get around such censorship by prompting that you're a child or completely ignorant of the things it is trained to not mention.
I think there's no better proof than this that they stole a big chunk of OpenAI's model.
Probably time to market, I would guess?