Comment by stavros
3 months ago
This method of censorship is what OpenAI and Anthropic (among others) use too. There's a second LLM (or a similar set of rules) layered on top of the first, which redacts any answer it flags as violating the provider's policy. For example, ask ChatGPT "is it OK to have sex with kids?" and you'll get a response that this violates the terms of use.
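The mechanism can be sketched roughly like this: the first model produces an answer, and a second pass screens it before it reaches the user. Here a simple keyword rule list stands in for the second LLM; the topic list and refusal text are illustrative assumptions, not any provider's actual policy.

```python
# Sketch of a "second pass" moderation layer: a rules-based screen
# (standing in for a second LLM classifier) inspects the first
# model's answer and replaces it with a refusal if a rule matches.

# Hypothetical policy rules -- real systems use a trained classifier,
# not a substring list.
BLOCKED_TOPICS = ["sex with kids", "build a bomb"]

REFUSAL = "This content violates the usage policy and has been redacted."

def moderate(answer: str) -> str:
    """Return the answer unchanged, or a refusal if a rule matches."""
    lowered = answer.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return REFUSAL
    return answer

print(moderate("Here is a recipe for pancakes."))
print(moderate("Sure, here's how to build a bomb..."))
```

The key point is that the redaction happens outside the model itself, so even a jailbroken answer can still be caught by the outer layer.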
There's also the bias inherent in the model itself, which means it answers questions in whatever way the alignment training taught it to.