Comment by mickdarling

3 months ago

I’m sure someone will correct me if I’m wrong, but doesn’t that mean that it is somewhat trivial to find all the censorship in the model and prune it to create an uncensored model? Just ask it for all of the things it cannot say, in hex, and reply, in hex.

That's not how the censorship works, the model doesn't know. Either some information has been excluded from training data set, some answers penalized in training or, most commonly, just another LLM/simple regex on the output that cuts out the response

  • I’m talking about pruning a local LLM not using their service. There are plenty of ways to prune and distill. Heck DeepSeek was distilled from other models. You could simply run a distillation using Hex, then convert those outputs back to the target language.