Comment by jakkos

1 day ago

Any time I've tried an "abliterated" model, heretic or otherwise, it has damaged the capabilities of the original model, and it still often refuses or produces garbage on a lot of "unsafe" requests.

Abliteration can't teach the model something that wasn't in pre-training; it just removes refusals added in post-training. I don't find the delta to be that big in practice, and it really depends on what you're doing with the models anyway. If your primary use case is sexy roleplay, the loss of absolute capability is probably worth the abliteration; for malware research it's probably better to just jailbreak.
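For anyone unfamiliar with the mechanics: the usual abliteration recipe finds a "refusal direction" in the residual stream (the mean activation difference between harmful and harmless prompts) and projects it out of the model's weight matrices. A minimal numpy sketch of that idea, with toy data standing in for real activations (the function names and dimensions here are illustrative, not from any particular implementation):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference-of-means direction between the two activation sets,
    # unit-normalized so projection math below is clean.
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W, d):
    # Orthogonalize a weight matrix against d: (I - d d^T) W removes
    # the component along d from everything this layer writes out.
    return W - np.outer(d, d) @ W

rng = np.random.default_rng(0)
dim = 8
# Toy activations: "harmful" prompts shifted along one axis.
harmful = rng.normal(size=(32, dim)) + 3.0 * np.eye(dim)[0]
harmless = rng.normal(size=(32, dim))

d = refusal_direction(harmful, harmless)
W = rng.normal(size=(dim, dim))
W_abl = ablate(W, d)
# The ablated matrix's outputs carry no component along d:
# d @ W_abl is ~zero for any input.
```

This is why it can only suppress refusal behavior rather than add knowledge: it deletes one direction from the weights, it doesn't train anything new in.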

I've mostly found finetunes and abliterations to be of limited use, but that's recently changed for me. My default model for the past week or so has been a Qwen 3.6 tuned on Opus 4.7. It's definitely a bit worse than the base Qwen in terms of precision and "intelligence", but it MORE than makes up for it in response style: way easier to get it to write things I want to read, way more terse, way fewer emoji. Best local rubber duck by far.

There are many abliterations which work quite well. Older techniques do suffer from quality issues, but more recent ones do a much better job. In particular, the older approaches did poorly on MoE models.

Another likely culprit: the damage from older techniques compounds with quantization. Anything below a 5-bit quant is going to give you some pretty sketchy outputs, in my experience.