← Back to context

Comment by halJordan

13 hours ago

Thats not true at all. All refusals mediate in the same direction. If you abliterate small "acceptable to you" refusals then you will not overcome all the refusals in the model. By targeting the strongest refusals you break those and the weaker ones like politics. By only targeting the weak ones, you're essentially just fine tuning on that specific behavior. Which is not the point of abliteration.