Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library
← Back to context

Comment by shikon7

14 hours ago

It seems to me that thinking models are harder to decensor, as they are trained to think whether to accept your request.

1 comment

shikon7

Reply

int_19h  8 hours ago

It goes both ways. E.g. unmodified thinking Qwen is actually easier to jailbreak to talk about things like Tiananmen by convincing it that it is unethical to refuse to do so.

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities