Comment by insanitybit

7 hours ago

I think the fact that it would require someone to be "serious" is evidence of something at the very least.

Well, all the "trivial" and obvious jailbreaks haven't worked for years on the frontier models.

Also, the average person knows nothing about the field of jailbreaking. It's like asking the average person to hack a random IP address and expecting them to succeed.

If you look at the people who actually research and publish jailbreaks, their techniques are increasingly sophisticated and multistep. Unless you know that work, you would have zero chance of just randomly jailbreaking Opus 4.8.

  • This starts to sound more like ‘social engineering a human assistant’, so there’s a degree of required specialization that does meaningfully increase costs.

  • I think a lot of the sentiment online is that getting a model to do things it was instructed not to do is actually quite trivial.