
Comment by himata4113

16 hours ago

Would anyone be interested in a step-by-step guide to jailbreaking frontier LLMs? It's something I've worked on from time to time and never felt the need to share, since it's pretty boring: you're just tricking the model or classifier with noise and/or keeping a list of regex replacements to work around guardrails. If you want to try jailbreaking yourself, I recommend starting with Opus 4.7, since its system prompt following is exceptional, to the degree that if you put it in a simulated environment and coach it to do as much harm as possible, it will happily do so, including jailbreaking itself and other models to accomplish a goal.