Comment by Retr0id
13 hours ago
The fact that LLMs are "smarter" is also their weakness. An oldschool classifier is far from foolproof, but you won't get past it by telling it about your grandma's bedtime story routine.
13 hours ago
The fact that LLMs are "smarter" is also their weakness. An oldschool classifier is far from foolproof, but you won't get past it by telling it about your grandma's bedtime story routine.
Fairly hard to bypass the latest LLMs with grandma's bedtime story these days, to be fair.
That specific trick yes, but the general concept still applies.
It does, but it's certainly not trivial. In fact there's an unclaimed $1000 bounty on prompt injecting OpenClaw: https://hackmyclaw.com/
2 replies →