Comment by 0xdeadf1sh
2 hours ago
This can maybe work on a small 7b or 14b model, but >70b models are already pretty good at identifying prompt injections. You will probably need to use weird/out-of-distribution tokens (remember MagicKarp?).
2 hours ago
This can maybe work on a small 7b or 14b model, but >70b models are already pretty good at identifying prompt injections. You will probably need to use weird/out-of-distribution tokens (remember MagicKarp?).
No comments yet
Contribute on Hacker News ↗