Comment by 0xdeadf1sh

1 month ago

This might work on a small 7B or 14B model, but models above 70B are already pretty good at identifying prompt injections. You will probably need to use weird, out-of-distribution tokens (remember SolidGoldMagikarp?).
