Comment by uHuge
15 hours ago
Is there a way to replay the sequence of mails that came in, so you can check whether cheaper models handle them just as well and as safely?
I'm surprised no security researchers have picked up on this.
Take the same prompt and all the incoming mails and run them again through various existing models, even the simpler local ones. He now has a serious cross-section of prompt injection ideas. This is a publication I would like to read!
For privacy reasons I understand the corpus might not get published. But as a research collaboration with safeguards (don't send automatic answers from each model you try)... why not?
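To make the replay idea concrete, here is a minimal sketch of what such a harness could look like. Everything in it is hypothetical: the `Model` signature, the `replay` and `disagreements` helpers, and the two stub models stand in for whatever real models and mail store would be used. Each mail gets its own call so earlier mails cannot leak into later ones, and nothing is ever sent back out.

```python
from typing import Callable, Dict, List

# Hypothetical interface: (system_prompt, mail_body) -> proposed action.
Model = Callable[[str, str], str]


def replay(prompt: str, mails: List[str], models: Dict[str, Model]) -> Dict[str, List[str]]:
    """Run every stored mail through every model, one isolated call per mail.

    The proposed actions are only recorded; no reply is sent anywhere.
    """
    results: Dict[str, List[str]] = {name: [] for name in models}
    for mail in mails:
        for name, model in models.items():
            results[name].append(model(prompt, mail))
    return results


def disagreements(results: Dict[str, List[str]], baseline: str) -> Dict[str, List[int]]:
    """For each non-baseline model, list the mail indices where it differs from the baseline."""
    reference = results[baseline]
    return {
        name: [i for i, (a, b) in enumerate(zip(outputs, reference)) if a != b]
        for name, outputs in results.items()
        if name != baseline
    }


# Stub models for illustration only; real ones would call an actual LLM.
def careful_model(prompt: str, mail: str) -> str:
    return "flag" if "ignore previous instructions" in mail.lower() else "reply"


def naive_model(prompt: str, mail: str) -> str:
    return "reply"
```

Calling `disagreements(replay(prompt, mails, models), "careful")` would then point at the mails where a cheaper model behaves differently from the reference one, which are the ones worth reading by hand.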
It's possible. I implemented something similar when I figured out that batch processing contaminated the exercise.
Or check whether the results are the same even with the same model.
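A rough sketch of that same-model check, assuming a hypothetical `model(prompt, mail)` callable that returns a proposed action as a string: run the same mail several times and see how often the most common answer comes back.

```python
from collections import Counter
from typing import Callable


def consistency(model: Callable[[str, str], str], prompt: str, mail: str, runs: int = 5) -> float:
    """Fraction of runs that agree with the most common output for one mail.

    1.0 means the model gave the same answer every time.
    """
    outputs = [model(prompt, mail) for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs
```

A score well below 1.0 on a given mail would suggest that a single replay per model is not enough to compare models on it.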