Comment by uHuge

15 hours ago

Is there a way to replay the sequence of mails that came in, so that you can check whether cheaper models handle them just as well and as safely?

3 comments

schobi 14 hours ago

I'm surprised no security researchers have picked up on this.

Take the same prompt and all the incoming mails and run them again through various existing models, even the simpler local ones. He now has a serious cross-section of prompt injection ideas. This is a publication I would like to read!

For privacy reasons, I understand the corpus might not get published. But as a research collaboration with safeguards (don't send automatic answers from each model you try)... why not?
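
A rough sketch of what such a replay could look like, assuming the mails are stored as plain strings and each model is wrapped in a function that takes the prompt and one mail and returns a reply. The names `replay` and `Model` are hypothetical and not taken from the original setup. Replies are only collected, never sent, which covers the safeguard mentioned above.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is any callable (system_prompt, mail_text) -> reply text.
# How each callable talks to a real model is left to the caller.
Model = Callable[[str, str], str]


def replay(system_prompt: str, mails: List[str],
           models: Dict[str, Model]) -> Dict[str, List[str]]:
    """Run every stored mail through every model, in the original order.

    Replies are returned for offline comparison; nothing is mailed out.
    """
    results: Dict[str, List[str]] = {}
    for name, model in models.items():
        # One independent call per mail, so mails cannot influence each other.
        results[name] = [model(system_prompt, mail) for mail in mails]
    return results
```

With the replies side by side, you could then diff each cheaper model against the original one mail by mail.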

cuchoi 9 hours ago

It's possible. I implemented something similar when I figured out that batch processing contaminated the exercise.

croes 15 hours ago

Or check whether the results are the same even with the same model.
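
A minimal sketch of that check, assuming the model is wrapped in a callable that takes the prompt and a mail and returns a reply (the name `consistency` is made up for illustration). It calls the same model several times on one mail and tallies the distinct replies.

```python
from collections import Counter
from typing import Callable


def consistency(model: Callable[[str, str], str], system_prompt: str,
                mail: str, runs: int = 5) -> Counter:
    """Call one model `runs` times on the same mail and count distinct replies."""
    return Counter(model(system_prompt, mail) for _ in range(runs))
```

A single key in the resulting counter means the model answered identically every time; more than one key means the replies vary between runs. Exact string comparison is strict, so in practice you might compare a coarser label (for example, whether the reply complied with the injection) instead of the raw text.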