Comment by metawake

1 year ago

I made a small project (https://github.com/metawake/puppetry-detector) to detect this type of LLM policy manipulation. It's an early idea using a set of regexp patterns (for speed) and a couple of phases of text analysis. I am curious if it's any useful, I created integration with Rebuff (loss security suite) just in case.

0 comments