Comment by phreeza

2 years ago

It seems so trivial to prevent this prompt leaking with just a regexp check on the output that I find it really hard to believe.

The LLM could simply re-phrase it, write it in Chinese, or print it in Morse Code. Regex is useless against a technology like GPT-4.

There is nothing secret to hide, what would be the purpose of blocking it?

  • 1. It could help competitors improve their alternatives

    2. It could be used against them in a lawsuit, so they would probably want to keep it hidden until force to reveal it (which they would likely fight against)

    3. It gives more information to the people crafting "jailbreaks"

    4. It might create a backlash, considering how heavy-handed the "please make images diverse" part of it is

    5. It might create ANOTHER backlash, on the other side of that coin, for not being heavy-handed enough, and not explicitly listing an ethnicity, gender, or whatever other personal characteristic that some might want ChatGPT to represent by default in its prompts

  • Just as an example, it would make it easier to craft adversarial attacks to generate undesired behavior.