Comment by phreeza

2 years ago

It seems so trivial to prevent this prompt leaking with just a regexp check on the output that I find it really hard to believe.

6 comments

phreeza

oceanplexian 2 years ago

The LLM could simply re-phrase it, write it in Chinese, or print it in Morse Code. Regex is useless against a technology like GPT-4.

jerbear4328 2 years ago

OpenAI could also train their models against this easily, making it hard to get at the prompt. Yet, it's super easy, try it yourself:

https://chat.openai.com/share/94455782-5985-4b20-82fa-521f40...

I imagine OpenAI has no problem this, there are no secrets in the prompt, and it may be useful for prompt engineering. If it's harmless, no point in stopping the user from seeing it.

FergusArgyll 2 years ago

Me and many others have beaten this completely, give it a shot! https://gandalf.lakera.ai/

MrNeon 2 years ago

There is nothing secret to hide, what would be the purpose of blocking it?

guizzy 2 years ago

1. It could help competitors improve their alternatives
2. It could be used against them in a lawsuit, so they would probably want to keep it hidden until force to reveal it (which they would likely fight against)
3. It gives more information to the people crafting "jailbreaks"
4. It might create a backlash, considering how heavy-handed the "please make images diverse" part of it is
5. It might create ANOTHER backlash, on the other side of that coin, for not being heavy-handed enough, and not explicitly listing an ethnicity, gender, or whatever other personal characteristic that some might want ChatGPT to represent by default in its prompts
phreeza 2 years ago

Just as an example, it would make it easier to craft adversarial attacks to generate undesired behavior.