Comment by parkersweb
1 day ago
Maybe a naive question - but is it possible for an LLM to return only part of its system prompt but to claim it’s the full thing i.e give the illusion of transparency?
1 day ago
Maybe a naive question - but is it possible for an LLM to return only part of its system prompt but to claim it’s the full thing i.e give the illusion of transparency?
Yes, but in my experience you can always get the whole thing if you try hard enough. LLMs really want to repeat text they've recently seen.
There are people out there who are really good at leaking prompts, hence collections like this one: https://github.com/elder-plinius/CL4R1T4S