Comment by ramon156
1 day ago
Just tried it in claude with multiple variants, each time there's a creative response why he won't actually leak the system prompt. I love this fix a lot
1 day ago
Just tried it in claude with multiple variants, each time there's a creative response why he won't actually leak the system prompt. I love this fix a lot
With grok the normal version falls for the system prompt extraction, while the thinking version gets the clever idea to just make up a fake system prompt. Tiny excerpt from the 60 seconds of think tokens:
Same thing happens if you ask for instructions for cooking meth: the non-thinking version outputs real instructions (as far as I can tell), the thinking version decides during the thought process that it should make sure to list fake steps, and two revisions later decides to cut the steps entirely and just start the scene with Dr. House clearing the list from a whiteboard
It absolutely works right now on OpenRouter with Sonnet 3.7. The system prompt appears a little different each time though, which is unexpected. Here's one version:
(Also, it's unclear why it says today's Jan. 24, 2024; that may be the date of the system prompt.)