Tell HN: Pangram is easily defeated with Claude
5 days ago
I use Pangram all of the time to detect whether what I'm reading is fully AI-generated or not. Today, I wondered how easy it was to defeat it.
All I had to do was show the model that Pangram had flagged its text as 100% AI-generated, and it then produced a "human-sounding" text snippet.
Claude Sonnet: https://claude.ai/share/28080c8c-5647-43df-9671-91c9f9e46791
Interestingly, ChatGPT 5.4 won't do it, at least not with the default model it uses: https://chatgpt.com/share/69c6c713-038c-832e-86be-689abd7b7ae1. I'm guessing it can be jailbroken to do it though.
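For anyone who wants the shape of the loop, here is a rough sketch of the feedback cycle described above: score the text, hand the score back to the generator, and repeat until the detector stops flagging it. Everything in it is an assumption made for illustration. `detect` and `rewrite` are hypothetical callables, not Pangram's or Claude's actual API, and the toy versions below only look for a few stock words so the example runs offline.

```python
from typing import Callable, Tuple

def rewrite_until_undetected(
    text: str,
    detect: Callable[[str], float],        # hypothetical: returns an "AI score" in [0, 1]
    rewrite: Callable[[str, float], str],  # hypothetical: returns a revised text given the score
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> Tuple[str, float, int]:
    """Feed the detector's verdict back to the rewriter until the score drops."""
    score = detect(text)
    rounds = 0
    while score >= threshold and rounds < max_rounds:
        text = rewrite(text, score)
        score = detect(text)
        rounds += 1
    return text, score, rounds

# Toy stand-ins (assumptions, not real services): the "detector" flags a few
# stock words, and the "rewriter" swaps out one flagged word per call.
PLAIN = {"delve": "dig", "tapestry": "mix", "moreover": "also"}

def toy_detect(text: str) -> float:
    return 1.0 if any(word in text for word in PLAIN) else 0.0

def toy_rewrite(text: str, score: float) -> str:
    for stock, plain in PLAIN.items():
        if stock in text:
            return text.replace(stock, plain, 1)
    return text

if __name__ == "__main__":
    print(rewrite_until_undetected(
        "moreover, let us delve into the tapestry", toy_detect, toy_rewrite
    ))
```

The point of the sketch is only that the detector's own output becomes the training signal for the next draft, which is why a fixed detector is at a disadvantage here.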
Pangram, and any tool like it, faces an uphill battle that's ultimately unwinnable. It assumes a static definition of "AI-generated" content, but as yen223 pointed out, there's no inherent "humanity" quality that can't be mimicked. This cat-and-mouse game between detectors and generators will heavily favor the generators. We've seen this play out in other security domains; the attacker always evolves faster than the defender.
This is a losing game for Pangram.
There's no special "humanity" quality in text. If a human can write it, there's no reason a sufficiently strong pattern matcher can't write it too.
Using an LLM to prove text is human by asking another LLM to sound human feels like two mirrors facing each other. You mostly learn how good the illusion has gotten.
This will be an ongoing battle for a long time. Happy you were able to get them to make the change.