← Back to context

Comment by gcr

10 hours ago

For what it’s worth, Pangram thinks this article is fully human-written: https://www.pangram.com/history/f5f68ce9-70ac-4c2b-b0c3-0ca8...

The AI writing detectors are very unreliable. This is important to mention because they can trigger in the opposite direction (reporting human written text as AI generated) which can result in false accusations.

It’s becoming a problem in schools as teachers start accusing students of cheating based on these detectors or ignore obvious signs of AI use because the detectors don’t trigger on it.

Then pangram isn't very good, because that article is full of Claude-isms.

  • > because that article is full of Claude-isms

    Not sure how I feel about the whole "LLMs learned from human texts, so now the people who helped write human texts are suddenly accused of plagiarizing LLMs" thing yet, but seems backwards so far and like a low quality criticism.

    • I'm sure some human writers would write:

      > The specification forces this question on every path through the IMU mode-switching code. A reviewer examining BADEND would see correct, complete cleanup for every resource BADEND was designed to handle.

      > The specification approaches from the other direction: starting from LGYRO and asking whether any paths fail to clear it.

      > *Tests verify the code as written; a behavioural specification asks what the code is for.*

      However this is a blog post about using Claude for XYZ, from an AI company whose tagline is

      "AI-assisted engineering that unlocks your organization's potential"

      Do you really think they spent the time required to actually write a good article by hand? My guess is that they are unlocking their own organizations potential by having Claude writes the posts.

      2 replies →

  • Is it possible for a tool to know if something is AI written with high confidence at all? LLMs can be tuned/instructed to write in an infinite number of styles.

    Don't understand how these tools exist.

    • The WikiEDU project has some thoughts on this. They found Pangram good enough to detect LLM usage while teaching editors to make their first Wikipedia edits, at least enough to intervene and nudge the student. They didn’t use it punatively or expect authoritative results however. https://wikiedu.org/blog/2026/01/29/generative-ai-and-wikipe...

      They found that Pangram suffers from false positives in non-prose contexts like bibliographies, outlines, formatting, etc. The article does not touch on Pangram’s false negatives.

      I personally think it’s an intractable problem, but I do feel pangram gives some useful signal, albeit not reliably.

  • It has Claude-isms, but it doesn't feel very Claude-written to me, at least not entirely.

    What's making it even more difficult to tell now is people who use AI a lot seem to be actively picking up some of its vocab and writing style quirks.

Pangram doesn't reliably detect individual LLM-generated phrases or paragraphs among human written text.

It seems to look at sections of ~300 words. And for one section at least it has low confidence.

I tested it by getting ChatGPT to add a paragraph to one of my sister comments. Result is "100% human" when in fact it's only 75% human.

Pangram test result: https://www.pangram.com/history/1ee3ce96-6ae5-4de7-9d91-5846...

ChatGPT session where it added a paragraph that Pangram misses: https://chatgpt.com/share/69d4faff-1e18-8329-84fa-6c86fc8258...