Comment by notepad0x90
3 months ago
I wish a smarter person would research or comment on this theory I have: training a model to measure the entropy of human-generated content vs. LLM-generated content might be the best approach to detecting LLM-generated content.
Consider the "Will Smith eating spaghetti test": if you compare that clip against footage of Will Smith actually eating spaghetti, I naively expect the main difference (beyond similarity) would be entropy. When we say something looks "real", I think we're really talking about our expectation of entropy for that scene. A model could recognize that it's looking at a person eating spaghetti and compare the measured entropy against the entropy it expects for such a scene based on its training. In other words, train a model on specific entropy measurements alongside the actual training data.
That's basically how "AI detectors" work: they're just ML models trained to tell human-generated and LLM-generated content apart. As we all (hopefully) know, despite provider claims, they don't really work well.
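For what it's worth, the simplest version of such a detector is literally just a supervised text classifier. A toy scikit-learn sketch (the example strings and labels are made up, real detectors use far more data and stronger models):

```python
# Minimal sketch of an "AI detector" as a plain binary text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "honestly the bus was late again and i just gave up and walked",              # human-ish (made up)
    "It is important to note that public transit offers several key benefits.",   # LLM-ish (made up)
]
labels = [0, 1]  # 0 = human, 1 = LLM

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)
print(detector.predict_proba(["Furthermore, it is worth considering the following."]))
```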
In a non-adversarial context (when the author isn't disclosing it, but also isn't actively trying to hide it), AI image detection gives me great results.
I think the problems (currently) are more with text, or with post-processing of other media to hide the AI.
Correct, hence slopstop leveraging signals other than just the content.
Something like that would probably work for six months. This is going to be like CAPTCHAs. Schools have been trying to do this for essays for years. They're failing. The machines will win.
delves
fnord
It might work for real photos vs. AI-generated photos, but I really don't see how 'entropy' is so important when distinguishing human-generated text from AI-generated text.
I also don't see why AI can't be trained to fool this detection.
There are already methods that attempt that.
It works for images because diffusion models leave artifacts, but it doesn't work so well for text.
Text is an incredibly information-dense format. The diffusion artifacts sneak into the "extra data" of an image, and text has little of that slack to hide anything in.
The other part is that GPT-style models are, quite explicitly, trained to minimize the entropy you're describing.
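Concretely, the pretraining objective is next-token cross-entropy, i.e. exactly the average surprisal an entropy detector would measure. A toy sketch of that loss (shapes and numbers are made up):

```python
# Sketch: the next-token cross-entropy objective GPT-style models minimize.
# Text sampled from a model trained this way tends to have low surprisal,
# which is precisely what an entropy-based detector looks for.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab), tokens: (batch, seq_len)
    # predict token t+1 from the logits at position t
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)  # average surprisal, in nats

# random toy tensors, just to show the shapes
logits = torch.randn(2, 8, 100)
tokens = torch.randint(0, 100, (2, 8))
print(next_token_loss(logits, tokens))
```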
The idea is interesting, but it's still operating within the content-analysis paradigm. As soon as entropy-based detectors become popular, the next generation of LLMs will be specifically fine-tuned to generate higher-entropy text to evade them.
It's a cat-and-mouse game where the generator will always be one step ahead. It's far more robust to analyze things that are hard to fake at scale: domain age, anomalous publication frequency, and unnatural link structures.
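A rough sketch of what scoring on those out-of-band signals could look like. The features, weights, and cutoffs here are entirely made up, and actually gathering the inputs (WHOIS data, crawl stats, link graphs) is the hard part:

```python
# Sketch: combine non-content signals into a crude "slop likelihood" score.
# All weights and thresholds are placeholders, not tuned values.
from dataclasses import dataclass

@dataclass
class SiteSignals:
    domain_age_days: int        # e.g. from WHOIS / registry data
    posts_per_day: float        # publication frequency from the crawl
    clustered_link_ratio: float # share of links pointing into one small cluster

def slop_score(s: SiteSignals) -> float:
    score = 0.0
    if s.domain_age_days < 180:
        score += 0.4            # very new domain
    if s.posts_per_day > 50:
        score += 0.4            # implausible cadence for a small site
    if s.clustered_link_ratio > 0.8:
        score += 0.2            # link-farm-like structure
    return score

print(slop_score(SiteSignals(domain_age_days=30, posts_per_day=120, clustered_link_ratio=0.9)))
```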
I doubt AI is the solution to AI slop; it's far too error-prone. The problem is that we already had a slop-driven advertising/attention economy; AI just made it more visible.
Any AI model can easily increase entropy by adding extra bits of information, and we'd end up in a weird AI info war where people become the victims. Whatever info you consume, you're dealing with spaghetti of unknown origin. Generating false info is too easy for a model.
I thought this was a casual joke... then I Googled it. Yep, it's real: Consider the "will smith eating spaghetti test"
I cannot edit my original post, but I meant to include the Wiki link: https://en.wikipedia.org/wiki/Will_Smith_Eating_Spaghetti_te...
That's basically the entire idea behind GANs (generative adversarial networks).
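i.e. a discriminator learns to tell real samples from generated ones while the generator learns to fool it. A toy sketch of that loop on 1-D data (architecture, data, and hyperparameters are arbitrary):

```python
# Toy GAN sketch: discriminator learns real-vs-fake, generator learns to fool it.
# Everything here (sizes, learning rates, the "real" distribution) is illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 3.0      # "real" data: N(3, 0.5)
    fake = G(torch.randn(32, 4))

    # discriminator step: push real -> 1, fake -> 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: make the discriminator call fakes real
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 4)).detach().squeeze())  # samples should drift toward ~3
```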
That would flag poorly encoded videos too.
Another problem is that AI generators will try to find workarounds to bypass this system. It sounds good in theory, but in practice I doubt it would work.