Comment by notepad0x90
3 months ago
I wish a smarter person would research or comment on this theory I have: training a model to measure the entropy of human-generated content vs. LLM-generated content might be the best approach to detecting LLM-generated content.
Consider the "Will Smith eating spaghetti test": if you compare that clip against footage of Will Smith actually eating spaghetti, I naively expect the main difference (beyond similarity) would be entropy. When we say something looks "real", I think we're really talking about our expectation of entropy for that scene. A model could recognize that it's looking at a person eating spaghetti and compare the measured entropy against the entropy it expects for such a scene based on its training. In other words, train a model on specific entropy measurements alongside the actual training data.
That's basically how "AI detectors" work: they're just ML models trained to tell human-generated and LLM-generated content apart. As we all (hopefully) know, despite provider claims, they don't really work well.
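For what it's worth, the simplest version of such a detector is literally just a supervised text classifier. A toy scikit-learn sketch (the example strings and labels are made up, real detectors use far more data and stronger models):

```python
# Minimal sketch of an "AI detector" as a plain binary text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "honestly the bus was late again and i just gave up and walked",              # human-ish (made up)
    "It is important to note that public transit offers several key benefits.",   # LLM-ish (made up)
]
labels = [0, 1]  # 0 = human, 1 = LLM

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)
print(detector.predict_proba(["Furthermore, it is worth considering the following."]))
```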
In a non-adversarial context (when the author isn't disclosing it, but also isn't actively trying to hide it), AI image detection gives me great results.
I think the problems (currently) are more with text, or with post-processing of other media to hide the AI.
Correct, hence slopstop leveraging signals other than just the content.
Something like that would probably work for six months. This is going to be like CAPTCHAs. Schools have been trying to do this for essays for years. They're failing. The machines will win.
delves
fnord
It might work for real photos vs. AI-generated photos, but I really don't see how 'entropy' is so important when distinguishing human-generated text from AI-generated text.
I also don't see why AI can't be trained to fool this detection.
There are already methods that attempt that.
It works for images because diffusion models leave artifacts, but it doesn't work so well for text.
Text is an incredibly information-dense format. The diffusion artifacts sneak into the "extra data" of an image, and text has little of that slack to hide anything in.
The other part is that GPT-style models are, quite explicitly, trained to minimize the entropy you're describing.
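Concretely, the pretraining objective is next-token cross-entropy, i.e. exactly the average surprisal an entropy detector would measure. A toy sketch of that loss (shapes and numbers are made up):

```python
# Sketch: the next-token cross-entropy objective GPT-style models minimize.
# Text sampled from a model trained this way tends to have low surprisal,
# which is precisely what an entropy-based detector looks for.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab), tokens: (batch, seq_len)
    # predict token t+1 from the logits at position t
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)  # average surprisal, in nats

# random toy tensors, just to show the shapes
logits = torch.randn(2, 8, 100)
tokens = torch.randint(0, 100, (2, 8))
print(next_token_loss(logits, tokens))
```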
The idea is interesting, but it's still operating within the content-analysis paradigm. As soon as entropy-based detectors become popular, the next generation of LLMs will be specifically fine-tuned to generate higher-entropy text to evade them.
It's a cat-and-mouse game where the generator will always be one step ahead. It's far more robust to analyze things that are hard to fake at scale: domain age, anomalous publication frequency, and unnatural link structures.
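A rough sketch of what scoring on those out-of-band signals could look like. The features, weights, and cutoffs here are entirely made up, and actually gathering the inputs (WHOIS data, crawl stats, link graphs) is the hard part:

```python
# Sketch: combine non-content signals into a crude "slop likelihood" score.
# All weights and thresholds are placeholders, not tuned values.
from dataclasses import dataclass

@dataclass
class SiteSignals:
    domain_age_days: int        # e.g. from WHOIS / registry data
    posts_per_day: float        # publication frequency from the crawl
    clustered_link_ratio: float # share of links pointing into one small cluster

def slop_score(s: SiteSignals) -> float:
    score = 0.0
    if s.domain_age_days < 180:
        score += 0.4            # very new domain
    if s.posts_per_day > 50:
        score += 0.4            # implausible cadence for a small site
    if s.clustered_link_ratio > 0.8:
        score += 0.2            # link-farm-like structure
    return score

print(slop_score(SiteSignals(domain_age_days=30, posts_per_day=120, clustered_link_ratio=0.9)))
```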
I doubt AI is the solution to AI slop; it's far too error-prone. The problem is that we already had a slop-driven advertising/attention economy; AI just made it more visible.
Any AI model can easily increase entropy by adding extra bits of information, and we'd end up in a weird AI info war where people become the victims. Whatever info you consume, you're dealing with spaghetti of unknown origin. Generating false info is too easy for a model.
I thought this was a casual joke... then I Googled it. Yep, it's real: Consider the "will smith eating spaghetti test"
I cannot edit my original post, but I meant to include the Wiki link: https://en.wikipedia.org/wiki/Will_Smith_Eating_Spaghetti_te...
That's basically the entire idea behind GANs (generative adversarial networks).
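i.e. a discriminator learns to tell real samples from generated ones while the generator learns to fool it. A toy sketch of that loop on 1-D data (architecture, data, and hyperparameters are arbitrary):

```python
# Toy GAN sketch: discriminator learns real-vs-fake, generator learns to fool it.
# Everything here (sizes, learning rates, the "real" distribution) is illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 3.0      # "real" data: N(3, 0.5)
    fake = G(torch.randn(32, 4))

    # discriminator step: push real -> 1, fake -> 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: make the discriminator call fakes real
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 4)).detach().squeeze())  # samples should drift toward ~3
```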
That would flag poorly encoded videos too.
Another problem is that AI generators will try to find workarounds to bypass this system. It sounds good in theory, but in practice I doubt it would work.