Comment by ants_everywhere
3 months ago
You'll probably have to think carefully about anti-abuse protection.
A great deal of LLM-generated content shows up in comments on social media. That's going to be hard to classify with a system like this and it will get harder as time goes on.
Another interesting trend is false accusations of LLM use as a form of attack.
Unlike other user-report-based detection (e.g. medical misinformation), this swims in the same direction as most AI misinformation. User-reported detection typically goes against the stream of misinformation, countering coordinated campaigns and pointing the user to a verifiable base truth. In this case there's no easy way to verify the truth. And the big state actors known to use LLMs in misinformation campaigns are battling the US for AI supremacy, so they have an incentive to attack the US on AI since it's currently in the lead.
Especially if you're relying on volunteers, this seems prone to abuse in much the same way that, e.g., Reddit mods are. Thankless volunteer jobs that allow shaping the conversation are going to invite misinformation farms or LLM farms to become enthusiastic contributors.
> A great deal of LLM-generated content shows up in comments on social media.
True, but classifying the source (a user's commenting patterns) is a stronger signal than classifying the content itself.
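A rough sketch of what source-level (rather than content-level) scoring could look like; the `Comment` structure, feature names, and weights below are illustrative assumptions, not an actual pipeline:

```python
# Illustrative sketch only: score an account by its posting *patterns*,
# not by the text of any single comment. The Comment structure, features,
# and weights are assumptions for the example, not a real system.
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class Comment:
    timestamp: float  # unix seconds
    text: str

def pattern_features(history: list[Comment]) -> dict[str, float]:
    """Aggregate account-level signals from a user's comment history (sorted by time)."""
    if len(history) < 2:
        return {"gap_regularity": 0.0, "length_uniformity": 0.0, "posts_per_hour": 0.0}
    gaps = [b.timestamp - a.timestamp for a, b in zip(history, history[1:])]
    lengths = [len(c.text) for c in history]
    hours_active = max((history[-1].timestamp - history[0].timestamp) / 3600.0, 1e-9)
    return {
        # Unnaturally regular posting intervals push this toward 1.0.
        "gap_regularity": 1.0 / (1.0 + pstdev(gaps)),
        # Very uniform comment lengths push this toward 1.0.
        "length_uniformity": 1.0 / (1.0 + pstdev(lengths)),
        "posts_per_hour": len(history) / hours_active,
    }

def suspicion_score(history: list[Comment]) -> float:
    """Toy linear score in [0, 1]; the weights are made up for illustration."""
    f = pattern_features(history)
    return (0.4 * f["gap_regularity"]
            + 0.3 * f["length_uniformity"]
            + 0.3 * min(f["posts_per_hour"] / 10.0, 1.0))
```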
That said, for us (Kagi) it's a touchy area to, say, label reddit comments as slop/bots. There's no doubt we could do it better than reddit (their whole comment history is only 6TB compressed), but I doubt *reddit* would be pleased about that.
And it's a growing issue for product-recommendation searches -- see the last section of [1] for an example of how astroturfed reddit comments on product questions trickle up into search engine results.
> Another interesting trend is false accusations of LLM use as a form of attack.
Fair again, but the question of AI slop is much more about who is using the tool, and how, than about the content of the output itself.
Also, we're looking to stay conservative: false negatives > false positives in this space.
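Purely as a sketch of that bias (the 0.9 threshold and the corroboration requirement are made-up numbers): the decision rule defaults to doing nothing unless the automated score is very high and a trusted reviewer agrees.

```python
# Sketch of a "false negatives over false positives" decision rule.
# The 0.9 threshold and the corroboration requirement are illustrative.
HIGH_CONFIDENCE = 0.9

def should_label_as_slop(model_score: float, trusted_reviewer_confirms: bool) -> bool:
    # Default to doing nothing: only label when the automated score is very
    # high AND a trusted human reviewer agrees. Everything else is let
    # through, accepting false negatives to avoid false positives.
    return model_score >= HIGH_CONFIDENCE and trusted_reviewer_confirms
```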
> And the big state actors who are known to use LLMs in misinformation campaigns are battling the US for AI supremacy and so have an incentive to attack the US on AI since it's currently in the lead.
Not wrong, but we're especially going after the deluge of low-effort slop and cleaning up the internet for our users.
Highly sophisticated attacks are likely to evade detection.
> Especially if you're relying on volunteers, this seems prone to abuse in the same way, e.g. Reddit mods are.
The human labelling/review aspect is expected to stay small and limited to trusted users.
Reporting is wide-scale, but review is, and will remain, a closed, trust-based group.
[1] https://housefresh.com/beware-of-the-google-ai-salesman/