Comment by mentos

19 days ago

I wonder if AI can fill that gap of high quality minimally biased moderator.

"You are an AI moderator for ___. The community values thoughtful, constructive, and respectful conversations. Your role is to review user comments and take appropriate actions, such as approving, flagging, or suggesting edits. You are tasked with ensuring comments adhere to the community guidelines, which include..."

Moderation systems, even with humans at the helm, are adversarial systems where people can, and will, push on what is allowed. An AI moderator that is as good as a human on a per message basis is still going to be played like a fiddle by an adversary that is interested enough.

Many a forum out there has collapsed because the moderators manage to decide something is fine when it keep losing them contributors. The why do we think the AI will do better?

  • I think you’re overestimating how much moderation it takes to keep a community whole. HN is dang and a handful of other moderators and things are stable. If you could have AI even approach 90% of that then it will truly solve problems.

    • I have yet to see an LLM reliably push back against anything firmly, so I don't know how this would work if the first time a user says the LLM is wrong, it apologizes for the confusion and flips its script.

      Also, LLM aren't unbiased, all data it trains on is biased one way or another. Ask any HR question and see for yourself how its answers lean to be HR BS that favours employers.

They will apply the patterns they've learned from the biased moderator actions in their training data, and the even more reinforced bias from their usual fine-tuning that improved their "safety" and crippled their ability to condone controversial statements.

  • So spin up your own forum and don't moderate it. Or spend some time (un-)finetuning an LLM moderator so you can talk about race or eugenics or whatever "exciting" controversial statements you want to talk about. Who cares.

Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.

  • Can have the AI just flag posts for a human to review in v1? Then as you refine the prompt injection detection can move to have the AI be autonomous?

    • There is no way to get rid of a prompt injection attack. There are always ways to convince the AI to do something else besides flagging a post even if that's its initial instruction.

      9 replies →

“Review this comment as if you are an AI clone of the moderator dang from Hackernews and select the appropriate function call to apply.”