Comment by mentos

10 months ago

I wonder if AI can fill that gap of high quality minimally biased moderator.

"You are an AI moderator for ___. The community values thoughtful, constructive, and respectful conversations. Your role is to review user comments and take appropriate actions, such as approving, flagging, or suggesting edits. You are tasked with ensuring comments adhere to the community guidelines, which include..."

19 comments

mentos

hibikir 10 months ago

Moderation systems, even with humans at the helm, are adversarial systems where people can, and will, push on what is allowed. An AI moderator that is as good as a human on a per message basis is still going to be played like a fiddle by an adversary that is interested enough.

Many a forum out there has collapsed because the moderators manage to decide something is fine when it keep losing them contributors. The why do we think the AI will do better?

dyauspitr 10 months ago
I think you’re overestimating how much moderation it takes to keep a community whole. HN is dang and a handful of other moderators and things are stable. If you could have AI even approach 90% of that then it will truly solve problems.
- BehindBlueEyes 10 months ago
  
  I have yet to see an LLM reliably push back against anything firmly, so I don't know how this would work if the first time a user says the LLM is wrong, it apologizes for the confusion and flips its script.
  Also, LLM aren't unbiased, all data it trains on is biased one way or another. Ask any HR question and see for yourself how its answers lean to be HR BS that favours employers.

elpocko 10 months ago

They will apply the patterns they've learned from the biased moderator actions in their training data, and the even more reinforced bias from their usual fine-tuning that improved their "safety" and crippled their ability to condone controversial statements.

matthewdgreen 10 months ago

So spin up your own forum and don't moderate it. Or spend some time (un-)finetuning an LLM moderator so you can talk about race or eugenics or whatever "exciting" controversial statements you want to talk about. Who cares.

satvikpendem 10 months ago

Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.

mentos 10 months ago
Can have the AI just flag posts for a human to review in v1? Then as you refine the prompt injection detection can move to have the AI be autonomous?
- satvikpendem 10 months ago
  
  There is no way to get rid of a prompt injection attack. There are always ways to convince the AI to do something else besides flagging a post even if that's its initial instruction.
  
  9 replies →

deadbabe 10 months ago

“Review this comment as if you are an AI clone of the moderator dang from Hackernews and select the appropriate function call to apply.”