Comment by PaulHoule

20 hours ago

One of the projects on my agenda is a classifier that detects those people on social media by detecting "signs of hostility." This was hung up for a while because I thought the process of making a training set would kill me [1] (not seeing these people was a major motivation for the project) but now I'm more optimistic. I still gotta make a generic ModernBERT + LSTM + calibration classifier though.

[1] https://www.cnn.com/2024/12/22/business/facebook-content-mod...

We had a very naive version of this at a company I worked for about 25 years ago. It was called “asshole detective”. We captured about 200 user comments, went through them by hand, and scored particular words and phrases. Then we summed the scores of each user's posts in a thread. If a user was more than a couple of standard deviations away from the mean, it would flag them as an asshole. After reviewing this over a few weeks we found it was surprisingly good at singling out persistent assholes. It never actioned anything itself, though; that was up to a moderator.
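The scoring scheme described above can be sketched roughly as follows. This is a minimal illustration, not the original tool: the phrase weights in `PHRASE_SCORES`, the function names, and the default threshold of 2 standard deviations are all assumptions made for the example, and only users scoring above the mean are flagged.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical hand-assigned weights (lowercase); the real list is not given in the thread.
PHRASE_SCORES = {"idiot": 3.0, "shut up": 2.0, "moron": 3.0, "clueless": 1.0}


def score_post(text):
    """Sum the weights of every scored phrase occurring in one post."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase)
               for phrase, weight in PHRASE_SCORES.items())


def flag_users(posts, threshold=2.0):
    """Flag users whose summed score in a thread is an outlier.

    posts: iterable of (user, text) pairs from a single thread.
    Returns a sorted list of users whose total is more than
    `threshold` standard deviations above the mean user total.
    """
    totals = defaultdict(float)
    for user, text in posts:
        totals[user] += score_post(text)

    values = list(totals.values())
    if len(values) < 2:
        return []  # not enough users to talk about a spread
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # everyone scored the same, so nobody stands out
    return sorted(user for user, total in totals.items()
                  if (total - mu) / sigma > threshold)
```

As in the original, the output is only a list of candidates for a moderator to review; nothing here takes action on its own. Note that with very few users in a thread no one can exceed two standard deviations, so a sketch like this only becomes useful on larger threads.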

I imagine it’d be good at getting rid of a lot of modern plagues on social media as they seem to have a small, predictable and shitty vocabulary.

  • There are a lot of people who are condescending to others but wouldn't see themselves as assholes. I see this often in ham radio and electronics.

    Their responses are curt, sure, but to them they are not outside the norm of the field.

    • I’m a licensed ham as well. These folk were even far outside the realm of the local racists and wife haters on 2m where I am.

      (One reason I stick to CW - being an asshole on there is too time consuming)

  • That's roughly what I'm planning. There are certain keywords and other signs (last time I looked 40,000 Bluesky users reposted and pinned a certain 'skeet') that I would say are "hostile" and with those I can seed a list of candidates of hostile/non-hostile people and then use active learning methods to expand and clean up the list.

    ... what I really need is something that detects 'text in images'. I mean, I don't mind if you took a photo of a sign in the real world, but posting screenshots is a bad smell; only a tiny fraction are wholesome like this:

    https://bsky.app/profile/up-8.bsky.social/post/3lseycg7nl22p
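The seed-then-expand plan in this reply could look something like the sketch below. It is only an illustration under stated assumptions: the marker lists, the function names, and the `predict_proba` callable (standing in for whatever classifier is eventually trained) are hypothetical, and uncertainty sampling is used here as one common active learning strategy, not necessarily the one the commenter has in mind.

```python
def seed_labels(users, hostile_markers, benign_markers):
    """Build an initial labeled set from keyword markers.

    users: dict mapping user -> list of post texts.
    Markers are expected in lowercase.
    Returns dict user -> 1 (hostile) or 0 (non-hostile). Users matching
    both marker sets, or neither, are left unlabeled.
    """
    labels = {}
    for user, posts in users.items():
        text = " ".join(posts).lower()
        hostile = any(marker in text for marker in hostile_markers)
        benign = any(marker in text for marker in benign_markers)
        if hostile != benign:
            labels[user] = 1 if hostile else 0
    return labels


def most_uncertain(unlabeled, predict_proba, k=10):
    """Pick the k users the current model is least sure about.

    predict_proba: callable mapping a user to the model's estimated
    probability of "hostile" (assumed interface). Users closest to 0.5
    are returned first, to be labeled by hand and added to the training set.
    """
    return sorted(unlabeled, key=lambda user: abs(predict_proba(user) - 0.5))[:k]
```

The loop would then alternate: train on the current labels, ask `most_uncertain` for a small batch, label those by hand, and repeat, which keeps the amount of hostile content a human has to read relatively small.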

I wish you the best of luck, but the main problems you're going to be facing are political, not technical. What makes people start to display "signs of hostility" these days is almost always tribal politics, and when you ban that, you are (at least from their POV) engaging in politically motivated censorship. If it gets any kind of traction or visibility, your tool will be pinpointed as a weapon of The Enemy for suppressing truth and entrenching the powers that be, and you'll start getting threats to match.

Not to say you shouldn't do it, but you should be aware of what you're signing up for.