Comment by baq
2 years ago
I need labels on labels and labels on labellers. I also need labellers for labellers. With that, I can create a network of labellers that, given enough distribution, can keep each other honest; think DNS root servers, but ones that constantly check whether every other root server is still trustworthy enough to be authoritative.
Then I need users who (hopefully) vote on/rate/report labels, which is its own problem.
Practically, labels are probabilistic. Different people who are trained on how to label will label most things the same way but will disagree about some things. I know my judgement in the morning might not be the same as in the afternoon. If you had a lot of people making judgements, you could say that "75% of reviewers think this is a scam".
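To make the "75% of reviewers" idea concrete, an aggregator would mostly just be counting votes per post. A tiny sketch in Python, with made-up reviewer judgements:

    from collections import Counter

    # One judgement per trained reviewer for a single post (example data).
    votes = ["scam", "scam", "scam", "not_scam"]
    scam_fraction = Counter(votes)["scam"] / len(votes)
    print(f"{scam_fraction:.0%} of reviewers think this is a scam")  # -> 75%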
But "lots of reviewers" could be tough. Look at the "Spider Shield" example: if Spider Shield is going to block 95% of spider images, they're going to have to look at 95% of the content that I see, before I see it. This is a big ask if the people doing the labeling hate spiders! (Someone who values a clean feed might want to have a time-delayed feed)
It also seems that the labels themselves would become a thing for people to argue about, particularly if they get attached when a post is halfway through its visibility rather than in its first or last 2%.
Something based on machine learning is a more realistic strategy in 2024. Today the anti-spider folks could make a pretty good anti-spider model with 5,000 or so spider images. The tools would look a bit like what Bluesky is offering, but instead of attaching public tags to images you would publish a model. You could use standardized embeddings for images and text and let people publish classical ML models out of a library. I am looking at one of my old recommender models right now: it is 1 kB serialized; a better model might be 5 kB. Maybe every two years they update the embeddings and you retrain.
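A rough sketch of what "publish a model instead of labels" could look like, assuming scikit-learn-style tooling; the random vectors below stand in for whatever standardized embeddings the platform would ship, and logistic regression is just one example of a classical model a labeler might pick:

    # Hypothetical anti-spider labeler: train a tiny classifier on image embeddings,
    # then publish the serialized model instead of per-post tags.
    import pickle
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 512))       # stand-in for 512-dim standardized image embeddings
    y = rng.integers(0, 2, size=5000)      # 1 = spider, 0 = not spider (made-up labels)

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    blob = pickle.dumps(clf)               # this blob is what the labeler would publish
    print(f"serialized model: {len(blob)} bytes")   # ~512 weights + a bias; a few kB

    # A subscriber's client scores incoming posts locally against the same embeddings.
    def looks_like_spider(embedding, threshold=0.9):
        return clf.predict_proba(embedding.reshape(1, -1))[0, 1] >= threshold

    print(looks_like_spider(rng.normal(size=512)))

The published artifact is on the order of a few kB, roughly the sizes mentioned above, so shipping it with a feed subscription and rerunning it client-side would be cheap; when the embeddings get updated, you retrain and republish.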
You sure do "need" a lot of things.
Yes.
We had all this on Slashdot before quite a few folks here were born.