Comment by PaulHoule
2 years ago
Practically, labels are probabilistic. Different people who are trained on how to label will label most things the same way but will disagree about some things. I know my own judgement in the morning might not be the same as in the afternoon. If you had a lot of people making judgements, you could say that "75% of reviewers think this is a scam".
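(To make that concrete, here is a minimal sketch of turning several reviewers' yes/no judgements into a fractional label on a post. The reviewer names and the 75% figure are purely illustrative, not from any real labeling system.)

    # Minimal sketch: aggregating independent reviewer judgements into a probabilistic label.
    # Reviewer names and the resulting percentage are made up for illustration.
    from typing import Dict

    def probabilistic_label(votes: Dict[str, bool]) -> float:
        """Fraction of reviewers who flagged the post, e.g. 0.75 -> '75% think this is a scam'."""
        return sum(votes.values()) / len(votes) if votes else 0.0

    votes = {"reviewer_a": True, "reviewer_b": True, "reviewer_c": True, "reviewer_d": False}
    print(f"{probabilistic_label(votes):.0%} of reviewers think this is a scam")  # prints 75%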
But "lots of reviewers" could be tough. Look at the "Spider Shield" example: if Spider Shield is going to block 95% of spider images, they're going to have to look at 95% of the content that I see, before I see it. This is a big ask if the people doing the labeling hate spiders! (Someone who values a clean feed might want to have a time-delayed feed)
It also seems that the labels themselves would become a thing for people to argue about, particularly if they get attached at the 50% point of a post's visibility as opposed to the first or last 2%.
Something based on machine learning is a more realistic strategy in 2024. Today, anti-spider people could make a pretty good anti-spider model with 5,000 or so spider images. The tools would look a bit like what Bluesky is offering, but instead of attaching public tags to images you would publish a model. You could use standardized embeddings for images and text and let people publish classical ML models out of a library. I am looking at one of my old recommender models right now: it is 1 kB serialized, and a better model might be 5 kB. Maybe every two years they update the embeddings and you retrain.
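(A rough sketch of what "publish a classical model over standardized embeddings" could look like is below. The 512-dimensional embeddings, scikit-learn, and pickle serialization are my assumptions for illustration, not anything Bluesky has specified; the point is just that a linear model over fixed embeddings serializes to a few kilobytes.)

    # Sketch: a "publishable" classical classifier over standardized image embeddings.
    # Assumes 512-dim embedding vectors already exist; the embedding model itself is
    # shared infrastructure and is not part of what gets published.
    import pickle
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Stand-in for ~5,000 labeled examples: embeddings plus spider / not-spider labels.
    X = rng.normal(size=(5000, 512)).astype(np.float32)
    y = rng.integers(0, 2, size=5000)

    # A linear model over fixed embeddings: cheap to train, cheap to run at feed time.
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # The published artifact is just the weight vector and bias, not the training data.
    blob = pickle.dumps((clf.coef_.astype(np.float16), clf.intercept_.astype(np.float16)))
    print(len(blob), "bytes serialized")  # roughly 1 kB for a 512-dim linear model

    def spider_score(embedding: np.ndarray) -> float:
        """Probability that an image embedding is a spider, per the downloaded model."""
        coef, intercept = pickle.loads(blob)
        logit = float(embedding @ coef.astype(np.float32)[0] + float(intercept[0]))
        return 1.0 / (1.0 + np.exp(-logit))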