Comment by mike_hearn
5 days ago
Well, I worked on abuse for a few years and gave a tech talk on it at Twitter. You really do want these things to be fully automated for two reasons.
The first is that spammers automate, so if you don't, they will always be much faster. By the time your humans are paying attention and have made a decision, it's too late: all the spam (scams, frauds, malware sites...) has already been delivered. In the next spam run everything will have changed, so decisions made by that point are useless.
The second is that your suggestion contains an unstated premise that the human evaluators would somehow have more information to work with than an automated system, or would reach a different decision. In reality they don't and wouldn't. URL reputation systems like this are triggered by spam attacks. For a certain window of time there's a high probability that any message containing a specific domain name will be flagged by users as spam, so the system short-circuits that and starts classifying all messages of unknown status containing a link to that domain as spam. This works well because spammers usually want their targets to visit a website.
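To make the windowed mechanism described above concrete, here is a minimal sketch in Python. It is not how any real platform implements this: the class name `DomainReputation`, the one-hour window, the 50% threshold and the minimum report count are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class DomainReputation:
    """Hypothetical sliding-window spam rate per domain (illustration only)."""

    def __init__(self, window_seconds=3600.0, threshold=0.5, min_reports=20):
        # All three parameters are assumed values, not taken from a real system.
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_reports = min_reports
        # domain -> deque of (timestamp, is_spam) in arrival order
        self._events = defaultdict(deque)

    def record(self, domain, is_spam, now):
        """Store one user verdict (spam or not) for a message linking to domain."""
        self._events[domain].append((now, is_spam))

    def _prune(self, domain, now):
        """Drop verdicts that have fallen out of the time window."""
        events = self._events[domain]
        while events and now - events[0][0] > self.window_seconds:
            events.popleft()

    def spam_rate(self, domain, now):
        """Fraction of in-window verdicts that were spam (0.0 if none)."""
        self._prune(domain, now)
        events = self._events[domain]
        if not events:
            return 0.0
        return sum(1 for _, is_spam in events if is_spam) / len(events)

    def should_block(self, domain, now):
        """Block only with enough recent verdicts and a high enough spam rate."""
        self._prune(domain, now)
        if len(self._events[domain]) < self.min_reports:
            return False
        return self.spam_rate(domain, now) >= self.threshold
```

Because old verdicts age out of the window, the block lifts by itself once the spam run stops, which matches the "removed a bit later" behaviour without anyone having to reverse a decision.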
So the human evaluators in this case will see a message like:
"URL domain signal.me has 67% chance of spam and rising, confirm block? Y/N"
and the humans will always press Y because obviously (a) it means that such a decision is right more often than it's wrong and (b) the domain name is normally meaningless anyway. The block will be removed a bit later once the spammers go away.
In this case, it's tempting to think that some human in a cheap labor country would somehow see this message and think "ah! signal.me! clearly a domain linked to the super cool Signal messenger, which I personally like, so I won't block it even though this might cause a lot of people to be victimized by criminals". But they wouldn't and shouldn't. The domain even looks phishy; it's quite surprising to learn that it's a real Signal-linked domain.
I don't think you should block a major domain because of a spam spike. If you think the entire domain is compromised, then send in a human to look, otherwise focus on the accounts that are sending spam links.
signal.me probably isn't a major domain. These things work on probability. I use Signal (most people don't) and had never heard of this domain until now. It's likely a utility domain that doesn't appear in X messages often, so a single spammer is capable of overwhelming its good traffic in a single campaign. Whereas e.g. google.com has a lot of presence over a long period of time in non-spam messages so it'd take a huge spam campaign to overpower it.
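The volume argument above comes down to simple arithmetic, sketched here in Python. The function names and every number are made up for illustration; nothing here reflects real traffic for any domain.

```python
import math


def spam_share(good_messages, spam_messages):
    """Fraction of messages mentioning a domain that are spam."""
    total = good_messages + spam_messages
    return spam_messages / total if total else 0.0


def spam_needed(good_messages, threshold):
    """Smallest spam count s such that s / (s + good_messages) >= threshold.

    Rearranging the inequality gives s >= threshold * good / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return math.ceil(threshold * good_messages / (1.0 - threshold))
```

With a hypothetical 50 legitimate messages in the window, a 1,000-message campaign pushes the spam share to about 95%. Against a hypothetical 1,000,000 legitimate messages, the same campaign yields about 0.1%. At a 50% threshold the spammer has to at least match the legitimate volume, so the campaign needed scales directly with how established the domain is.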
Meanwhile thousands of users are receiving links to malware and phishing attacks while human raters are twiddling their thumbs.
The parent is correct: these systems should be automated. Automated systems can respond faster and more accurately than humans. Humans should only be involved to improve the system and correct any mistakes it makes after the fact.
Meanwhile thousands of users that depend on their links working see them still working.
If you want to prevent phishing links then you can't allow links in the first place. If you decide to allow links, then you need to add some stability to the system. Circumstantial reactions should not be fast. Banning a whole domain based on short term percentage is circumstantial.
Social media platforms shouldn't play internet police in the first place. We need to preserve our ability to communicate via impartial carriers. I don't want the mailman to decide what letters I can send. I do not want the telephone company to decide what calls I can make. I don't want online replacements of them to make any such decisions either.
The problem isn't links to scam sites posted on X/whatever, the problem is the scam. That is something for the actual police, government agencies and ultimately legislators to handle. Go after the actual scammers. Go after countries harboring them. Don't sacrifice our freedoms for an "easy" out instead of doing real law enforcement.
The trick, I imagine, is to categorize users by how established they are, then block links by domain and user level. Sometimes spammers will mature accounts first, but that is considerably more work. Creating a lot of fresh accounts is much easier.
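A rough sketch of that idea in Python, assuming account age is the only signal of how established a user is. The tier names, age cut-offs and thresholds are invented for illustration, and `domain_spam_rate` is assumed to come from some separate reputation system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_age_days: int       # account must be at least this old
    block_threshold: float  # block links once the domain's spam rate reaches this


# Hypothetical tiers, ordered from most to least established.
TIERS = (
    Tier("veteran", 365, 0.95),
    Tier("established", 30, 0.70),
    Tier("new", 0, 0.30),
)


def tier_for(account_age_days):
    """Return the most established tier the account qualifies for."""
    for tier in TIERS:
        if account_age_days >= tier.min_age_days:
            return tier
    return TIERS[-1]  # fall back to the strictest tier for odd inputs


def allow_link(account_age_days, domain_spam_rate):
    """Allow the link unless the domain's spam rate hits this tier's threshold."""
    return domain_spam_rate < tier_for(account_age_days).block_threshold
```

With these made-up numbers, a domain sitting at a 67% spam rate would be blocked for a two-day-old account but still allowed for accounts older than 30 days, so established users keep their links working while a fresh-account campaign is cut off.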