Comment by hungryhobbit
7 hours ago
They're a wiki. Wiki spammers are relentless now.
Source: a small wiki I help manage, for an obscure game with <10k players, recently had to disable new signups because the spam was so bad (and it was stuck on an old version of MediaWiki, which didn't have CAPTCHA support).
On a popular wiki, and it sounds like this one was fairly popular, I imagine even CAPTCHAs won't be enough to stop wiki spammers. If those spammers were posting more than just "buy my penis pill" garbage (e.g. they were putting links to malware sites), Google probably, and somewhat legitimately, saw the wiki as a source of such malware.
I imagine the fix for the OP is a thorough audit/cleansing of all malicious content on the wiki, followed by some sort of appeal to Google (which will no doubt take months, if they even respond at all, because ... Google).
Really OP's only hope is that the Google team responsible for this has an Italian Pokemon fan; otherwise they are probably screwed.
We have a very good anti-bot system set up with a good number of fine-tuned Cloudflare rules, limited permissions for newly created accounts, and a very dedicated team of volunteers that patrol the recent edits constantly. I cannot exclude that somewhere on a rarely visited page (out of 37k+) there is a spam link, but I doubt it’s the reason for the deindexing. I think this would also appear in Google Search Console.
I'd still recommend doing searches for common spam topics to see if you have "bad" stuff. On our wiki everything looked fine until you searched for (say) "finance" (which most users never would) ... and then you'd find a mess of spam finance stuff.
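For what it's worth, here is a minimal sketch of that kind of search, assuming you can get your pages out as a title-to-text mapping (e.g. from a dump). The term list, the function name and the sample pages are all made up for illustration; use whatever spam topics you actually see.

```python
import re

# Hypothetical list of spam topics; tune it to what shows up on your own wiki.
SPAM_TERMS = ["finance", "investment advisor", "casino", "loan"]


def find_spam_candidates(pages, terms=SPAM_TERMS):
    """Return {title: [matched terms]} for pages that mention any spam term.

    `pages` is assumed to be a dict mapping page title to page text.
    """
    # Whole-word, case-insensitive match for each term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    hits = {}
    for title, text in pages.items():
        matched = [term for term, pattern in patterns.items() if pattern.search(text)]
        if matched:
            hits[title] = matched
    return hits


if __name__ == "__main__":
    sample_pages = {
        "Pikachu": "Pikachu is an Electric-type species.",
        "User:xq81": "I found a great investment advisor, just search for them.",
    }
    print(find_spam_candidates(sample_pages))
```

Anything it flags still needs a human look, since a game wiki can legitimately mention things like a casino.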
As for whether it's responsible or not, obviously I don't know. What I do know is that, without all the info, "Google saw malicious content on your wiki" is a far more logical theory than "Google just decided to hate us out of the blue".
“I was having a hard time until I found a great investment advisor.”
“How can I contact your advisor?”
“Their name is <three part unique name>; just search for them and reach out.”
“Great. I found them and their results look impressive. I reached out and hope they get back to me soon.”
I saw a comment on here a few days ago where the user mentioned that they use a CAPTCHA-solving AI bot in their day-to-day life because a solve costs $0.003. So even if you had the CAPTCHA-enabled new version, it might not have helped!
If your project is popular enough that tailored automation makes sense, there's really no way to fight spam.
If it's small enough, you can usually avoid all the spam bots by adding any non-standard step to the registration flow. E.g. a static picture or audio clip of something only your audience knows, with something like a drop-down option, or a picture saying "I'm not a bot" to click on. Or add one more email verification for the first post or edits. Or make users watch a long YouTube video to a certain timestamp to find the correct answer, etc. Anything non-standard works.
It breaks 99.9% of automation, and SERP spammers won't bother creating a unique one for your wiki / forum / etc.
If your site is very popular you're fckd, obviously, and it's just an arms race. This is where you can use Hashcash or something that burns lots of CPU / GPU / RAM / etc. one time, so spammers will just blacklist you.
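A rough sketch of the proof-of-work idea, to show why it costs the client and not the server. This is Hashcash-style, not real Hashcash: it uses SHA-256 and a made-up "challenge:nonce" format instead of the actual stamp format, and it only burns CPU, not RAM. The function names and the difficulty value are assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected cost is about 2**difficulty hashes."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, no matter how hard the puzzle was."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a fresh one per signup.
    nonce = solve("signup:alice", 16)
    print(nonce, verify("signup:alice", nonce, 16))
```

Each extra bit of difficulty doubles the expected work for whoever is signing up, while verification stays at one hash.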
CAPTCHA does nothing against the spammers. I have found that blocking email domains from signups works pretty well. My list is at https://www.rejectionwiki.com/index.php?title=MediaWiki:Emai... (this is a built-in feature of MediaWiki and should work OK with most versions)
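If you're not on MediaWiki, the same idea is easy to approximate yourself. A minimal sketch, and not how the MediaWiki feature works internally: the domain names and the function name here are placeholders I made up.

```python
# Hypothetical blocklist; replace with the domains your spam actually comes from.
BLOCKED_DOMAINS = {"spam-mail.example", "throwaway.example"}


def is_blocked(email, blocked=BLOCKED_DOMAINS):
    """Return True if the signup email should be rejected."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain:
        return True  # malformed address: reject rather than guess
    # Match the domain itself or any parent domain, so subdomains are covered too.
    parts = domain.split(".")
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


if __name__ == "__main__":
    for address in ["a@spam-mail.example", "b@mx.throwaway.example", "c@good.example"]:
        print(address, is_blocked(address))
```

Matching parent domains is a design choice here; drop it if you only want exact matches.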
Do you have any basis for saying that this wiki is overrun with spam, or are you just hand-waving? They were explicit in their Twitter thread about not being full of AI slop, and that they checked their list of pages that were marked as 'crawled but not indexed' and found no abuse.
I understand that you were taken aback by spam attacks on the wiki you help manage, but it's not reasonable to generalize from yours to theirs.
As I said above:
>As for whether it's responsible or not, obviously I don't know. What I do know is that, without all the info, "Google saw malicious content on your wiki" is a far more logical theory than "Google just decided to hate us out of the blue".
AI slop wouldn’t be on my top ten list of annoying wiki spam, having been the one dealing with such things in the past. You can be free of slop and still overflowing with spam.
Weird Gloop (wiki host, started with RuneScape but now has dozens) has blogged about this: https://weirdgloop.org/blog/clankers
How old a version? I've been running a much more obscure game (<150 players, down from ~1k in 2010) for some time, and it was using QuestyCaptcha back in...2008 or so, I think? Certainly at least 15 years ago. It's almost always been sufficient: just put in a couple of questions based on knowledge of the game itself.
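The mechanism is about as simple as it sounds. Roughly this, as a sketch of the idea rather than the actual QuestyCaptcha code; the questions and answers below are invented placeholders.

```python
import random

# Hypothetical question bank: each question maps to its accepted answers (lowercase).
QUESTIONS = {
    "What is the name of the starting town?": {"oakvale"},
    "How many classes can you pick at character creation?": {"4", "four"},
}


def pick_question(rng=random):
    """Choose one question to show on the signup form."""
    return rng.choice(list(QUESTIONS))


def check_answer(question, answer):
    """Accept the answer if it matches, ignoring case and surrounding spaces."""
    accepted = QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted


if __name__ == "__main__":
    question = pick_question()
    print(question, check_answer(question, "oakvale"))
```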
Also running a wiki. Similar. Had a sign-up based on in-game knowledge. LLMs now crack it and I had to turn off signups about a year ago. Now people email me directly if they want an account.
An organization I'm involved with has had to add Anubis (https://github.com/TecharoHQ/anubis) because of the recent wiki attacks from LLM scrapers. It's finally fixed our outages.
Social sites should all have a tree-based invite system. This would allow wiping out spammers and their enablers in a single hit. It would allow vetting of good actors too.
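To make the "single hit" part concrete, here is a small sketch, assuming the site records who invited whom. The class, method names and usernames are hypothetical.

```python
from collections import defaultdict, deque


class InviteTree:
    """Tracks who invited whom, so a whole branch can be banned at once."""

    def __init__(self):
        self.children = defaultdict(list)  # inviter -> list of invitees
        self.banned = set()

    def invite(self, inviter, invitee):
        self.children[inviter].append(invitee)

    def ban_subtree(self, user):
        """Ban `user` and everyone downstream of them; return who was banned."""
        queue = deque([user])
        removed = []
        while queue:
            current = queue.popleft()
            if current in self.banned:
                continue
            self.banned.add(current)
            removed.append(current)
            queue.extend(self.children.get(current, []))
        return removed


if __name__ == "__main__":
    tree = InviteTree()
    tree.invite("root", "alice")
    tree.invite("root", "spammer")
    tree.invite("spammer", "bot1")
    tree.invite("bot1", "bot2")
    print(tree.ban_subtree("spammer"))
```

Whether the inviter ("root" here) should also be penalized is a policy question the sketch leaves open.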
I feel like the dream solution is more like tree-based content: you see content that is vouched for by people you vouch for; if someone's account is compromised then their vouches get updated to not matter anymore, cutting their whole tree off at the root to make it invisible. Spammers should end up in largely disconnected components of the trees.
Pretty much what xwitter / bsky is on the following page. The algorithm layer atop twitter was pretty good at connecting me with people/content before daddy elon came along. And this algorithm layer is actually needed (in my view) to make the social network thing work, otherwise there is no critical mass.
Thankfully bsky is not that good, so I don't get hooked by it at all. But I miss it.
How does new content or content from new accounts get seen by anyone?
I don't think so. "sleeper" accounts are a thing. A more sophisticated spammer could create a "high-reputation" account over some time that only posts useful info, then turn up the spam after the trust level is high enough - or even turn the tree system into a business opportunity and sell vouches to other spammers.
It doesn't solve as much as it sounds.
- You can't vouch for downstream invites, so the tree aspect isn't useful.
- It's not your fault if someone's account gets taken over by a spammer.
- Just because you vouched for someone once doesn't mean you vouch for them in the future.
- What should the punishment be if you accidentally invite a bad actor?
- Your community has to be large and desirable enough for people to bother. The vast majority of sites will die before anyone cares about jumping through hoops.
Addressing issues like these ends up kinda defeating the ideals of the proposal and regresses it into a mechanic that simply makes it harder to register. Which might be useful wrt anti-spam, but it has its own issues, like people having to constantly grovel for invites, shutting out earnest contributors, etc.
In the "real" world we keep our relationship bubbles to a small number for just this reason - we have our close relationships, the people we trust, and then we have gradually less and less close relationships - people we know to greet, or know have done x or y or z
We know that people who are on our outer orbits should not be shared with in certain ways.
Our communities are in fact lots of overlapping bubbles, x knows y, and y knows z, but x sees them as a stranger.
The internet changes that dynamic, and we don't yet know how to manage it - we cannot all live in the town square, and we know to be very careful there, pickpockets, thieves, and robbers abound - and we have no idea who is who
Again our historical approach has been places like universities, where we have "trusted" advisors (teachers) who guide us on subjects being discussed openly - but who also ensure that we avoid pitfalls, like heated debates where people abandon logic and instead use rhetoric or violence, and who ensure that commercial interests are managed - that is, some advertisements are allowed, but unauthorised advertising is forbidden
That approach (moderation) has its own set of problems
You still need criteria to handle reputation: does an account that was invited years ago and is now spamming affect the reputation of the inviter, and by how much? What about hacked accounts?
For small platforms it makes a lot of sense; for larger ones the potential for abuse is still there in different forms.
Now you just created a market for farmed "legit" accounts.
Yes, but the site operator can significantly increase the market price for such an account. This makes spamming more expensive.
Blue tick accounts :)
That’s literally how Facebook started
I remember begging my older step brother for an invite since he had the college email to get in
Then it’s just hacked account whack-a-mole and deciding who legitimately got their account hacked and who is lying.
It raises the bar at least somewhat though!
Interesting to compare this site and lobste.rs for that
Both from safety and volume perspectives, I’d imagine. Openness has value.
Yes, Lobste.rs is great but much more limited in conversational scope. I don’t think the content on each site is directly comparable. The sites are not equal in audience and intention. For example, Lobste.rs doesn’t allow rampant evangelism or want to attract start-ups and thus doesn’t attract a more “spammy” crowd.
Lobste.rs has an invite-based system, however.