
Comment by jsheard

7 months ago

Too bad the AI scrapers don't care, and are melting Wikipedia's production servers anyway.

https://arstechnica.com/information-technology/2025/04/ai-bo...

I bet someone like Cloudflare could pull the dataset each day and serve up a plain-text/Markdown version of Wikipedia for rounding-error levels of spend. I just loaded a random Wikipedia page and it had a total weight of 1.5MB, for what I worked out would be about 30KB of Markdown (i.e. roughly 50x less bandwidth).

Of course, the problem then is getting all these scrapers and bots to actually use the alternative, but Wikimedia could potentially redirect suspected clients in that direction.

  • Someone suggested to me applying a filter that serves .md or .txt to bots/AI scrapers instead of the regular website. It seems smart if it works, but I hate it when I get captchas, and this could end up similarly misclassifying non-bots as bots.

    Maybe a "view full website" link loaded via JS so bots don't see it? I don't know.

    • I would love to see most sites serve me Markdown. I'd happily install a browser extension that masks me as an AI scraper bot if it means I can just get the text without all the noise.

      4 replies →
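
For what it's worth, a minimal sketch of the filter being suggested above: detect likely scrapers by User-Agent and hand them a pre-exported Markdown copy of the article instead of the full page. The bot list, the markdownDump map, lookupMarkdown, and renderHTML are all made up for illustration; this is not how Wikimedia actually serves pages, and a real deployment would key off far more signals than User-Agent.

```go
package main

import (
	"net/http"
	"strings"
)

// Substrings seen in common AI-crawler User-Agent headers (illustrative only).
var botMarkers = []string{"GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot"}

func isSuspectedBot(r *http.Request) bool {
	ua := strings.ToLower(r.UserAgent())
	for _, m := range botMarkers {
		if strings.Contains(ua, strings.ToLower(m)) {
			return true
		}
	}
	return false
}

// markdownDump stands in for a daily-exported plain-text/Markdown dump,
// keyed by article path; in practice this could live on a CDN.
var markdownDump = map[string]string{}

func lookupMarkdown(path string) (string, bool) {
	md, ok := markdownDump[path]
	return md, ok
}

func articleHandler(w http.ResponseWriter, r *http.Request) {
	if isSuspectedBot(r) {
		if md, ok := lookupMarkdown(r.URL.Path); ok {
			w.Header().Set("Content-Type", "text/markdown; charset=utf-8")
			w.Write([]byte(md))
			return
		}
	}
	renderHTML(w, r) // fall through to the normal (heavy) HTML page
}

func renderHTML(w http.ResponseWriter, r *http.Request) {
	// Placeholder for the regular article rendering path.
	w.Write([]byte("<html>full article page</html>"))
}

func main() {
	http.HandleFunc("/", articleHandler)
	http.ListenAndServe(":8080", nil)
}
```

The false-positive worry in the comment above is the real catch: anything keyed on User-Agent or heuristics will occasionally downgrade a human reader, which is why the "view full website" escape hatch gets suggested.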

I wonder if Wikipedia's recent switch to client-side rendering has hurt their performance too. Serving a prerendered page might have helped this situation. I don't know the details of their new system, though.

Tragedy of the commons. And that’s why we can’t have nice things.

Because people are people, and will always prioritize egotism over respect for the common good.

  • No. When a fragile resource is abused by one endpoint out of a hundred thousand, and that one does a hundred thousand times the damage, how is that a condemnation of the "ways" of "all people"? What is justice?