
Comment by jsheard

7 months ago

Too bad the AI scrapers don't care, and are melting Wikipedia's production servers anyway.

https://arstechnica.com/information-technology/2025/04/ai-bo...

I bet someone like Cloudflare could pull the dataset each day and serve up a plain-text/Markdown version of Wikipedia for rounding-error levels of spend. I just loaded a random Wikipedia page and it had a total weight of 1.5MB, for what I worked out would be about 30KB of Markdown (i.e. roughly 50x less bandwidth).

Of course, the problem then is getting all these scrapers and bots to actually use the alternative, but Wikimedia could potentially redirect suspected clients in that direction.

  • Someone suggested to me applying a filter that serves .md or .txt to bots/AI scrapers instead of the regular website. It seems smart if it works, but I hate it when I get captchas, and this could end up similarly misclassifying non-bots as bots.

    Maybe a "view full website" link loaded via JS so bots don't see it? I don't know.

    • I would love to see most sites serve me Markdown. I'd happily install a browser extension that masks me as an AI scraper bot if it means I can just get the text without all the noise.

      4 replies →
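
For what it's worth, a minimal sketch of the filter being suggested above: detect likely scrapers by User-Agent and hand them a pre-exported Markdown copy of the article instead of the full page. The bot list, the markdownDump map, lookupMarkdown, and renderHTML are all made up for illustration; this is not how Wikimedia actually serves pages, and a real deployment would key off far more signals than User-Agent.

```go
package main

import (
	"net/http"
	"strings"
)

// Substrings seen in common AI-crawler User-Agent headers (illustrative only).
var botMarkers = []string{"GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot"}

func isSuspectedBot(r *http.Request) bool {
	ua := strings.ToLower(r.UserAgent())
	for _, m := range botMarkers {
		if strings.Contains(ua, strings.ToLower(m)) {
			return true
		}
	}
	return false
}

// markdownDump stands in for a daily-exported plain-text/Markdown dump,
// keyed by article path; in practice this could live on a CDN.
var markdownDump = map[string]string{}

func lookupMarkdown(path string) (string, bool) {
	md, ok := markdownDump[path]
	return md, ok
}

func articleHandler(w http.ResponseWriter, r *http.Request) {
	if isSuspectedBot(r) {
		if md, ok := lookupMarkdown(r.URL.Path); ok {
			w.Header().Set("Content-Type", "text/markdown; charset=utf-8")
			w.Write([]byte(md))
			return
		}
	}
	renderHTML(w, r) // fall through to the normal (heavy) HTML page
}

func renderHTML(w http.ResponseWriter, r *http.Request) {
	// Placeholder for the regular article rendering path.
	w.Write([]byte("<html>full article page</html>"))
}

func main() {
	http.HandleFunc("/", articleHandler)
	http.ListenAndServe(":8080", nil)
}
```

The false-positive worry in the comment above is the real catch: anything keyed on User-Agent or heuristics will occasionally downgrade a human reader, which is why the "view full website" escape hatch gets suggested.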

I wonder if Wikipedia's recent switch to client-side rendering has hurt their performance too. Serving a prerendered page might have helped this situation. I don't know the details of their new system, though.

Tragedy of the commons. And that’s why we can’t have nice things.

Because people are people, and will always prioritize egotism over respect for the common good.

  • No. When a fragile resource is abused by one endpoint out of a hundred thousand, and that one does a hundred thousand times the damage, how is that a condemnation of the "ways" of "all people"? What is justice?