Comment by troupo
19 days ago
Oh, I was a bit off. They also indexed diffs:
> And I mean that - they indexed every single diff on every page for every change ever made. Frequently with spikes of more than 10 req/s. Of course, this made MediaWiki and my database server very unhappy, causing load spikes, and effective downtime/slowness for the human users.
Does MW not store diffs as diffs (I'd think it would for storage efficiency)? That shouldn't really require much computation. Did diffs take 30s+ to render 15-20 years ago?
For what it's worth, my Kiwix copy of Wikipedia has a ~5 ms response time for an uncached article, according to Firefox. If I hit a single URL with wrk at concurrency 8 (so there's at least some caching, if only in the disk cache; I don't know what else Kiwix might do), it does 13k rps on my N305 with a 500 µs average response time. That's over 20 Gbit/s, so it's basically impossible to saturate over the network. If I load test from another computer, it uses ~0.2 cores to max out 1 Gbit/s. The code bases are different, and Kiwix is presumably a bit more static, but this at least gives some context for comparing orders of magnitude. A three-orders-of-magnitude difference seems pretty extreme.
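As a rough sanity check on those numbers, here is a small Python sketch that applies Little's law to the wrk run, works out the average response size implied by 13k rps at 20 Gbit/s, and computes the gap in orders of magnitude. It assumes the "three orders of magnitude" compares the 13k rps Kiwix figure with the ~10 req/s that strained the MediaWiki server in the quoted post; that pairing is an assumption, not something the comment states.

```python
import math

# Figures quoted in the comment.
concurrency = 8            # wrk connections
avg_latency_s = 500e-6     # 500 µs average response time
observed_rps = 13_000      # requests per second reported by wrk
throughput_bits = 20e9     # "over 20 Gbit/s"
mediawiki_rps = 10         # rate that caused load spikes (assumed comparison point)

# Little's law (L = λW) rearranged to λ = L / W. For a closed-loop load
# generator this is an upper bound: any client-side gap between requests
# lowers the real rate.
littles_law_rps = concurrency / avg_latency_s

# Average response size implied by the bandwidth and request rate.
implied_bytes_per_response = throughput_bits / 8 / observed_rps

# Orders of magnitude between the two request rates.
oom = math.log10(observed_rps / mediawiki_rps)

print(f"Little's law bound: {littles_law_rps:,.0f} rps")                # 16,000 rps
print(f"Implied response size: {implied_bytes_per_response / 1000:.0f} kB")  # 192 kB
print(f"Difference: {oom:.1f} orders of magnitude")                      # 3.1
```

The Little's law bound sits a bit above the measured 13k rps, which is roughly what you'd expect if wrk spends some time between requests or the 500 µs average is rounded.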
Incidentally, local copies of things are pretty great. It really makes you notice how slow the web is when links open within a single frame.
> Different code bases
Indeed ;)
> If I hit a single URL with wrk
But the bots aren't hitting a single URL.
As for the diffs...
According to MediaWiki's documentation, it gzips diffs [1]. So to render a particular revision of a page, I guess it would have to decompress the diffs and apply them in sequence up to that revision.
And then it depends on how efficiently the queries fetch the data, etc.
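To illustrate the reconstruction cost guessed at above, here is a minimal Python sketch of a hypothetical delta store: each revision is saved as a gzip-compressed list of edits against the previous one, and rebuilding revision n means decompressing and applying n deltas in order. The format and the names `make_delta`, `apply_delta`, and `reconstruct` are made up for illustration; this is not MediaWiki's actual storage format or code.

```python
import difflib
import gzip
import json


def make_delta(old: list[str], new: list[str]) -> bytes:
    """Encode `new` as edits against `old`, then gzip it (hypothetical format)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["=", i1, i2])        # copy old[i1:i2] unchanged
        elif j2 > j1:
            ops.append(["+", new[j1:j2]])    # lines added by a replace/insert
        # Deleted lines need no op: they are simply never copied from `old`.
    return gzip.compress(json.dumps(ops).encode("utf-8"))


def apply_delta(old: list[str], delta: bytes) -> list[str]:
    """Decompress one delta and apply it to `old`."""
    ops = json.loads(gzip.decompress(delta).decode("utf-8"))
    out: list[str] = []
    for op in ops:
        if op[0] == "=":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def reconstruct(base: list[str], deltas: list[bytes], revision: int) -> list[str]:
    """Rebuild `revision` (0 = base) by applying deltas[0:revision] in order."""
    text = base
    for delta in deltas[:revision]:  # one decompress + patch per intermediate revision
        text = apply_delta(text, delta)
    return text


if __name__ == "__main__":
    revisions = [["a", "b", "c"], ["a", "B", "c"], ["a", "B", "c", "d"], ["B", "d"]]
    deltas = [make_delta(revisions[i], revisions[i + 1]) for i in range(len(revisions) - 1)]
    print(reconstruct(revisions[0], deltas, 3))  # ['B', 'd']
```

Under this model the loop in `reconstruct` runs once per intermediate revision, so the work grows with the length of a page's history, and a crawler requesting every diff of every page repeats that work unless results are cached.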
[1] https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture