Comment by ndriscoll

20 days ago

The original post says it's not actually a burden though; they just don't like it.

If something is so heavy that 2 requests/second matters, it would've been completely infeasible in say 2005 (e.g. a low power n100 is ~20x faster than the athlon xp 3200+ I used back then. An i5-12600 is almost 100x faster. Storage is >1000x faster now). Or has mediawiki been getting less efficient over the years to keep up with more powerful hardware?

Oh, I was a bit off. They also indexed diffs

> And I mean that - they indexed every single diff on every page for every change ever made. Frequently with spikes of more than 10req/s. Of course, this made MediaWiki and my database server very unhappy, causing load spikes, and effective downtime/slowness for the human users.

  • Does MW not store diffs as diffs (I'd think it would for storage efficiency)? That shouldn't really require much computation. Did diffs take 30s+ to render 15-20 years ago?

    For what it's worth my kiwix copy of Wikipedia has a ~5ms response time for an uncached article according to Firefox. If I hit a single URL with wrk (so some caching at least with disks. Don't know what else kiwix might do) at concurrency 8, it does 13k rps on my n305 with a 500 us average response time. That's over 20Gbit/s, so basically impossible to actually saturate. If I load test from another computer it uses ~0.2 cores to max out 1Gbit/s. Different code bases and presumably kiwix is a bit more static, but at least provides a little context to compare with for orders of magnitude. A 3 OOM difference seems pretty extreme.

    Incidentally, local copies of things are pretty great. It really makes you notice how slow the web is when links open in like 1 frame.

    • > Different code bases

      Indeed ;)

      > If I hit a single URL with wrk

      But the bots aren't hitting a single URL

      As for the diffs...

      According to MediaWiki it gzips diffs [1]. So to render a previous version of the page I guess it'd have to unzip and apply all diffs in sequence to render the final version of the page.

      And then it depends on how efficient the queries are at fetching etc.

      [1] https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture