Comment by progbits
9 hours ago
The Sanderson wiki [1] has a time-travel feature that lets you read a snapshot from just before a given book was published, so you avoid spoilers.
I would like a similar pre-LLM Wikipedia snapshot. Sometimes I would prefer potentially stale or incomplete info rather than have to wade through slop.
The easiest way to get this is probably Kiwix. You can download a ~100GB file containing all of English Wikipedia as of a particular date, then browse it locally offline.
I'm not sure if it's real or not, but the Internet Archive has a listing claiming to be the dump from May 2022: https://archive.org/details/wikipedia_en_all_maxi_2022-05
Alternatively, get them straight from Wikimedia; those are the dumps I'm using. The format is multistream XML in bz2, which is easy to parse and trivial to parse concurrently. The latest dump (text only) is from 2026-01-01 and weighs 24.1 GB: https://dumps.wikimedia.org/enwiki/20260101/ They also offer splits with accompanying indexes, so you can grab just the few sections you want if 24 GB is too large.
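For anyone curious what the index buys you, here is a rough Python sketch of pulling one compressed stream out of a multistream file without decompressing the whole thing. It assumes the index lines look like `offset:page_id:title`, with the offset being the byte position of a self-contained bz2 stream in the main file; the function names are my own and not part of any Wikimedia tooling.

```python
import bz2


def parse_index_line(line):
    """Split one index line into (offset, page_id, title).

    Assumes the form 'offset:page_id:title'. Titles can contain colons,
    so split on the first two colons only.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def read_stream(fileobj, offset, chunk_size=64 * 1024):
    """Decompress the single bz2 stream that starts at byte `offset`.

    `fileobj` is a seekable binary file (the multistream .bz2 dump).
    Reading stops when the decompressor reports the end of that stream,
    so the streams that follow are never touched.
    """
    fileobj.seek(offset)
    decomp = bz2.BZ2Decompressor()
    chunks = []
    while not decomp.eof:
        block = fileobj.read(chunk_size)
        if not block:  # truncated file: return what we have
            break
        chunks.append(decomp.decompress(block))
    return b"".join(chunks).decode("utf-8")
```

Since each stream is independent, you could hand different offsets to different workers, which is presumably what makes concurrent parsing straightforward. The returned text is an XML fragment of `<page>` elements rather than a full document, so it needs wrapping in a root element before being fed to a strict XML parser.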
There's a torrent at the linked URL. Trying that right now. (I have a couple of Kiwix dumps of Wikipedia offline already.)
But you can already view the past version of any page on Wikipedia. Go to the page you want to read, click "View history" and select any revision before 2023.
I know, but it's not as convenient if you have to keep scrolling through revisions for every page.
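The scrolling could probably be scripted away. Below is a minimal Python sketch that asks the MediaWiki API for the newest revision at or before a cutoff date and turns it into a permalink. The cutoff of 2023-01-01, the function names, and the User-Agent string are my own placeholders; the query parameters (`prop=revisions`, `rvstart`, `rvdir`, `rvlimit`) are standard API parameters, but treat the whole thing as an untested-against-production sketch.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title, cutoff="2023-01-01T00:00:00Z"):
    """Build a query for the newest revision at or before `cutoff`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvstart": cutoff,   # start listing from this timestamp...
        "rvdir": "older",    # ...and walk backwards in time
        "rvprop": "ids|timestamp",
        "format": "json",
        "formatversion": 2,
    }
    return API + "?" + urllib.parse.urlencode(params)


def permalink_from_response(data):
    """Extract a permalink from a parsed API response, or None."""
    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return None
    revisions = pages[0].get("revisions")
    if not revisions:  # page missing, or created after the cutoff
        return None
    return "https://en.wikipedia.org/w/index.php?oldid=%d" % revisions[0]["revid"]


def snapshot_link(title, cutoff="2023-01-01T00:00:00Z"):
    """Fetch the permalink over the network (placeholder User-Agent)."""
    req = urllib.request.Request(
        revision_query_url(title, cutoff),
        headers={"User-Agent": "snapshot-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        return permalink_from_response(json.load(resp))
```

Calling `snapshot_link("Alan Turing")` should give a link to the last pre-2023 version of that article. It does nothing about links inside the page, which still point at current revisions, so it is only a partial fix for the convenience problem.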
Have you personally encountered slop there? I tend to use Wikipedia rabbit holes as a pastime and haven’t really felt a difference.