
Comment by stuffoverflow

4 days ago

ArchiveTeam did a full site crawl[1] when AnandTech announced it was shutting down. You can browse the warc.gz files like a regular web page using https://replayweb.page

Alternatively you could use solrwayback[2] to index and browse the warc files.

1: https://archive.fart.website/archivebot/viewer/job/202409012...

2: https://github.com/netarchivesuite/solrwayback
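If you just want to see which pages a warc.gz contains before loading it into a viewer, here is a rough stdlib-only Python sketch. It assumes well-formed records with a Content-Length header, and the function names are my own, not part of replayweb.page or solrwayback. For real work a dedicated WARC library is the safer choice.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, body) for each record in a decompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line in WARC stream: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # empty line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


def list_response_urls(stream):
    """Return the target URI of every 'response' record in the stream."""
    return [
        headers["warc-target-uri"]
        for headers, _ in iter_warc_records(stream)
        if headers.get("warc-type") == "response" and "warc-target-uri" in headers
    ]


def list_response_urls_in_file(path):
    """Open a .warc.gz file (gzip members are read back to back) and list its URLs."""
    with gzip.open(path, "rb") as stream:
        return list_response_urls(stream)
```

Calling `list_response_urls_in_file("crawl.warc.gz")` (the file name is a placeholder) returns the captured URLs in crawl order. Record bodies are read fully into memory, so this is only a quick inspection tool.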

Also Kiwix[1] is an excellent app for browsing websites offline. You can use warc2zim[2] to convert the WARC files to ZIM files for use with Kiwix.

I was pleasantly surprised to find that the DWDS (digital dictionary of the German language) app is actually Kiwix!

[1]: https://www.kiwix.org/

[2]: https://github.com/openzim/warc2zim

  • > Kiwix

    ... I haven't heard this name in 15 years probably. Back then you could bring Wikipedia offline on a laptop, it was only around 20-25 GB.

    • I really like having the mobile version for fast searches, often faster than online. Useful, for example, while hiking or in other out-of-network places. Even some big stores have zero signal inside, and sometimes I want to look things up. You can also get almost any Stack Exchange site.

      If you live in a low, but not zero, bandwidth environment... since the rise of LLMs it's now cheaper to have the models do your dirty work. Before, you might have to search through pages of results, load MBs of data and still not find the answer. Offloading that to a data center and getting a few hundred kB back is convenient. Couple that with Kiwix and you can do quite a lot with a lousy internet connection.

This is a bit tangential, but is there a good way to archive Discourse forums and turn them into regular websites? Anyone have experience to share?