Comment by stuffoverflow
4 days ago
ArchiveTeam did a full site crawl[1] when AnandTech announced it was shutting down. You can browse the warc.gz files like regular web pages using https://replayweb.page
Alternatively you could use solrwayback[2] to index and browse the warc files.
1: https://archive.fart.website/archivebot/viewer/job/202409012...
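If you just want a quick look at what a warc.gz contains before loading it into a viewer, here is a rough sketch using only the Python standard library. It assumes the common layout: gzip members concatenated one per record, each record being a `WARC/1.x` version line, `Name: value` headers, a blank line, then `Content-Length` bytes of payload. The function names and the file name in the usage note are made up for illustration, and it ignores folded header lines, so treat it as a starting point rather than a real WARC parser.

```python
import gzip
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Header names are lower-cased. Folded (continuation) header lines are
    not handled; this is an illustrative sketch only.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()

        # The payload is read by length, so CRLFs inside it are harmless.
        payload = stream.read(int(headers["content-length"]))
        yield headers, payload


def list_responses(path: str) -> Iterator[Tuple[str, int]]:
    """Yield (target URI, payload size in bytes) for each 'response' record."""
    # gzip.open reads concatenated gzip members as one continuous stream.
    with gzip.open(path, "rb") as stream:
        for headers, payload in iter_warc_records(stream):
            if headers.get("warc-type") == "response":
                yield headers.get("warc-target-uri", ""), len(payload)
```

Usage would be something like `for uri, size in list_responses("crawl.warc.gz"): print(size, uri)`, where `crawl.warc.gz` is a placeholder for whichever file you downloaded from the job page.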
Also Kiwix[1] is an excellent app for browsing websites offline. You can use warc2zim[2] to convert the WARC files to ZIM files for use with Kiwix.
I was pleasantly surprised to find that the DWDS (digital dictionary of the German language) app is actually Kiwix!
[1]: https://www.kiwix.org/
[2]: https://github.com/openzim/warc2zim
> Kiwix
... I haven't heard this name in 15 years probably. Back then you could bring Wikipedia offline on a laptop, it was only around 20-25 GB.
You can still bring Wikipedia offline on a laptop (and on some of the higher-capacity mobile phones); it's just that you'd need around 100 GB instead. There is even a library[0] you can use to build your own Wikipedia viewer.
[0] https://github.com/openzim/libzim
I really like having the mobile version for fast searches, often faster than online. Useful, for example, while hiking or in other out-of-network places. Even some big stores have zero signal inside, and sometimes I want to look things up. You can also get almost any Stack Exchange site.
If you live in a low, but not zero, bandwidth environment... since the rise of LLMs it's now cheaper to have the models do your dirty work. Before, you might have to search through pages of results, load MBs of data and still not find the answer. Offloading that to a data center and getting a few hundred kB back is convenient. Couple that with Kiwix and you can do quite a lot with a lousy internet connection.
This is a bit tangential, but is there a good way to archive Discourse forums and turn them into regular websites? Anyone have experience to share?