Comment by tamnd
1 day ago
It seems this repo only saves one web page?
What I'm implementing here is mirroring a whole website, with all its subpages, so you can browse it all offline. For example, all essays from paulgraham.com.
1 day ago
> It seems this repo only saves one web page?
> What I'm implementing here is mirroring a whole website, with all its subpages, so you can browse it all offline. For example, all essays from paulgraham.com.
Oh, I see. In that case, feature-wise, it is actually a modern alternative to HTTrack.
I think the misunderstanding stems from the browser's "Save As" reference in the description. It is misleading. You use "Save As" to save a single page, not an entire website.
Also, the description lacks a clear explanation of the project's purpose. It would be helpful to include a sentence explaining that the program downloads an entire website, not just a single page.
SingleFile supports scoped recursive crawls too: https://github.com/gildas-lormeau/single-file-cli#:~:text=an...
I highly recommend reading the SingleFile source or https://archiveweb.page/ to see how they handle closed shadow DOMs, cross-origin iframes, WebSockets, media URLs, deduping large assets, etc.
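For anyone unsure what a "scoped" recursive crawl means in practice, here is a rough Python sketch. It is not how SingleFile or the project under discussion does it; `LinkParser`, `in_scope`, and `crawl` are hypothetical names, and the scope rule (same host, path under the starting URL) is an assumption. It only follows `<a href>` links and ignores assets, JavaScript-rendered content, robots.txt, and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope(url, root):
    """Assumed scope rule: same host as root, path under root's path."""
    u, r = urlsplit(url), urlsplit(root)
    return (
        u.scheme in ("http", "https")
        and u.netloc == r.netloc
        and u.path.startswith(r.path)
    )


def crawl(root, fetch, max_pages=100):
    """Breadth-first crawl from root.

    fetch(url) -> HTML string is supplied by the caller, so the
    traversal logic can be exercised without touching the network.
    """
    seen, queue, pages = {root}, deque([root]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(link, root):
                seen.add(link)
                queue.append(link)
    return pages
```

Passing in `fetch` keeps the sketch independent of any HTTP library; a real mirror would also need to rewrite links and download images, stylesheets, and scripts, which is where the hard cases listed above come in.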
> For example, all essays from paulgraham.com
Not the same thing, but I made a clone of pg’s website which can be used for exactly that: https://github.com/shawwn/pg
https://shawwn.github.io/pg/
If you want to read all essays, just clone the repo and open any of the .html files. Or any of the .page files which generated them.
[flagged]
Um. Whose website are you on right now?
I don't come here to laugh, but it's always great when it happens anyway.