Comment by nunobrito
10 days ago
In case the author is reading: please consider adding official fields for an optional screenshot of the page in Base64 encoding and for an optional description. It would also help to have an official field for the ISO timestamp of when the archival took place.
As a final wish-list item, it would be great to have multiple versions/crawls of the same URL with deduplication of static assets (images, fonts), but this is likely stretching the format too far.
Allowing more metadata might be useful. You can add anything to the manifest at build time, as assets are not required to be loaded or ever used (because this is impossible to statically check). I suppose we'd have to define an official prefix like 'gwtar-metadata-*', with fields like 'gwtar-metadata-screenshot' and 'gwtar-metadata-description'... It's not obvious what the best way forward is there: you don't want to add a whole bunch of ad hoc metadata fields, and everyone will have a different one they want. Exif...?
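For illustration, here is a minimal Python sketch of what such prefixed fields could look like when assembled at build time. Only the 'gwtar-metadata-' prefix and the screenshot/description names come from the suggestion above; the 'archived-at' key, the `build_metadata` helper, and the plain-dict layout are assumptions for the example, not part of any official spec.

```python
import base64
from datetime import datetime, timezone

# Prefix taken from the suggestion above; hypothetical, not an official field name.
PREFIX = "gwtar-metadata-"

def build_metadata(screenshot_png=None, description=None, archived_at=None):
    """Return a dict of optional, prefixed metadata fields (hypothetical layout)."""
    if archived_at is None:
        # Default to the current time in UTC.
        archived_at = datetime.now(timezone.utc)
    # ISO 8601 timestamp of when the archival took place ('archived-at' is an assumed key).
    fields = {PREFIX + "archived-at": archived_at.isoformat()}
    if screenshot_png is not None:
        # Base64-encode the raw image bytes so they can be stored as text.
        fields[PREFIX + "screenshot"] = base64.b64encode(screenshot_png).decode("ascii")
    if description is not None:
        fields[PREFIX + "description"] = description
    return fields
```

Every field other than the timestamp is omitted when it is not supplied, so a parser that does not know about them can simply ignore the prefix.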
Multiple versions or multiple pages (maybe they can be the same thing?) would be nice, but it's also unclear how to implement that. An iframe wrapper?
I considered and rejected deduplication and compression. Those can be done by the filesystem/server, transparently to the format. (If there's an image file duplicated across multiple pages, then it should be trivial for any filesystem or server to detect or compress those away.)
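As a rough sketch of why detection outside the format is cheap: identical assets can be grouped by a hash of their contents. The `find_duplicates` helper and the path-to-bytes mapping are hypothetical, chosen only to illustrate the idea; this is not how any particular filesystem or server implements it.

```python
import hashlib
from collections import defaultdict

def find_duplicates(assets):
    """Group asset paths whose contents are byte-for-byte identical.

    `assets` maps a path (str) to the file contents (bytes).
    Returns a list of path groups, each group holding two or more duplicates.
    """
    by_digest = defaultdict(list)
    for path, data in assets.items():
        # Identical bytes always yield the same SHA-256 digest.
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each group could then be stored once and referenced from every page that uses it, without the archive format itself knowing anything about it.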
If possible, I'd ask for a shorter tag name to keep it more readable. For example, "gwtar-screenshot" and "gwtar-description" would work. I only asked to make it official because otherwise it is difficult to get different parsers to agree in the future.
> An iframe wrapper?
The way Archive.org handles navigation between multiple versions is quite pleasant to use. I don't know for sure, but it might be an iframe added on top.