Comment by yooogurt
12 hours ago
> if you want to back this file up regularly with something like restic, then you will quickly end up in a world of pain: since new mails are not even appended to the end of the file, each cycle of takeout-then-backup essentially produces a new giant file.
As I'm sure the author is aware, Restic does hash-based chunking, so similar files can be backed up efficiently.
How similar are two successive Takeout mboxes?
If the order of messages within an mbox is stable, and new emails are inserted somewhere, the delta update might be tiny.
Even if the order of the mbox's messages is ~random, Restic's delta updates will forgo re-uploading large attachments.
It would be great to see empirical figures here: how large is the incremental backup after a month's emails? How does that compare for each backup strategy?
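One rough way to get such figures without running a full backup is to chunk two successive exports and count the bytes that land in chunks not seen before. The sketch below is a toy content-defined chunker using a gear-style rolling hash; it is not Restic's actual chunker, and the function names, the `avg_bits` parameter and the file names in the usage comment are made up for illustration.

```python
import hashlib
import random

# Fixed per-byte random table for the gear-style rolling hash (toy values).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunks(data: bytes, avg_bits: int = 12, min_size: int = 64) -> list[bytes]:
    """Split data at content-defined boundaries.

    A boundary is declared where the low `avg_bits` bits of the rolling
    hash are zero, so chunks average roughly 2**avg_bits bytes.
    """
    mask = (1 << avg_bits) - 1
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        # Older bytes are shifted out, so h depends on the last 32 bytes only.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out


def new_bytes(old: bytes, new: bytes, **kw) -> int:
    """Bytes of `new` that fall in chunks not already present in `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunks(old, **kw)}
    return sum(len(c) for c in chunks(new, **kw)
               if hashlib.sha256(c).digest() not in seen)


# Hypothetical usage with two Takeout exports (file names are placeholders):
# old = open("takeout-january.mbox", "rb").read()
# new = open("takeout-february.mbox", "rb").read()
# print(new_bytes(old, new, avg_bits=20) / len(new))  # ~1 MiB average chunks
```

Running it with a large `avg_bits` (around 20) and again with a small one would give a first impression of how much the chunk size matters for mbox files, though real tools will differ in the details.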
The pro of sticking with restic is simplicity, and also avoiding the risk of your tool managing to screw up the data.
This risk isn't so bad if it's a mature tool that canonicalises mboxes (e.g. orders the messages by time), but it seems risky for something hand-rolled.
> As I'm sure the author is aware, Restic does hash-based chunking, so similar files can be backed up efficiently.
> Even if the order of the mbox's messages is ~random, Restic's delta updates will forgo re-uploading large attachments.
I forget the exact number, but the rolling hashes for Restic and Borg are tuned to produce chunk sizes on the order of a megabyte.
This means an attachment needs to be many megabytes before Restic's deduplication helps much, since a chunk only dedupes if it falls entirely within the attachment. You'd lose about 0.5MB at each end of an attachment on average, so a 5MB file would only be 80% deduped.
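To make that back-of-the-envelope estimate easy to rerun for other sizes, here is a small sketch. It assumes the simple model above (about half an average chunk lost at each end of the attachment) and ignores everything else; the function name and the 1MB default are illustrative, not taken from either tool.

```python
def expected_dedup_fraction(attachment_mb: float, avg_chunk_mb: float = 1.0) -> float:
    """Rough fraction of an attachment covered by chunks lying wholly inside it.

    Model: on average half a chunk is lost at each end, so one full
    average chunk is lost per attachment.
    """
    if attachment_mb <= 0:
        raise ValueError("attachment size must be positive")
    lost = min(attachment_mb, avg_chunk_mb)  # 0.5 chunk at each of two ends
    return (attachment_mb - lost) / attachment_mb


# 5 MB attachment with ~1 MB chunks: (5 - 1) / 5 = 0.8
print(expected_dedup_fraction(5.0))
# The same attachment with a much smaller average chunk size (64 KB here)
print(expected_dedup_fraction(5.0, avg_chunk_mb=0.0625))
```

Under this model, anything at or below the average chunk size gets no benefit at all, which is the case for most ordinary e-mails.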
Nothing against Restic, but it's tuned for file-level backup, and I'm sure it wouldn't be as performant if it used chunks that were small enough to pick apart individual e-mails.
I suggested the author check out ZPAQ, which has a user-tunable average fragment size, and is arguably even simpler than Restic.
The ZPAQ file can then itself be efficiently backed up by Restic.