Comment by stebalien

2 days ago

I've actually spent some time debugging why git causes so many issues with the backup software I use (restic).

Ironically, I believe you have it backwards: pack files, git's solution to the "too many tiny files" problem, are the issue here, not the tiny files themselves.

In my experience, incremental backup software works best with many small files that never change. Scanning is usually just a matter of checking modification times and moving on. This isn't fast, but it's fast enough for backups and can be optimized by monitoring for file changes in a long-running daemon.
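To make that concrete, here's a toy Python sketch of the mtime-based scan (hypothetical, not restic's actual algorithm — restic also tracks inode/ctime and file sizes): skip anything whose modification time predates the last backup, and only those files ever get read.

```python
import os

def changed_since(root, last_backup_time):
    """Yield paths under `root` modified after the last backup.

    This is the cheap metadata-only check an incremental backup tool
    can do before reading any file contents: one stat() per file,
    no hashing, no I/O on unchanged data.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path
```

A tree of a million never-changing loose objects costs a million stat() calls per scan — slow, but bounded, and avoidable entirely with an inotify-style daemon.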

However, lots of mostly identical files ARE an issue for filesystems as they tend to waste a lot of space. Git solves this issue by packing these small objects into larger pack files, then compressing them.
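A toy sketch of why packing pays off (this is not git's actual pack format, just the general idea): concatenate many small, mostly-similar objects and compress the whole thing once, so the redundancy *between* objects gets squeezed out too.

```python
import zlib

# Hypothetical stand-in for loose git objects: many small blobs that
# are nearly identical except for a short unique suffix.
objects = [
    b"blob 100\x00" + b"x" * 90 + ("%010d" % i).encode()
    for i in range(1000)
]

# Storing them individually wastes space (per-file overhead, and the
# shared redundancy is never exploited). Packing = concatenate, then
# compress the whole stream at once.
packed = zlib.compress(b"".join(objects))

total_raw = sum(len(o) for o in objects)
print("raw bytes:", total_raw, "packed bytes:", len(packed))
```

On top of this, real git packs also delta-encode similar objects against each other before compressing, which helps even more.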

Unfortunately, it's those pack files that cause issues for backup software: any time git "garbage collects" and creates new pack files, it ends up deleting and creating a bunch of large files filled with what looks like random data (due to compression). Constantly creating/deleting large files filled with random data wreaks havoc on incremental/deduplicating backup systems.
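Here's a toy sketch of why this defeats deduplication (again hypothetical, not git's real repack logic): a gc that merely reorders the same logical objects into a new pack produces compressed bytes that share essentially nothing with the old pack, so a chunk-based dedup tool sees a brand-new large file.

```python
import zlib

# The same logical objects, before and after a simulated "gc" that
# rewrites the pack in a different order (real git gc also re-deltifies,
# which scrambles things even further).
objects = [("object-%04d " % i).encode() * 64 for i in range(500)]
pack_before = zlib.compress(b"".join(objects))
pack_after = zlib.compress(b"".join(reversed(objects)))

def shared_chunks(a, b, size=64):
    """Count fixed-size chunks the two byte strings have in common --
    a crude stand-in for what a deduplicating backup tool can reuse."""
    chunks_a = {a[i:i + size] for i in range(0, len(a), size)}
    chunks_b = {b[i:i + size] for i in range(0, len(b), size)}
    return len(chunks_a & chunks_b)

print("reusable chunks between old and new pack:",
      shared_chunks(pack_before, pack_after))
```

Same repository contents, yet the backup tool has to store the new pack almost in full — and this repeats on every gc.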