Comment by nostrademons
3 years ago
Lots of small files (particularly in a single directory) is a known failure mode of many OS filesystems. I remember putting a million files into MogileFS and finding that filesystem operations basically did not complete in any reasonable length of time.
It also seems that some “do one thing well” tools want to implement their logic using files. For example: each entry in a password database is a file (pass(1)).
Whatever overhead such small files carry might not matter, though, if the problem space is kept “human sized” (whatever one human can be bothered to manage).
I used to have tens of thousands of files in git annex. I had to tarball (chunk) some of the things that I never really use in order to speed up `git annex fsck`.
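For anyone wanting to try the same trick, here is a minimal sketch of bundling a directory of small files into one archive. It is generic Python and knows nothing about git annex; the function name `bundle_directory` and its parameters are made up for illustration, and it skips symlinks on purpose so it only packs real file contents.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one gzip-compressed tarball.

    Returns the number of files added. The archive should live outside
    src_dir, otherwise it could end up trying to include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Sorted order makes the archive layout reproducible between runs.
        for path in sorted(src.rglob("*")):
            # Only real files: directories and symlinks are skipped.
            if path.is_file() and not path.is_symlink():
                tar.add(str(path), arcname=str(path.relative_to(src)))
                count += 1
    return count
```

Something like `bundle_directory("rarely-used", "rarely-used.tar.gz")` would then leave a single file for the filesystem (and any checker walking it) to deal with, at the cost of having to unpack it before use.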
Yes, I did an experiment years ago with 6.5 million files and it was a disaster (https://breckyunits.com/building-a-treebase-with-6.5-million...).
However, things have totally changed with the M1 generation of MacBooks. Things that were once near impossible now run in an instant. I need to redo this experiment.
When I worked at NetApp, this was a problem there too.
IIRC, the fix was to store directory entries in sorted buckets (aka a hash map), so that only the bucket corresponding to the file name would get locked. This reduced the scope of the lock for atomic operations like rename and also allowed faster lookup than an O(n) scan.
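To make the idea concrete, here is a toy in-memory sketch of a directory split into hashed buckets with one lock per bucket. It is not NetApp's implementation, just an illustration of the technique; the class name `BucketedDirectory`, the bucket count, and the use of CRC32 as the hash are all assumptions picked for the example.

```python
import threading
import zlib


class BucketedDirectory:
    """Toy directory: names hash into buckets, each guarded by its own lock."""

    def __init__(self, num_buckets=64):
        self._buckets = [{} for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, name):
        # CRC32 is stable across runs, unlike Python's built-in str hash.
        return zlib.crc32(name.encode("utf-8")) % len(self._buckets)

    def create(self, name, inode):
        i = self._index(name)
        with self._locks[i]:
            if name in self._buckets[i]:
                raise FileExistsError(name)
            self._buckets[i][name] = inode

    def lookup(self, name):
        # Only one bucket is examined, instead of scanning every entry.
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].get(name)

    def rename(self, old, new):
        a, b = self._index(old), self._index(new)
        # Lock at most two buckets, always in ascending index order,
        # so two concurrent renames cannot deadlock each other.
        indices = sorted({a, b})
        for i in indices:
            self._locks[i].acquire()
        try:
            if old not in self._buckets[a]:
                raise FileNotFoundError(old)
            # Like POSIX rename, this replaces an existing target entry.
            self._buckets[b][new] = self._buckets[a].pop(old)
        finally:
            for i in reversed(indices):
                self._locks[i].release()
```

A rename here touches at most two buckets, so operations on names that hash elsewhere can proceed in parallel rather than waiting on one directory-wide lock.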
Yeah I got absolutely crushed by this when trying to migrate a Windows Server 2016 machine to unRAID, whose filesystem is absolutely horrible when dealing with thousands of smaller files. Wiped out a month of work for NAS-related activities; we're back on Windows again.
Small-file performance was the main selling point of (v3) ReiserFS.
ReiserFS was really good. Too bad that nobody continued it. I’m not sure whether that was due to the original author’s story or due to its code being really hard to work with. But the FS was so much faster than ext3/4 on machines from the 2000s. And I never lost a file with it, which is something you cannot say about xfs or jfs. I’ve had to reinstall the OS so many times after system crashes because those got corrupted.
The murderous File System.