Comment by chongli

3 years ago

Windows struggles horrendously with lots of small files. If you tried the same task on Linux you’d see a large jump in performance. I don’t know if Linux small file access could match SQLite but it would be a lot closer.

That's because the Windows filesystem stack offers quite a lot of features that many other filesystems don't, and SQLite certainly doesn't. You might not need those features, of course.

One core feature is that Windows offers hooks ('filters') that allow other components to put themselves between the client program and the files. This is how virtual filesystems (OneDrive and the like) and anti-malware scanners work.

When you read from SQLite, those reads can't be served from another server, the objects can't be scanned automatically, etc. Again, you might not need these features, but it's not that SQLite or ext4 is somehow magically faster; they just made different design choices.

  • The design choices make a bit of a difference, but most of the overhead is Defender. When you try to read thousands of files, your computer spends most of its time running Defender. Turn it off and the problem goes away.

Yes, Linux does a lot better (haven't tried that specific script, but have done a lot of similar things), but I've gotten speed improvements with similar use of SQLite there too, especially when dealing with a lot of files in the same directory.
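
If anyone wants to check this on their own machine, here is a rough Python sketch of the comparison being discussed: read N small files one by one, then read the same payloads as BLOB rows from a single SQLite database. This is not the script from the article. The function names, the `blobs` table, and the default count and size are all made up for illustration, and the files are read right after being written, so this mostly measures warm-cache per-file open overhead rather than disk speed.

```python
import os
import sqlite3
import tempfile
import time


def write_files(directory, count, payload):
    """Write `count` small files, each holding `payload`, into `directory`."""
    for i in range(count):
        with open(os.path.join(directory, f"{i}.bin"), "wb") as f:
            f.write(payload)


def read_files(directory, count):
    """Open and read every file; return the total number of bytes read."""
    total = 0
    for i in range(count):
        with open(os.path.join(directory, f"{i}.bin"), "rb") as f:
            total += len(f.read())
    return total


def write_blobs(db_path, count, payload):
    """Store the same payloads as rows in one SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.executemany(
        "INSERT INTO blobs (id, data) VALUES (?, ?)",
        ((i, payload) for i in range(count)),
    )
    conn.commit()
    conn.close()


def read_blobs(db_path, count):
    """Fetch every row by primary key; return the total number of bytes read."""
    conn = sqlite3.connect(db_path)
    total = 0
    for i in range(count):
        row = conn.execute("SELECT data FROM blobs WHERE id = ?", (i,)).fetchone()
        total += len(row[0])
    conn.close()
    return total


def compare(count=2000, size=1024):
    """Time both read paths over identical data in a throwaway temp directory."""
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory() as tmp:
        files_dir = os.path.join(tmp, "files")
        os.mkdir(files_dir)
        db_path = os.path.join(tmp, "blobs.db")
        write_files(files_dir, count, payload)
        write_blobs(db_path, count, payload)

        start = time.perf_counter()
        file_bytes = read_files(files_dir, count)
        file_secs = time.perf_counter() - start

        start = time.perf_counter()
        blob_bytes = read_blobs(db_path, count)
        blob_secs = time.perf_counter() - start

    return {
        "file_bytes": file_bytes,
        "file_secs": file_secs,
        "blob_bytes": blob_bytes,
        "blob_secs": blob_secs,
    }


if __name__ == "__main__":
    result = compare()
    print(f"files:  {result['file_secs']:.4f}s for {result['file_bytes']} bytes")
    print(f"sqlite: {result['blob_secs']:.4f}s for {result['blob_bytes']} bytes")
```

Run it on Windows with and without real-time scanning covering the temp directory, and then on Linux, and compare the ratios rather than the absolute numbers. Timings will vary a lot with the filesystem, cache state, and whatever filters are installed.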