Comment by peter_d_sherman
5 years ago
>"Closing File Handles on Windows
Many years ago I was profiling Mercurial to help improve the working directory checkout speed on Windows, as users were observing that checkout times on Windows were much slower than on Linux, even on the same machine.
I thought I could chalk this up to NTFS versus Linux filesystems or general kernel/OS level efficiency differences. What I actually learned was much more surprising.
When I started profiling Mercurial on Windows, I observed that most I/O APIs were completing in a few dozen microseconds, maybe a single millisecond or two every now and then. Windows/NTFS performance seemed great!
Except for CloseHandle(). These calls were often taking 1-10+ milliseconds to complete. It seemed odd to me that file writes - even sustained file writes that were sufficient to blow past any write buffering capacity - were fast but closes were slow. It was even more perplexing that CloseHandle() was slow even if you were using completion ports (i.e. async I/O). This behavior for completion ports ran counter to what the MSDN documentation said should happen (the function should return immediately and its status can be retrieved later).
While I didn't realize it at the time, the cause for this was/is Windows Defender. Windows Defender (and other anti-virus / scanning software) typically work on Windows by installing what's called a filesystem filter driver. This is a kernel driver that essentially hooks itself into the kernel and receives callbacks on I/O and filesystem events. It turns out the close file callback triggers scanning of written data. And this scanning appears to occur synchronously, blocking CloseHandle() from returning. This adds milliseconds of overhead."
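You can see this effect from user space with a minimal Python sketch (the file name, payload size, and run count are all arbitrary choices) that times the write and the close separately; on a Windows machine with real-time scanning enabled, the extra milliseconds should show up on the close:

    import os
    import time

    PAYLOAD = os.urandom(1024 * 1024)  # 1 MiB of incompressible data

    def probe(path="probe.bin", runs=20):
        for _ in range(runs):
            f = open(path, "wb")
            t0 = time.perf_counter()
            f.write(PAYLOAD)
            t1 = time.perf_counter()
            # close() flushes Python's userspace buffer and then closes the
            # OS handle -- on Windows this is where CloseHandle() happens,
            # and where a filter driver can block the call synchronously.
            f.close()
            t2 = time.perf_counter()
            os.remove(path)
            print(f"write: {(t1 - t0) * 1e3:7.3f} ms   close: {(t2 - t1) * 1e3:7.3f} ms")

    if __name__ == "__main__":
        probe()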
PDS: Observation: even if an OS's I/O calls (or, more generally, its API calls) are initially written to run and return quickly -- that doesn't mean they won't degrade over time, for whatever reason, as the OS expands and/or the underlying hardware changes...
For any OS writer, present or future, a key aspect of OS development is writing I/O (and API) performance tests, running them regularly, and immediately halting development to understand and fix the root cause when a performance anomaly is detected. In large software systems and large codebases (browsers, for example), it's usually much harder to win back performance several versions after it has been lost than to stay disciplined: test performance constantly, and stop to understand and fix the root cause the instant any anomaly appears.
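One way to encode that discipline is a guardrail test that fails the build the moment an operation blows its budget. A sketch, assuming a hypothetical checkout() operation, pytest as the runner, and a hand-picked threshold (a real suite would track baselines statistically rather than hard-coding one):

    import time

    # Hypothetical budget, in seconds, derived from historical measurements.
    CHECKOUT_BUDGET_S = 0.5

    def checkout():
        """Stand-in for the I/O-heavy operation under test (hypothetical)."""
        ...

    def test_checkout_stays_within_budget():
        t0 = time.perf_counter()
        checkout()
        elapsed = time.perf_counter() - t0
        # Fail immediately on regression, while the offending change is
        # still fresh, instead of trying to win the time back later.
        assert elapsed < CHECKOUT_BUDGET_S, f"checkout took {elapsed:.3f}s"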
Related: if you copy a file via the OS's copy function, the system knows the file was already scanned and you get fast copies. If you copy the file by opening a new destination file for write, opening the source file for read, and copying the bytes yourself, then of course you trigger the virus scanner.
So for example I was using a build system and part of my build needed to copy ~5000 files of assets to the "out" folder. It was taking 5 seconds on other OSes and 2 minutes on Windows. Turned out the build system was copying using the "make a new file and copy bytes" approach instead of calling their language's library copy function, which, at least on Windows, calls the OS copyfile function. I filed a bug and submitted a PR. Unfortunately, while they acknowledged the issue, they neither took the PR nor fixed it on their side. My guess is they don't really care about devs who use Windows.
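For illustration, here are the two approaches side by side in Python (paths and buffer size are made up); the manual loop creates a brand-new destination file whose bytes the filter driver re-scans, while the library call can delegate to an OS fast path, depending on platform and Python version:

    import shutil

    def copy_manually(src, dst, bufsize=64 * 1024):
        # "Make a new file and copy bytes": the destination is freshly
        # written data, so the virus scanner scans it all over again.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while chunk := fin.read(bufsize):
                fout.write(chunk)

    def copy_via_library(src, dst):
        # May delegate to an OS-level fast path (sendfile on Linux,
        # fcopyfile on macOS; Windows behavior depends on the Python
        # version). When it maps to the OS copy call, the scanner can
        # treat the data as already scanned.
        shutil.copyfile(src, dst)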
Note that Python's copyfile does this wrong on macOS. It also uses the open/read-bytes/write-bytes approach instead of calling into the OS. While it doesn't have the virus-scanning issue (yet), it does mean files aren't actually "copied", so metadata is lost.
> Note that Python's copyfile does this wrong on macOS. It also uses the open/read-bytes/write-bytes approach instead of calling into the OS.
It doesn't, since 3.8: it tries fcopyfile() and only falls back to the read/write dance if that fails.
See: https://github.com/python/cpython/blob/master/Lib/shutil.py#...
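The shape of that fix, heavily simplified (illustrative only; the real logic lives in Lib/shutil.py, and os.sendfile as a file-to-file copy is Linux-specific -- macOS uses fcopyfile() via a private helper):

    import os

    def fast_copy(src, dst, bufsize=64 * 1024):
        """Try an OS-level copy first; fall back to the read/write dance."""
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            try:
                # Linux: sendfile() can copy file-to-file inside the kernel.
                remaining = os.fstat(fin.fileno()).st_size
                offset = 0
                while remaining > 0:
                    sent = os.sendfile(fout.fileno(), fin.fileno(), offset, remaining)
                    if sent == 0:
                        break
                    offset += sent
                    remaining -= sent
            except (AttributeError, OSError):
                # No usable fast path: start over with a userspace copy.
                fin.seek(0)
                fout.seek(0)
                fout.truncate()
                while chunk := fin.read(bufsize):
                    fout.write(chunk)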
I tested in 3.8 and it didn't seem to work:
https://bugs.python.org/issue38906
> "For any OS writer, present or future, a key aspect of OS development is writing I/O (and API) performance tests, running them regularly, and immediately halting development to understand/fix the root cause -- if and when performance anomalies are detected... in large software systems, in large codebases, it's usually much harder to gain back performance several versions after performance has been lost (i.e., Browsers), than to be disciplined, constantly test performance, and halt development (and understand/fix the root cause) the instant any performance anomaly is detected..."
Yes, this! And not just OS writers, but authors of any kind of software. Performance is like a living thing; vigilance is required.
I've had the displeasure of using machines with McAfee software that installed a filesystem filter driver. It made the machine completely unusable for development, and I'm shocked Microsoft thought making that the default configuration was reasonable.
Copying or moving a folder that contained a .git folder resulted in a very large number of small files being created. To this day, I'm not sure if it was the antivirus, the backup software, or Windows' built-in indexing, but the computer would become unusable for about 5 minutes whenever I would move folders around. It was always Windows Explorer and System Interrupts taking up a huge amount of CPU, and holy cow was it annoying.
Even worse than that, moving a lot of small files in WSL reliably BSODs my work machine due to some sort of interaction with the mandated antivirus on-access scanning, making WSL totally unusable for development on that machine.
Good talk about debugging I/O in rustup:
https://youtube.com/watch?v=qbKGw8MQ0i8
Perhaps disable Windows Defender scanning for the database (or whatever) folder.