Comment by chasil
3 years ago
In Linux, these files are unlinked, so they are invisible in the filesystem. It is legal to unlink a file that still has an active file descriptor and continue reads/writes through it.
I think that lsof can still see these temporary files; I'm not sure how I first noticed it.
Windows implements a POSIX kernel layer, so perhaps this functionality could be coaxed out of it.
I knew two guys long, long ago who used to circumvent the per-user disk quota on shared machines, which was tiny and not conducive to power users. One found a deep, dark corner of the filesystem and used it to hold binaries the rest of us used. I think his idea was that if it wasn't for personal gain it was easier to answer difficult questions, which at some point did come up; he got a pass, since it would have taken more disk space if we'd each kept private copies.
The other was keeping file handles open to deleted files, as you describe. I don’t recall how this worked, but I suspect it involved uncompressing data into a file descriptor, then reading it back. I guess as long as his terminal window was open it was more stable than tmp (he may have also been using screen).
> I don’t recall how this worked
Unlinking a file reduces its reference count. Once the reference count reaches zero the file is considered deleted and the space can be reclaimed. Every open file descriptor and hard link increases a file's reference count. So if you've got a file descriptor open on a file in a background task (daemon, nohup, screen, etc) and unlink it with rm or something, the file's reference count will decrement but not go to zero. Only when that program closes the file descriptor will the reference count reach zero and the file actually be deleted.
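A minimal sketch of that behavior (POSIX C, hypothetical filename `scratch.tmp`): the data stays reachable through the open descriptor even after the name is gone, and the space is only reclaimed on close.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    unlink("scratch.tmp");          /* directory entry gone, but the open fd keeps the inode alive */

    const char msg[] = "still here\n";
    write(fd, msg, sizeof msg - 1); /* writes still consume disk space */

    char buf[32] = {0};
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, sizeof buf - 1);
    printf("%s", buf);              /* prints "still here" even though the file is "deleted" */

    close(fd);                      /* last reference dropped; space is reclaimed now */
    return 0;
}
```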
I had an interesting issue with that the other day:
Someone at my company thought it a good idea to launch a background process from within a cron job, starting it with `&` at the end so that it detached from cron and was reparented to init.
Right, but the program writes to stdout. Guess what: the fd behind stdout points to a temp file that cron deletes, but the program still holds it open.
I ended up with a full / on all production servers, with no culprit showing up in `du`. Only by running `lsof | grep deleted` did I find these huge deleted temporary files.
Killing the process and switching to a saner systemd service was the savior.
If a file is opened with `FILE_SHARE_DELETE` then it can be deleted while held open. This flag has existed since forever. It's just that, unlike on Linux, this isn't the default, and hardly any developer thinks to set it. The only common software I saw using it when I still used Windows was the media player mpv.
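A small sketch of what that looks like with the Win32 API (C, Unicode build, hypothetical filename): because the handle is opened with `FILE_SHARE_DELETE`, a later delete of the same file succeeds instead of failing with a sharing violation, and the handle keeps working until it's closed.

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE h = CreateFileW(L"scratch.tmp",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    /* Succeeds only because the handle above allows delete sharing;
       without FILE_SHARE_DELETE this call would fail. */
    if (!DeleteFileW(L"scratch.tmp"))
        printf("DeleteFileW failed: %lu\n", GetLastError());

    /* The open handle still works; the file is actually removed once it closes. */
    CloseHandle(h);
    return 0;
}
```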
When YouTube was Flash-based, that is how Flash would save its cache file for the stream. To save a video you would go looking for which file descriptors were in use and yank them out of (I believe) /dev/fd and back into the filesystem. On Windows the file was visible but locked behind the normal Windows file-locking shenanigans. To copy locked files on Windows I used a program called hobocopy that used Volume Shadow Copy tricks to work around the locks.
Apparently, recent Windows uses POSIX semantics by default: https://news.ycombinator.com/item?id=23745019
lsof shows "open file handles", which can include deleted files. This can be useful when a disk shows tons of usage but you can't find any big files.