
Comment by adrian_b

16 hours ago

XFS filesystems do not have a "/lost+found" directory in their normal state.

On the very rare occasions when one has to run "xfs_repair", it will create a "/lost+found" directory if one is required for recovered files.

After the repair and after investigating whether the recovered files contain useful data or not (and after moving the useful files elsewhere), one should normally delete the "/lost+found" directory, because it is no longer needed.

XFS as implemented in RHEL 8+ (the only place I've used it in anger) tends to handle being full very badly, leading to system lockups and blocked tasks that necessitate a hard reboot. Worse yet, once it is in this state the journal fills and nothing can be done with the volume.

Recovering from this on a volume mounted at boot means either booting a live disk or stopping the boot in the initramfs and running xfs_repair there. I have fruitlessly tried to replay the journal on many separate occasions by mounting the filesystem, which again causes a lockup because there is no space. In that state you have to drop the journal, run xfs_repair, and then clean up the detritus from /lost+found (and then the location that caused the disk to fill in the first place).

EXT4 has other issues certainly, but at least it reserves blocks for the root user explicitly so the system doesn't stop.

  • I don't know if this is a regression, since in the RHEL 5/6 days I used XFS as my default filesystem on large storage pools. Since XFS doesn't have a shrink option, I would create filesystems of a few gigabytes in size and grow them whenever needed. I mostly used them for monthly archiving of uploads to a customer's website, so there would be a YYYY-MM LVM volume with an XFS filesystem, and during the month it would be grown automatically from a cron job if space got tight. I'm quite sure I must have had a bunch of full filesystems there, and I never ran into any crashes with full XFS filesystems (though these were not the 'root' filesystem). Even on my current laptop (with Debian 12/13) I'm running XFS on all filesystems (besides /boot and /efi), and they report being full often enough without any crashes or reboots.
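A minimal sketch of what such a grow-on-demand cron job could look like, not the original script. The 90% threshold, the 1G step, and the example device and mount paths are assumptions for illustration; `lvextend` and `xfs_growfs` are the standard LVM and XFS tools, and `dry_run=True` only reports the commands instead of running them.

```python
import shutil
import subprocess


def needs_grow(total_bytes, used_bytes, threshold=0.90):
    """Return True when the used fraction has reached the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def grow_commands(lv_path, mount_point, step="1G"):
    """Build the two commands: extend the logical volume, then grow XFS into it."""
    return [
        ["lvextend", "-L", f"+{step}", lv_path],
        ["xfs_growfs", mount_point],
    ]


def grow_if_tight(lv_path, mount_point, threshold=0.90, step="1G", dry_run=True):
    """Check usage of a mounted filesystem and grow it if space is tight.

    Returns the list of commands that were (or, in dry-run mode, would be) run.
    """
    usage = shutil.disk_usage(mount_point)
    if not needs_grow(usage.total, usage.used, threshold):
        return []
    commands = grow_commands(lv_path, mount_point, step)
    if not dry_run:
        for cmd in commands:
            # check=True aborts before xfs_growfs if lvextend fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical volume and mount point; adjust to the real layout.
    for cmd in grow_if_tight("/dev/vg0/archive-2024-01", "/srv/archive/2024-01"):
        print(" ".join(cmd))
```

Growing in small steps fits the constraint mentioned above: XFS can be grown online but not shrunk, so any space handed out cannot be taken back.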

  • The last time I used RHEL was decades ago, so I have no idea what happens there.

    On the other hand, I have been using XFS since 2005 (when I transitioned from 32-bit to 64-bit Linux), on a great variety of hardware: servers, desktops, mini-PCs, laptops.

    My file systems are typically mostly full, and from time to time I have had incidents where some job failed by completely filling the file system, leaving no space for the remainder of the files being written.

    Completely filling the HDD or SSD has never caused any problems. I have always just deleted some files or moved them to other file systems, and then continued working. Sometimes I had a download in progress that the browser halted because of the full disk; in such cases, after making space, I just resumed the download in the browser.

    So I am puzzled by your experience, but I am not very surprised because in Linux there are many obscure configuration options, so the behavior can vary a lot between distributions (I typically use Gentoo). Perhaps your problems were caused by certain daemons that were continuing to make write attempts in the background, which I do not have.

    The only problems I have ever encountered with XFS were in early XFS, i.e. two decades ago, which was extremely sensitive to power failures despite being a journaled filesystem. In early XFS, after a power failure, some previously open files were erased, even if they had been open only for reading. Because of this, a power failure frequently bricked the system by erasing "/etc/fstab".

    However, this stupid XFS feature was corrected many years ago, and nowadays power failures normally have no effect on XFS, i.e. xfs_repair is normally not needed even after a power failure. It was a bug at the conceptual level, not at the programming level: the erasure of some files in early XFS was intentional, because it wrongly concluded that they might have been corrupted.

    While early XFS was notorious for its fragility against power failures, at that time none of the competing file systems was significantly better; all were buggy. Around the same time, more than two decades ago, I saw a lot of other filesystems corrupted by power failures, whether Windows NTFS or Linux EXT3 or JFS, even though all were advertised as resistant to power failures because they were journaled. At that time, only one filesystem was completely impervious to power failures, and it was non-journaled: the FreeBSD UFS with "soft updates" (i.e. with a careful ordering of the disk writes, to maintain a consistent state across power failures).

  • XFS tip:

    If you truncate a file it doesn't update metadata. That's how you can get back space for the journal log to start cleaning up crap without rebooting the box and/or taking services down.

    • My initialisation script creates five 100M files in /root (which is on /) to give something to manually delete for breathing room.
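A rough sketch combining the two tips above, not the actual initialisation script. The `ballast-N.bin` file names and the helper names are made up for illustration; the count and size follow the five 100M files mentioned. The files are written with real data rather than created sparse, on the assumption that a sparse file would not actually hold any space in reserve.

```python
import os


def create_ballast(directory, count=5, size_bytes=100 * 1024 * 1024, prefix="ballast"):
    """Create `count` fully written files and return their paths."""
    os.makedirs(directory, exist_ok=True)
    chunk = b"\0" * (1024 * 1024)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"{prefix}-{i}.bin")
        with open(path, "wb") as fh:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, len(chunk))
                fh.write(chunk[:n])
                remaining -= n
            # Push the data to disk so the blocks are really allocated.
            fh.flush()
            os.fsync(fh.fileno())
        paths.append(path)
    return paths


def release_ballast(path):
    """Truncate one ballast file to zero length to hand its space back.

    The file name is kept so the file can be refilled once the cleanup is done.
    """
    os.truncate(path, 0)


if __name__ == "__main__":
    # Hypothetical location, matching the /root example in the comment above.
    for p in create_ballast("/root"):
        print("created", p)
```

When the disk fills, calling `release_ballast` on one file (or simply deleting it, as described above) frees 100M of breathing room for cleaning up.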