Comment by HorstG

6 years ago

ZFS does overprovision all filesystems in a zpool by default. Create 10 new filesystems and 'df' will now display 10x the space of the parent fs. A full fs is handled differently than your volume manager running out of blocks. But the normal case is overprovisioning.

That's not really overprovisioning. That's just a consequence of the space belonging to the zpool rather than to any one file system, and 'df' has no sensible way of representing that.

That is not over-provisioning; it's just that 'df' doesn't have the concept of pooled storage. With pools it's possible for different file systems to share their "available" space. BTRFS has its own problems with df output and can report strange results too.

If I have a 10GB pool and I create 10 empty file systems, the sizes reported by df will add up to 100GB. It's not quite a lie either, because each of those 10 file systems does in fact have 10GB of space available: I could write 10GB to any one of them. If I write 1GB to one of those file systems, the "size" and "available" values for the other nine will all shrink, despite not a single byte of data having been written to them.

With ZFS and df the "size" column is really only measuring the maximum possible size (at this point in time, assuming nothing else is written) so it isn't very meaningful, but the "used" and "available" columns do measure something useful.
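To make the arithmetic concrete, here is a toy Python model of the 10GB pool described above. It is only a sketch: the `Pool` class, its methods and the `fs0`..`fs9` names are made up for illustration, and it ignores metadata overhead, quotas, reservations and compression. It assumes df's "size" for a pooled file system is simply that file system's used space plus the pool's shared free space.

```python
class Pool:
    """Hypothetical toy model: every file system draws from one shared free-space counter."""

    def __init__(self, capacity_gb, names):
        self.capacity_gb = capacity_gb
        self.used_gb = {name: 0 for name in names}

    def free_gb(self):
        # Free space is a property of the pool, not of any one file system.
        return self.capacity_gb - sum(self.used_gb.values())

    def write(self, name, gb):
        if gb > self.free_gb():
            raise OSError("pool out of space")
        self.used_gb[name] += gb

    def df(self):
        # Assumption: size = used + shared free space, avail = shared free space.
        free = self.free_gb()
        return {n: (u + free, u, free) for n, u in self.used_gb.items()}


pool = Pool(10, [f"fs{i}" for i in range(10)])

# Ten empty file systems: each one reports size 10, so the column sums to 100.
total_size_before = sum(size for size, _, _ in pool.df().values())

# Write 1GB to fs0 only; the other nine shrink without being touched.
pool.write("fs0", 1)
report = pool.df()
total_size_after = sum(size for size, _, _ in report.values())

print(total_size_before, total_size_after)
print("fs0:", report["fs0"], "fs1:", report["fs1"])
```

In this model the "used" and "available" columns stay truthful per file system, while the summed "size" column is the number that stops meaning anything.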

  • This is exactly what overprovisioning is: The sum of possible future allocations is greater than available space.

    • In my example the sum of possible future allocations for ZFS is still only 10GB total. Each of the ten file systems, considered individually, does truthfully have 10GB available to it before any data is written. The difference is that with over-provisioning (like LVM+XFS), if I write 10GB of data to one file system the others will still report 10GB of free space, but with ZFS or BTRFS they'll report 0GB available, so I can never actually attempt to allocate 100GB of data.

      You could build a pool-aware version of df that reflects this, by grouping the file systems in a pool together and reporting that the pool has 10GB available. But frankly there's not enough benefit to doing that, because people with storage pools already understand that summing up all the available space from df's output is not meaningful. Tools like zpool list and BTRFS's df equivalent already correctly report the total free space in the pool.
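The distinction argued in this sub-thread can be sketched in a few lines of Python. This is an illustrative model only: `thin_avail` and `pooled_avail` are hypothetical helpers, not real tools, and the "thin" case assumes each file system was created with its own 10GB virtual size on top of 10GB of real storage, as in the LVM+XFS example.

```python
def thin_avail(virtual_gb, used):
    """Over-provisioned case: each file system reports free space
    against its own virtual size, unaware of its neighbours."""
    return {name: virtual_gb - u for name, u in used.items()}


def pooled_avail(pool_gb, used):
    """Pooled case: every file system reports the same shared free space."""
    free = pool_gb - sum(used.values())
    return {name: free for name in used}


# Ten file systems on 10GB of real storage; 10GB has been written to fs0.
used = {f"fs{i}": 0 for i in range(10)}
used["fs0"] = 10

thin = thin_avail(10, used)
pooled = pooled_avail(10, used)

# Thin provisioning still advertises space that no longer exists.
print("thin, sum of avail:  ", sum(thin.values()))
# The pool reports 0 everywhere, so the impossible write is never attempted.
print("pooled, max of avail:", max(pooled.values()))
```

A pool-aware df along the lines suggested above would amount to printing the single shared `free` value once per pool instead of once per file system.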