Comment by m4rtink
6 years ago
Isn't this a problem for any over-provisioned storage pool? You can avoid it if you want by not over-provisioning and by checking the space consumed by CoW snapshots. Also, what does ZFS do if you run out of blocks?
I have actually managed to run out of blocks on XFS on a thin LV, and it's an interesting experience. XFS itself always survived just fine, but some files basically vanished. Looks like it was mostly those that were open and being written to at exhaustion time, for example a MariaDB database backing store. Files that were just sitting there were perfectly fine as far as I could tell.
Still, you definitely should never put data on a volume whose pool can be exhausted without having a backup, as I don't think there is really a bulletproof way for a filesystem to handle that happening suddenly.
> Isn't this a problem for any over-provisioned storage pool?
ZFS doesn't over-provision anything by default. The only case I'm aware of where you can over-provision with ZFS is when you explicitly choose to thin-provision zvols (virtual block devices with a fixed size). This can't be done with regular ZFS file systems, which grow as needed, though you can reserve space for them.
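For what it's worth, the sparse flag on zvol creation is the only over-provisioning knob I know of in ZFS. A rough sketch (the pool name "tank" and the sizes are just examples):

```
# A zvol is a fixed-size virtual block device carved out of the pool.
# -s makes it sparse ("thin"): the 100G is not reserved up front,
# so the total promised can exceed what the pool can actually back.
zfs create -s -V 100G tank/thinvol

# A regular dataset has no fixed size to over-promise; it just
# grows into whatever free space the pool has.
zfs create tank/data
```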
File systems do handle running out of space (for a loose definition of handle) but they never expect the underlying block device to run out of space, which is what happens with over-provisioning. That's a problem common to any volume manager that allows you to over provision.
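LVM thin provisioning is the obvious example - you can promise more virtual space than the pool behind it can hold. A sketch, with made-up VG name and sizes:

```
# 10G thin pool in volume group "vg0"
lvcreate --type thin-pool -L 10G -n pool vg0

# Two 8G thin volumes on top of it: 16G promised, 10G of real backing
lvcreate --thin -V 8G -n vol1 vg0/pool
lvcreate --thin -V 8G -n vol2 vg0/pool

# XFS sees a full 8G block device and has no idea it's only a promise
mkfs.xfs /dev/vg0/vol1
```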
Can't you over-provision even just by creating too many snapshots? Even if you never make the filesystems bigger than the backing pool, the snapshots will allocate some blocks from the pool and over time, boom.
Snapshots can't cause over-provisioning, not for file systems. If I mutate my data and keep snapshots forever, eventually my pool will run out of free space. But that's not a problem of over-provisioning, that's just running out of space.
With ZFS, if I take a snapshot and then delete 10GB of data, my file system will appear to have shrunk by 10GB. If I compare the output of df before and after deleting the data, df will tell me that "size" and "used" have decreased by 10GB while "available" remained constant. Once the snapshot is deleted, that 10GB will be made available again and the "size" and "available" columns in df will increase. ZFS avoids over-provisioning by never promising more available space than it can guarantee you're able to write.
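Roughly what that looks like (dataset name and numbers invented for illustration):

```
zfs snapshot tank/data@before
rm -rf /tank/data/old-stuff          # delete ~10GB of data

df -h /tank/data
# "size" and "used" drop by ~10GB; "avail" stays put, because the
# snapshot still pins those blocks

zfs destroy tank/data@before
df -h /tank/data
# "size" and "avail" grow back by ~10GB now that the blocks are freed
```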
I think you're trying to relate ZFS too much to how LVM works, where LVM is just a volume manager that exposes virtual devices. The analogue to thin-provisioned LVM volumes is thin-provisioned zvols, not regular ZFS file systems. I can choose to use ZFS in place of LVM as a volume manager, with XFS as my file system. Over-provisioned zvols+XFS will have functionally the same problems as over-provisioned LVM+XFS.
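Something along these lines (pool and volume names made up):

```
# ZFS playing the LVM role: a sparse zvol with XFS on top
zfs create -s -V 500G tank/xfsvol
mkfs.xfs /dev/zvol/tank/xfsvol
mount /dev/zvol/tank/xfsvol /mnt/xfs
# If tank fills up underneath it, XFS hits I/O errors much like it
# would on an exhausted LVM thin pool.
```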
ZFS doesn't work this way. The free blocks in the ZFS pool are available to all datasets (filesystems). The datasets themselves don't take up any space up front until you add data to them. Snapshots don't take up any space initially. They only take up space when the original dataset is modified, and altered blocks are moved onto a "deadlist". Since the modification allocates new blocks, if the pool runs out of space it will simply return ENOSPC at some point. There's no possibility of over-provisioning.
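The accounting is easy to see with `zfs list -o space` (the output shape is real, the numbers below are made up):

```
zfs list -o space tank/data
# NAME       AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
# tank/data  4.2G   1.3G     0B      1.3G        0B            0B
#
# USEDSNAP is 0 right after a snapshot is taken; it only grows as the
# live dataset diverges and the old blocks land on the deadlist.
```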
ZFS has quotas and reservations. The former is a maximum allocation for a dataset; the latter is a minimum guaranteed allocation. Neither actually allocates blocks from the pool. These don't relate in any comparable way to how LVM works; they are just numbers to check when allocating blocks.
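In command form (dataset names are just examples):

```
# Cap a dataset at 10G; nothing is allocated up front, writes simply
# start failing once they would push it past 10G
zfs set quota=10G tank/home

# Guarantee 5G to another dataset; the free space everyone else sees
# shrinks by 5G, but no blocks are actually written
zfs set reservation=5G tank/db
```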
LVM thin pools had (maybe still have - I haven't used them recently) another issue though, where running out of metadata space caused the volumes in the thin pool to become corrupt and unreadable.
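If you do use them, it's worth keeping an eye on metadata usage; something like this (VG/LV names and numbers made up):

```
lvs -o lv_name,data_percent,metadata_percent vg0
#  LV    Data%  Meta%
#  pool  71.30  88.90   <- metadata nearly full; grow it with
#                          lvextend --poolmetadatasize before it fills
```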
ZFS does overprovision all filesystems in a zpool by default. Create 10 new filesystems and 'df' will now display 10x the space of the parent fs. A full fs is handled differently than your volume manager running out of blocks. But the normal case is overprovisioning.
That's not really overprovisioning. That's just a consequence of the space belonging to a zpool and 'df' not really having a sensible way of representing that.
That is not over-provisioning, it's just that 'df' doesn't have the concept of pooled storage. With pools it's possible for different file systems to share their "available" space. BTRFS also has its own problems with df output producing strange results.
If I have a 10GB pool and I create 10 empty file systems, the sizes reported by df will add up to 100GB. It's not quite a lie either, because each of those 10 file systems does in fact have 10GB of space available - I could write 10GB to any one of them. If I write 1GB to one of those file systems, the "size" and "available" values for the other nine will all shrink despite not having a single byte of data written to them.
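A quick sketch of that (device, pool name, and numbers are illustrative):

```
zpool create tank /dev/sdX              # ~10GB pool, device name made up
for i in $(seq 1 10); do zfs create tank/fs$i; done

df -h | grep tank
# each tank/fsN reports Size ~10G, Avail ~10G -> a naive sum says "100G"

dd if=/dev/zero of=/tank/fs1/blob bs=1M count=1024   # write 1GB to fs1
df -h | grep tank
# fs1 shows ~1G used; the other nine now report Size ~9G, Avail ~9G
```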
With ZFS and df the "size" column is really only measuring the maximum possible size (at this point in time, assuming nothing else is written) so it isn't very meaningful, but the "used" and "available" columns do measure something useful.