Comment by ori_b
18 days ago
Imagine a race condition that writes a file node where a directory node should be. You have a valid object with a valid checksum, but it's hooked into the wrong place in your data structure.
> Imagine a race condition that writes a file node where a directory node should be. You have a valid object with a valid checksum, but it's hooked into the wrong place in your data structure.
A few things: 1) Is this an actual ZFS issue you encountered or is this a hypothetical? 2) And -- you don't imagine this would be discovered during a scrub? Why not? 3) But -- you do imagine it would be discovered and repaired by an fsck instead? Why so? 4) If so, wouldn't this just be a bug, like a fsck, not some fundamental limitation of the system?
FWIW I've never seen anything like this. I have seen Linux plus a flaky ALPM implementation drop reads and writes. I have seen ZFS notice, via errors in `zpool status`, at the very moment the power dropped. I do wonder whether ext4's fsck or XFS's fsck would do the same when someone who didn't know any better (like me!) sets the power management policy to "min_power" or "med_power_with_dipm".
Here's an example: https://www.illumos.org/issues/17734. But it would not be discovered by a scrub because the hashes are valid. Scrubs check hashes, not structure. It would be discovered by a fsck because the structure is invalid. Fscks check structure, not hashes.
They are two different tools, with two different uses.
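To make the distinction concrete, here is a toy sketch in Python. It is not ZFS code and does not model ZFS's on-disk format; `make_node`, `scrub`, and `structure_check` are hypothetical names invented for illustration. It assumes each node's checksum covers only that node's own contents, and it shows a file node linked into a slot that expects a directory: the hash-only pass succeeds while the structural pass reports the mismatch.

```python
import hashlib


def checksum(kind, payload):
    # Assumption: the hash covers the node's own contents only,
    # not the place where the node is linked into the tree.
    return hashlib.sha256(kind.encode() + b"\0" + payload).hexdigest()


def make_node(kind, payload, slots=()):
    # slots is a list of (expected_kind, child_node) pairs.
    return {"kind": kind, "payload": payload, "slots": list(slots),
            "sum": checksum(kind, payload)}


def walk(node):
    # Visit every node reachable from this one.
    yield node
    for _expected, child in node["slots"]:
        yield from walk(child)


def scrub(root):
    # Hash check only: does each stored checksum match the node's contents?
    return all(n["sum"] == checksum(n["kind"], n["payload"]) for n in walk(root))


def structure_check(root):
    # Structural check: does each slot hold the kind of node it expects?
    errors = []
    for n in walk(root):
        for expected, child in n["slots"]:
            if child["kind"] != expected:
                errors.append(f"slot expects {expected}, holds {child['kind']}")
    return errors


# A valid file node, hooked in where a directory should be.
stray_file = make_node("file", b"hello")
root = make_node("dir", b"/", slots=[("dir", stray_file)])

print(scrub(root))            # True
print(structure_check(root))  # ['slot expects dir, holds file']
```

Whether a real ZFS scrub behaves like the hash-only pass here is exactly what is being argued in this thread; the sketch only illustrates what "valid checksum, wrong place" means.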
> Scrubs check hashes, not structure.
How is the structure not valid here? Can you explain to us how an fsck would discover this bug (show an example where an fsck fixed a similar bug) but ZFS never could? The point I take issue with is the claim that lacking an fsck is a problem for ZFS. More specifically, can you answer my 4th question:
>> 4) If so, wouldn't this just be a bug, like (a bug in) fsck, not some fundamental limitation of the system?
So -- is it possible an fsck might discover an inconsistency ZFS couldn't? Sure. Would this be a fundamental flaw of ZFS, which requires an fsck, instead of merely a bug? I'm less sure.
You do seem to at least understand my general contention with the parent's point. However, the parent is also making a specific claim about a bug which would be extraordinary. Parent's claim is this is a bug which a scrub, which is just a read, wouldn't see, but a subsequent read would reveal.
So -- is it possible an fsck might discover this specific kind of extraordinary bug in ZFS, after a scrub had already read back the data? Of that I'm highly dubious.