
Comment by throw0101c

2 days ago

> Or you can spend half a billion dollars to solve the issue in hardware.

And hope that your hardware/firmware doesn't ever get bugs.

Or you can do checksumming at the hardware layer and checksumming at the software/FS layer. Protection in depth.

ZFS has caught hardware issues such as misdirected reads, where LBA 123 is requested but LBA 456 is delivered: the hardware-level checksum for LBA 456 is fine, so the data is passed up the stack, but it isn't the data that was asked for. See Bryan Cantrill's talk "Zebras All the Way Down":

* https://www.youtube.com/watch?v=fE2KDzZaxvE
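
To make the misdirected-read case concrete, here is a minimal Python sketch. It is a toy model, not ZFS code: FlakyDisk, fs_write and fs_read are made-up names, and CRC32 from the standard library stands in for whatever checksum a real device or filesystem would use. The point is only that a checksum stored with the pointer to a block (as ZFS does in its block pointers) catches a wrong-but-intact sector that a checksum stored with the sector itself cannot.

```python
import zlib


class FlakyDisk:
    """Toy block device (hypothetical). Each sector carries its own CRC,
    which always verifies, but reads can be served from the wrong LBA."""

    def __init__(self):
        self.sectors = {}    # lba -> (data, device-level crc)
        self.misdirect = {}  # requested lba -> lba actually served

    def write(self, lba, data):
        self.sectors[lba] = (data, zlib.crc32(data))

    def read(self, lba):
        served = self.misdirect.get(lba, lba)
        data, crc = self.sectors[served]
        # Device-level check: passes, because the served sector is
        # internally consistent even when it is the wrong sector.
        assert zlib.crc32(data) == crc
        return data


def fs_write(disk, lba, data):
    """Write a block and return a 'block pointer': address plus checksum."""
    disk.write(lba, data)
    return (lba, zlib.crc32(data))


def fs_read(disk, pointer):
    """Read a block and verify it against the checksum held by the caller."""
    lba, expected = pointer
    data = disk.read(lba)
    if zlib.crc32(data) != expected:
        raise IOError(f"checksum mismatch reading LBA {lba}")
    return data
```

With two blocks written at LBA 123 and 456 and disk.misdirect[123] = 456 set, disk.read(123) hands back the contents of 456 without complaint, while fs_read on the pointer for 123 raises IOError.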

And if checksums are not needed for a particular use-case, make them toggleable: even ZFS lets you run zfs set checksum=off on a dataset. My problem is not having the option at all.

When the vast majority of the devices you sell run on battery power, it makes far more sense from a battery life perspective to handle issues in hardware as much as possible.

For instance, try to find a processor aimed at mobile devices that doesn't handle video decoding in dedicated hardware instead of running it on a CPU core.

  • > […] handle issues in hardware as much as possible.

    1. There is hardware support for (e.g.) SHA in ARM:

    * https://developer.arm.com/documentation/ddi0514/g/introducti...

    But given Apple designs their own CPUs they could add extensions for anything they need. Or use a simpler algorithm, like Fletcher (which ZFS uses):

    * https://en.wikipedia.org/wiki/Fletcher%27s_checksum

    2. It does not have to be enabled by default for every device. The main problem is the lack of it even as an option.

    I wouldn't necessarily use ZFS checksums on a laptop, but ZFS has them for when I use it on a not-laptop.
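
For a sense of how cheap a Fletcher-style checksum is, here is a minimal Python sketch of the textbook 16-bit form from the Wikipedia article linked above. This is not the ZFS implementation (ZFS's fletcher4 uses wider accumulators over larger words); the function name fletcher16 is just for illustration. The whole algorithm is two running sums, which is why it is attractive when hashing cost matters.

```python
def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: two running sums, each taken modulo 255."""
    sum1 = 0  # plain sum of the bytes
    sum2 = 0  # sum of the running sums, which makes the result order-sensitive
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


print(hex(fletcher16(b"abcde")))                # 0xc8f0
print(fletcher16(b"ab") != fletcher16(b"ba"))   # True: swapped bytes are detected
```

Unlike a plain byte sum, the second accumulator makes the result depend on byte order, so transposed data changes the checksum.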

    • > given Apple designs their own CPUs they could add extensions for anything they need.

      Indeed. They added an entire enterprise-grade SSD controller.

      > In its patents there are mentions of periodically refreshing cells whose voltages may have drifted, exploiting some of the behaviors of adjacent cells and generally trying to deal with the things that happen to NAND once it's been worn considerably.