Comment by GeekyBear

2 days ago

The article speculates on why Apple integrates the SSD controller onto the SOC for their A and M series chips, but misses one big reason, data integrity.

About a decade and a half ago, Apple paid half a billion dollars to acquire the patents of a company making enterprise SSD controllers.

> Anobit appears to be applying a lot of signal processing techniques in addition to ECC to address the issue of NAND reliability and data retention. In its patents there are mentions of periodically refreshing cells whose voltages may have drifted, exploiting some of the behaviors of adjacent cells and generally trying to deal with the things that happen to NAND once it's been worn considerably.

Through all of these efforts, Anobit is promising significant improvements in NAND longevity and reliability.

https://www.anandtech.com/show/5258/apple-acquires-anobit-br...

> The article speculates on why Apple integrates the SSD controller onto the SOC for their A and M series chips, but misses one big reason, data integrity.

If they're really interested in data integrity they should add checksums to APFS.

If you don't have RAID you can't rebuild corrupted data, but at least you know there's a problem and can perhaps restore from Time Machine.

For metadata, you may have multiple copies, so can use a known-good one (this is how ZFS works: some things have multiple copies 'inherently' because they're so important).
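To make the detect-versus-repair distinction above concrete, here is a minimal sketch. It is not how APFS or ZFS lay anything out on disk; `checksum` and `read_verified` are hypothetical names, and SHA-256 is just a convenient stand-in for whatever checksum a filesystem would use.

```python
import hashlib

def checksum(data):
    """Stand-in checksum; a real filesystem would pick its own algorithm."""
    return hashlib.sha256(data).digest()

def read_verified(copies, expected):
    """Return the first copy whose checksum matches `expected`.

    Returns None when every copy is corrupt: the problem is detected,
    but there is nothing left to repair from.
    """
    for copy in copies:
        if checksum(copy) == expected:
            return copy
    return None

good = b"directory entry: photos/"
corrupt = b"directory entry: photoz/"
expected = checksum(good)

# One copy: corruption is detected, so you know to go to a backup.
only_copy = read_verified([corrupt], expected)

# Two copies: the known-good one is returned despite the bad one.
repaired = read_verified([corrupt, good], expected)
```

With a single copy the checksum only tells you to reach for Time Machine; with a second copy the read can succeed anyway.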

Edit:

> Apple File System uses checksums to ensure data integrity for metadata but not for the actual user data, relying instead on error-correcting code (ECC) mechanisms in the storage hardware.

* https://en.wikipedia.org/wiki/Apple_File_System#Data_integri...

  • > If they're really interested in data integrity they should add checksums to APFS.

    Or you can spend half a billion dollars to solve the issue in hardware.

    As one of the creators of ZFS wrote when APFS was announced:

    > Explicitly not checksumming user data is a little more interesting. The APFS engineers I talked to cited strong ECC protection within Apple storage devices. Both NAND flash SSDs and magnetic media HDDs use redundant data to detect and correct errors. The Apple engineers contend that Apple devices basically don't return bogus data.

    https://arstechnica.com/gadgets/2016/06/a-zfs-developers-ana...
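For anyone who hasn't seen how "redundant data to detect and correct errors" works, here is a toy sketch using a Hamming(7,4) code. This is an illustration only: real NAND controllers use far stronger codes (BCH or LDPC over whole pages), and nothing here reflects Apple's or Anobit's actual design. The function names are made up for the example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 bits; parity sits at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code_bits):
    """Return (data_bits, flipped_position); position 0 means no error seen."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], position

stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one cell drifting to the wrong value
recovered, where = hamming74_decode(stored)
```

One flipped bit is silently corrected. Flip two bits, though, and this toy code "corrects" to the wrong codeword and hands back wrong data without complaint, which is the failure mode the filesystem-level checksum argument is about.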

    APFS keeps redundant copies and checksums for metadata, but doesn't constantly checksum files looking for changes any more than NTFS does.

    • > Or you can spend half a billion dollars to solve the issue in hardware.

      And hope that your hardware/firmware doesn't ever get bugs.

      Or you can do checksumming at the hardware layer and checksumming at the software/FS layer. Protection in depth.

      ZFS has caught issues from hardware, like when LBA 123 is requested but LBA 456 is delivered: the hardware-level checksum for LBA 456 was fine, and so it was passed up the stack, but it wasn't actually the data that was asked for. See Bryan Cantrill's talk "Zebras All the Way Down":

      * https://www.youtube.com/watch?v=fE2KDzZaxvE
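The misdirected-read case can be sketched in a few lines. This is a toy model under stated assumptions, not ZFS code: `ToyDisk`, `BlockPointer`, `fs_write` and `fs_read` are hypothetical names, and the `misdirect` table stands in for a firmware bug. The point it shows is that a checksum stored next to the data only proves the sector is self-consistent, while a checksum kept by whoever holds the pointer proves it is the sector that was asked for.

```python
import hashlib
import zlib
from dataclasses import dataclass

@dataclass
class Sector:
    payload: bytes
    crc: int  # device-level check, stored alongside the data it covers

class ToyDisk:
    def __init__(self, misdirect=None):
        self.sectors = {}
        # Hypothetical firmware bug: requested LBA -> LBA actually delivered.
        self.misdirect = misdirect or {}

    def write(self, lba, payload):
        self.sectors[lba] = Sector(payload, zlib.crc32(payload))

    def read(self, lba):
        sector = self.sectors[self.misdirect.get(lba, lba)]
        if zlib.crc32(sector.payload) != sector.crc:
            raise IOError("device-level checksum mismatch")
        return sector.payload  # self-consistent, but maybe the wrong sector

@dataclass
class BlockPointer:
    lba: int
    digest: bytes  # filesystem-level checksum, kept with the pointer

def fs_write(disk, lba, payload):
    disk.write(lba, payload)
    return BlockPointer(lba, hashlib.sha256(payload).digest())

def fs_read(disk, ptr):
    payload = disk.read(ptr.lba)
    if hashlib.sha256(payload).digest() != ptr.digest:
        raise IOError(f"filesystem checksum mismatch reading LBA {ptr.lba}")
    return payload
```

With `ToyDisk(misdirect={123: 456})`, a raw `disk.read(123)` happily returns whatever lives at LBA 456, while `fs_read` on the pointer for LBA 123 raises instead of passing the wrong data up the stack.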

      And if checksums are not needed for a particular use-case, make them toggleable: even ZFS lets you turn them off per dataset with zfs set checksum=off. My problem is not having the option at all.

    • That is a weak excuse to rely on data integrity in the hardware. They most likely had that feature and removed it so they wouldn't be liable for a class action lawsuit when it turns out the NAND ages out due to a bug in the retention algorithm. NTFS is what, 35 years old at this point? Odd comparison.

  • Believing that giant companies are monolithic “theys” leads to all sorts of fallacies.

    Odds are very good that totally different people work on the architecture of APFS and SoC design.

    • Even still, those people report to people that report to people until you eventually get to the person in charge of the full product.

  • You can do this yourself in userspace if you really want it:

    https://git.eeqj.de/sneak/attrsum

    I use zfs where I can (it has content checksums) but it sucks bad on macOS, so I wrote attrsum. It keeps the file content checksum in an xattr (which APFS (and ext3/4) supports).

    I use it to protect my photo library on a huge external SSD formatted with APFS (encrypted, natch) because I need to mount it on a mac laptop for Lightroom.
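The general idea of keeping a content checksum in an xattr can be sketched as below. This is not attrsum's code or its on-disk format: `ATTR_NAME`, `record` and `verify` are hypothetical, and Python's `os.setxattr`/`os.getxattr` are only available on Linux, so on macOS you would have to pass in equivalent functions from some other binding (which is why they are parameters here).

```python
import hashlib
import os

ATTR_NAME = "user.sketch.sha256"  # hypothetical name, not what attrsum uses

def file_digest(path):
    """Hash the file contents in 1 MiB chunks; returns hex digest as bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest().encode("ascii")

def record(path, setxattr=None):
    """Store the current content checksum in an extended attribute."""
    setxattr = setxattr or os.setxattr  # Linux-only default
    setxattr(path, ATTR_NAME, file_digest(path))

def verify(path, getxattr=None):
    """True if the file still matches the checksum recorded earlier."""
    getxattr = getxattr or os.getxattr  # Linux-only default
    return getxattr(path, ATTR_NAME) == file_digest(path)
```

Run `record` once after importing files and `verify` periodically. Note that a legitimate edit also fails `verify` until you record again, so this fits write-once data like a photo archive better than working files.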

  • Worth noting for ZFS: you can set the "copies" property of a dataset to 2 or 3 to store that many separate copies of your data on the drive(s).

Note that this isn't too long after Apple abandoned efforts to bring ZFS into Mac OS X as a potential default filesystem. Patents were probably a good reason, given the Oracle buyout of Sun, but also a bit of "skating to where the puck will be" and realizing that the spinning rust ZFS was built for probably wasn't going to be in their computers for much longer.

Not just durability. Performance too. Apple has a much better SSD controller that is vertically integrated into the stack.

> Through all of these efforts, Anobit is promising significant improvements in NAND longevity and reliability.

Every flash controller does this. Modern NAND is just math on a stick. Lots and lots of math.

Do Apple SSDs have much better longevity and reliability? I've not looked at the specific patents, nor am I an expert on signal processing, but I've worked on SSD controllers and with NAND manufacturers in the past, and they had their own ideas similar to this.

  • From my experience working on Mac laptops, yeah. SSD failures are incredibly rare but on the flip side when they do go out repairs are very costly.

    I know at my previous job at a large hard drive manufacturer we had special Apple drives that ran different parts and firmware than the regular PC drives. Their specs and tolerances were much different than the PC market as a whole.

Main reason was capturing 100% of storage upsell/upgrade money. They did the same thing with RAM.