Comment by dale_glass

4 days ago

So on the off-chance that there's a firmware engineer in here, how does this actually work?

Like does an SSD do some sort of refresh on power-on, or every N hours, or do you have to access the specific block, or...? What if you interrupt the process, e.g., an NVMe in an external case that you plug in once a month for a few minutes to use as a huge flash drive: is that a problem?

What about the unused space, is a 4 TB drive used to transport 1 GB of stuff going to suffer anything from the unused space decaying?

It's all very unclear what any of this means in practice and how a user is supposed to manage it.

SSD firmware engineer here. I work on enterprise stuff, so ymmv on consumer grade internals.

Generally, the data refresh will all happen in the background when the system is powered (depending on the power state). Performance is probably throttled during those operations, so you just see a slightly slower copy while this is happening behind the scenes.

The unused space decaying is probably not an issue, since the internal filesystem data is typically stored on a more robust area of media (an SLC location) which is less susceptible to data loss over time.

As far as how a user is supposed to manage it, maybe do an fsck every month or something? Using an SSD like that is probably ok most of the time, but might not be super great as a cold storage backup.

  • So say I have a 4TB USB SSD from a few years ago, that's been sitting unpowered in a drawer most of that time. How long would it need to be powered on (ballpark) for the full disk refresh to complete? Assume fully idle.

    (As a note: I do have a 4TB USB SSD that sat in a drawer untouched for a couple of years. The data was all fine when I plugged it back in. Of course, this was a new drive with very low write cycles, stored climate controlled. An older, worn-out drive would probably have been an issue.) Just wondering how long I should keep it plugged in if I ever hit a situation like that, so I can "reset the fade clock," so to speak.
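Nobody in the thread has insider numbers for that, but a back-of-envelope bound is easy if you assume the background refresh is limited by how fast the controller can rewrite media. The 100 MB/s rate below is a pure guess for illustration, not a vendor figure:

```shell
# 4 TB of media at an assumed 100 MB/s background rewrite rate (made-up number)
awk 'BEGIN { printf "%.1f hours\n", 4e12 / 100e6 / 3600 }'
# prints: 11.1 hours
```

So on that guess, a full refresh pass is on the order of half a day of powered-on time, not a few minutes.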

  • >Generally, the data refresh will all happen in the background when the system is powered (depending on the power state).

    How does the SSD know when to run the refresh job? AFAIK SSDs don't have an internal clock, so they can't tell how long they've been powered off. Moreover, does doing a read generate some sort of telemetry to the controller indicating how strong/weak the signal is, thereby informing whether it should refresh? Or does it blindly refresh on some sort of timer?

    • Pretty much, but it depends a lot on the vendor and how much you spent on the drive. A lot of the assumptions about enterprise SSDs is that they’re powered pretty much all the time, but are left in a low power state when not in use. So, data can still be refreshed on a timer, as long as it happens within the power budget.

      There are several layers of data integrity that are increasingly expensive to run. Once the drive tries to read something that requires recovery, it marks that block as requiring a refresh and rewrites it in the background.

  • > maybe do an fsck every month or something

    Isn't that what periodic "scrub" operations are on modern fs like ZFS/BTRFS/BCacheFS?

    > the data refresh will all happen in the background when the system is powered

    This confused me. If it happens in the background, what's the manual fsck supposed to be for?

  • So you need to do an fsck? My big question after reading this article (and others like it) is whether it is enough to just power up the device (for how long?), or if each byte actually needs to be read.

    The case an average user is worried about is where they have an external SSD that they back stuff up to on a relatively infrequent schedule. In that situation, the question is whether just plugging it and copying some stuff to it is enough to ensure that all the data on the drive is refreshed, or if there's some explicit kind of "maintenance" that needs to be done.
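One do-it-yourself answer to the question above, assuming the files (not the raw device) are what you care about: keep a checksum manifest next to the backup and re-verify it each time you plug the drive in. Verifying forces a full read of every file, which at minimum gives the controller a chance to notice weak blocks; whether that read actually triggers a rewrite is firmware-dependent, per the discussion above. Paths and filenames here are made up for illustration:

```shell
# Toy scrub: build a manifest once, re-verify on every plug-in.
mkdir -p backup && echo "important data" > backup/file.txt

# Build the manifest (excluding the manifest file itself)
( cd backup && find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )

# On each plug-in: re-read and verify everything
( cd backup && sha256sum -c MANIFEST.sha256 )
# prints: ./file.txt: OK
```

This only covers space occupied by your files; it says nothing about unused or metadata areas of the drive.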

  • How long does the data refresh take, approx? Let's say I have an external portable SSD that I keep stored data on. Would plugging the drive into my computer and running

      dd if=/dev/sdX of=/dev/null bs=1M status=progress
    

    work to refresh any bad blocks internally?

    • A full read would do it, but I think the safer recommendation is to just use a small HDD for external storage. Anything else is just mitigation.

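If you do go the full-read route from the dd question above, it's cheap to script. A minimal sketch, demoed on a scratch image file so it's safe to paste; on real hardware you'd use if=/dev/sdX (as root), which reads the whole device rather than just the files:

```shell
# Scratch image standing in for the SSD; on real hardware this would be /dev/sdX.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=8 2>/dev/null

# The sequential read below is the part that actually touches every block.
dd if="$IMG" of=/dev/null bs=1M 2>/dev/null && echo "full read completed"

rm -f "$IMG"
```

At a typical few hundred MB/s over USB, a 4 TB device-wide read like this takes a few hours.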

  • Ok, so all bits have to be refreshed, even when powered on, so they don't lose their state?

    Edit: found this below: "Powering the SSD on isn't enough. You need to read every bit occasionally in order to recharge the cell."

    Hm, so does the firmware have "read bits to refresh them" logic?

    • Kind of. It's "read and write back" logic, and also "relocate from a flaky block to a less flaky block" logic, and a whole bunch of other things.

      NAND flash is freakishly unreliable, and it's up to the controller to keep this fact concealed from the rest of the system.

  • I had to google what 'ymmv' means. To save other people's time – it's 'your mileage may vary'.

Keep in mind that when flash memory is read, you don't get back 0 or 1. You get back (roughly) a floating point value -- so you might get back 0.1, or 0.8. There's extensive code in SSD controllers to reassemble/error correct/compensate for that, and LDPC-ish encoding schemes.

Modern controllers have a good idea how healthy the flash is. They will move data around to compensate for weakness. They're doing far more to detect and correct errors than a file system ever will, at least at the single-device level.

It's hard to get away from the basic question, though -- when is the data going to go "poof!" and disappear?

That is when your restore system will be tested.

  • Unless I am misunderstanding the communication protocol between the flash chip and the controller, there is no way for the controller to know that analogue value. It can only see the digital result.

    Maybe as a debug feature some registers can be set up to adjust the threshold up and down, and the same data reread many times to get an idea of how close certain bits are to flipping, but it certainly isn't normal practice for every read.
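A toy illustration of the hard-decision step being discussed. The numbers are made up, and real controllers use several read thresholds per multi-bit cell plus LDPC soft-decision decoding, not a single 0.5 cutoff:

```shell
# Pretend these are normalized cell readings coming back from the NAND;
# the "controller" slices them at a threshold to recover bits.
printf '0.10\n0.80\n0.45\n0.92\n' | awk '{ print (($1 >= 0.5) ? 1 : 0) }'
# prints 0, 1, 0, 1 (one per line)
```

The 0.45 reading is the interesting case: it decodes as 0 today, but a little more charge leakage and a naive threshold would flip it, which is exactly where the error-correction layers earn their keep.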

Typically unused empty space is a good thing, as it allows drives to run in MLC or SLC mode instead of their native QLC. (At least, this seems to be the obvious implication from performance testing, given how much better SLC/MLC performs than QLC.) And the data retention of SLC/MLC can be expected to be significantly better than QLC's.

  • >as it will allow drives to run in MLC or SLC mode instead of their native QLC

    That depends on the SSD controller implementation, specifically whether it proactively moves stuff from the SLC cache to the TLC/QLC area. I expect most controllers to do this, given that if they don't, the drive will quickly lose performance as it fills up. There's basically no reason not to proactively move stuff over.

    • Cheap DRAM-less controllers usually wait until the drive is almost full to start folding. And then they'll only be folding just enough to free up some space. Most benchmark results are consistent with this behavior.
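For anyone who wants to see that folding cliff themselves, a hypothetical fio job along these lines would do it: sequential-write more data than the SLC cache can hold and watch throughput step down once folding starts. The device path is a placeholder, and a raw write like this destroys everything on the target, so treat it as a sketch rather than a recipe:

```ini
; hypothetical fio job: overflow the SLC cache with a long sequential write
[slc_probe]
filename=/dev/sdX    ; placeholder -- DESTRUCTIVE, use a scratch drive
rw=write
bs=1M
direct=1
size=64g
ioengine=libaio
```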