SSD as Long Term Storage Testing

3 years ago (htwingnut.com)

Interestingly, unpowered flash memory has much higher retention at low temperatures and much SHORTER retention at elevated temperatures (the typical JEDEC spec is just 1 year of retention at 30C unpowered).

Flash memory in a freezer (assuming you don't have cold-induced circuit board failures due to CTE mismatch) could last hundreds of years. In a hot car, maybe a month. https://www.curtisswrightds.com/media-center/blog/extended-t...

None of that is particularly surprising, but what's interesting is that write temperature can have the opposite effect... Writing at high temperature (followed by cooling to ambient... or lower) actually improves data retention over just writing at ambient.
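
The temperature dependence follows an Arrhenius relationship: retention time scales with exp(Ea/kT). A rough back-of-the-envelope sketch, anchored on the JEDEC 1-year-at-30C figure and ASSUMING an activation energy of 1.1 eV (real values depend on process and wear), reproduces the freezer/hot-car intuition:

```python
import math

def retention_years(temp_c, ea_ev=1.1, ref_years=1.0, ref_temp_c=30.0):
    """Arrhenius-scaled retention estimate, anchored on 1 year at 30C.

    ea_ev is an ASSUMED activation energy for charge loss; real values
    depend on the process node and on how worn the cells are.
    """
    k = 8.617e-5                       # Boltzmann constant in eV/K
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return ref_years * math.exp((ea_ev / k) * (1.0 / t - 1.0 / t_ref))

for label, c in [("freezer (-18C)", -18), ("spec point (30C)", 30), ("hot car (50C)", 50)]:
    print(f"{label}: ~{retention_years(c):,.2f} years")
```

With these assumptions a -18C freezer comes out at centuries to millennia and a 50C car interior at roughly a month; a different Ea shifts the numbers substantially, so treat this as illustrative only.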

  • Is the 1-year spec after all the rated program/erase cycles have already been used, so the flash cells are worn and much "leakier"?

    That's still rather disturbing, since I have datasheets for NAND flash from around 2 decades ago that specify 10 years of retention after 100K cycles (although probably at 25C), and some slightly newer ones with 5 years / 1.5K cycles (MLC); it also explains the increasing secrecy surrounding the flash industry. The very few datasheets I could find for TLC don't even mention retention, specify endurance only vaguely, and refer you to some probably-super-secret qualification report for the actual numbers.

    Then again, perhaps I shouldn't be surprised ever since they came up with the misnomers that are "TLC" and now "QLC", and nearly drove SLC to extinction. I don't want 3x or 4x more storage for the same price (or at 1/3 or 1/4 the price) if it's exponentially less reliable --- with 8 or 16 charge levels per cell instead of 2, that's how the physics works and there's no way around it --- but that's what they seem to be pushing for.

    You can get $13 128GB TLC SSDs as mentioned in the article but I don't see any $39 128GB SLC SSDs being made, nor $13 ~40GB SLC SSDs, despite the fact that such a device would have the exact same cost per NAND cell (and as a bonus, be much faster and simpler since SLC needs much less ECC and advanced wear leveling algorithms than TLC/QLC.)

    • Not just SLC (1 bit per cell). One thing that exasperates me is that one can't find even MLC drives (2 bits per cell) anymore; now everything is TLC or QLC (QLC sounds like a bad joke; how are they able to sell that thing?).

      A few years ago Samsung Pro SSDs were MLC disks, but suddenly they changed them to TLC. They are shameless for calling them "3-bit MLC", which is a pure oxymoron: 3 bits per cell is TLC, and the degradation is higher than with MLC. Basically, it's a price increase by deceiving the consumer (and to achieve it, they reduced the specs of their other lines as well. Shameless).

    • > but I don't see any $39 128GB SLC SSDs being made, nor $13 ~40GB SLC SSDs

      Shouldn’t it be 2^3 times more expensive than TLC? IOW a $104 128GB SLC SSD or a $13 16GB SLC SSD.

      Edit: guess you’re right userbinator

  • Nuts! I have been thinking of leaving an emergency encrypted backup in my car, but evidently that is likely to cook itself almost immediately. I assumed the lifetime was not great, but that is far more aggressive than I had feared.

  • I don't understand this. Most laptops will have internal / ssd temperatures over 50C, and yet data usually lasts for years?

    >>"client class SSD must maintain its data integrity at the defined BER for only 500 hours at 52°C (less than 21 days) or 96 hours at 66°C (only four days)."

  • I cool my server to 30-35°C HDD temperatures, because that's supposed to increase their life. However, all my SSDs are then at 23-27°C [1]. I think I saw some long endurance tests which pointed out that failure rates for SSDs increase slightly below 25°C. Tricky tradeoff.

    [1]: https://i.ibb.co/dtt6dwj/ssd-hdd-tmp.png

    • HDDs and SSDs operate on fundamentally different technologies, so it shouldn't be a surprise that they desire vastly different environments to reside in.

    • Yeah, it's interesting because I think some parts of the SSD want cooler temperatures but writing to the flash might actually do less damage to the cells if you're writing at higher temperatures (higher charge mobility in silicon).
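
On the SLC pricing question above (the 2^3 guess prices by the number of charge levels, but what costs money is cells, and each cell stores n bits, not 2^n), the cost-per-cell arithmetic can be spelled out; the figures are the thread's examples, not market data:

```python
# Price tracks the number of physical NAND cells; capacity tracks bits per cell.
# The figures below are the thread's example numbers, not market data.
tlc_price_usd = 13.0
tlc_capacity_gb = 128
tlc_bits_per_cell = 3   # TLC: 3 bits/cell (8 charge levels); SLC: 1 bit/cell

# Same number of cells, 1 bit each -> one third the capacity at the same price:
slc_gb_for_same_price = tlc_capacity_gb / tlc_bits_per_cell   # ~42.7 GB for $13

# Same capacity -> three times the cells -> three times the price:
slc_price_for_same_gb = tlc_price_usd * tlc_bits_per_cell     # $39 for 128 GB

print(f"${tlc_price_usd:.0f} buys ~{slc_gb_for_same_price:.1f} GB of SLC")
print(f"{tlc_capacity_gb} GB of SLC would cost ~${slc_price_for_same_gb:.0f}")
```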

The longest lived data is replicated data.

Making more copies, with some geographic distribution, is more important than the durability of any particular technology. This applies to everything: SSDs, HDDs, CDs, paper, even DNA.

If you want your data to last, replicate it all the time.

> data needs to be validated regularly and cycled to a new media every 5-10 years in order to ensure it’s safe and easily accessible

This is what I do. I hate doing it, but it's for posterity's sake. I'd be lost without certain data. I have old virtual machine disk images that I've been using for years, ISOs of obscure software, and other rarities. Every 4 years I buy a new 4TB HDD and copy the files over to it.

  • > data needs to be validated regularly and cycled to a new media every 5-10 years in order to ensure it’s safe and easily accessible

    I used to do that but found it to be a gamble. I have files going back to the 80s, so I rotated them from 5.25" floppies to 3.5" floppies to Zip drives to CD-R and then DVD-R. But it's a fragile system: files can get corrupted somewhere along the line, and if I didn't migrate in time it can be hard to go back. For instance, I lost a handful of files during the Iomega Zip drive phase when the drive died and I had no way to recover (and the files weren't important enough to try to source a new Iomega drive).

    Now I simply keep everything online in a big zfs mirror pool.

  • You'd at least need to verify checksums of all of them post-copy, and preferably store them in an error-resistant way (RAID 5/6 or other error correction). Otherwise you might just be copying over errors that sneak in; it might not even be the source hard drive producing them, just a transient bit flip in RAM.
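
A post-copy verification pass like the one suggested here can be sketched with Python's stdlib. Note that re-reading both sides from disk catches transfer corruption, but only ECC RAM guards against in-memory bit flips:

```python
import hashlib
import shutil

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large archives don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def copy_verified(src, dst):
    """Copy src to dst, then re-read both sides from disk.

    Raises IOError on mismatch; returns the common digest so it can be
    stored alongside the backup and re-checked on later rotations.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if expected != actual:
        raise IOError(f"checksum mismatch copying {src!r} -> {dst!r}")
    return expected
```

Keeping the returned digests in a manifest lets each future rotation verify against the original hashes, not just the previous copy.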

To work around the various tricks a drive can play, one may wish to use a capacity tester in addition to hardware stats.

Fight Flash Fraud (f3)

https://github.com/AltraMayor/f3

One could set up a destructive wear test, but results may not generalize between lots with identical model numbers. This is because some manufacturers sell performant products for early reviews/shills, and eventually start cost-optimizing the identical product SKU with degraded performance.

As annoying as the bait-and-switch trend has become, for off-brand consumer hardware YMMV.

Good luck =)
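
For reference, f3write/f3read implement roughly this idea: every block written is a deterministic function of its position, so fake-capacity drives that silently wrap or drop writes fail verification on read-back. A toy Python sketch of the same principle, run against an ordinary file rather than a real device:

```python
import hashlib

BLOCK = 1 << 20  # 1 MiB per test block

def pattern(index: int) -> bytes:
    """Deterministic per-block data: hash of the block index, tiled to BLOCK size."""
    seed = hashlib.sha256(index.to_bytes(8, "big")).digest()
    return (seed * (BLOCK // len(seed) + 1))[:BLOCK]

def fill(path: str, blocks: int) -> None:
    """Write position-dependent blocks; on a fake-capacity drive some of these
    silently vanish or overwrite earlier blocks."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern(i))

def verify(path: str, blocks: int) -> int:
    """Return the number of blocks that read back wrong (0 on honest media)."""
    bad = 0
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK) != pattern(i):
                bad += 1
    return bad
```

f3 does the device plumbing, progress reporting, and speed measurement on top of this; the sketch only shows why position-keyed data defeats capacity fraud.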

  • Kind of going on a tangent, but I think it's relevant so please bear with me:

    I never understood the point of cheaping out on storage media.

    Look, I get it. Most of us have budgets to work with, not all of us can afford enterprise 40TB Kioxia SSDs or enterprise HDDs hot off the presses.

    But if I actually, truly care about my data I'm going to at least shell out for (brand new!) drives from reputable brands like Samsung, Crucial, Seagate, Western Digital, Kingston, and so on. The ease of mind is worth the cost, as far as I'm concerned.

    What is the rationale behind buying used drives, or drives from off-brand vendors of unknown or even ill reputation? Aside from just goofing around, I mean. I never can justify the idea, no matter how strapped for cash I could be.

    • Partially it's because the details are somewhat opaque. I do buy new drives from reputable brands, but I find it hard to know what I gain and what I lose at different price points within those manufacturers' lineups. It's hard to even find out which drives are DRAM-less, never mind what other features they have that impact reliability. From the days of the IBM Deskstars I've also learned that a drive model's reliability is likely only understood by the time it's off the market.

      (I agree in principle that used or off brand drives seem insane to me, but at the same time, I do live on laptops I buy used so their drives are actually used as well :/)

    • You can never rely on a storage medium being perfect, so you must always plan for redundancy (e.g. ZFS, off-site backups).

      When you have that, cheaping out on storage doesn’t matter so much anymore.

    • There is little proof that the "enterprise" expensive ones are that much more durable, let alone for the price. Enterprise ones usually have better power-loss protection and some more spare flash for write endurance, but that's about it. Hell, we just had 2 of the enterprise Intel ones outright die in the last month (out of a lot of ~30), at 98% life left!

      On spinning rust there is practically no difference in reliability (assuming you buy drives designed for 24/7 operation, not some green shit); it's just that you can attach SAS to it. We have stacks of dead drives to prove it.

      > What is the rationale behind buying used drives, or drives from off-brand vendors of unknown or even ill reputation? Aside from just goofing around, I mean. I never can justify the idea, no matter how strapped for cash I could be.

      That the flash is the same but strapped to a different controller.

      And if you truly care about your data you want redundancy, not more expensive storage. Put the saved money into a home server with ECC memory (at home), or an extra node or hot spare (at work).

    • Use cases differ even for storage, and cost is sometimes a competitive advantage in low-value products. High-end SSDs include onboard supercapacitors to keep the hardware stable during power failures, larger DRAM buffers with fetch prediction, and sector sparing with wear leveling.

      If your stack uses anything dependent on classic transactional integrity, then cheap SSDs don't make sense once you count the long-term hidden IT costs of their failures.

      "buy cheap, buy twice" as they say. =)

I think it’s a mistake to test the worn and fresh disks at different intervals. I.e., testing worn disks in years 1 and 3, and fresh disks in years 2 and 4.

Let’s say that the worn disks are found to have failed the hash check in year 1 and the fresh disks are found to have failed in year 2. Can you conclude that worn and fresh are equally bad? No, you can’t, because maybe the fresh disks were still OK in year 1 — but you didn’t check them in year 1.

As another example, suppose the worn disks are found to be good in year 1 but the fresh disks are found to be bad in year 2. This seems like an unlikely result, but if it happened, what could you conclude? Well, you can’t conclude anything. Maybe worn is better because they are still good in year 2, but you aren’t checking them in year 2. Maybe fresh is better because the worn will fail in year 1.1 but the fresh last until year 1.9 before failing. Maybe all the disks fail in year 1.5 so they are equally bad.

I think it’s better to test the disks at the same intervals since you can always draw a conclusion.

  • Fully agree. The test would be better if all parameters were the same: the same set of data written (why put a different number of GB on each SSD?) and the same testing periods. Nevertheless, it's an interesting little experiment :-)
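
The observability argument above can be made concrete with a tiny simulation; the failure times below are invented purely for illustration:

```python
def observe(fail_year: float, check_years) -> dict:
    """What a checking schedule can actually report for one batch of disks,
    given a true failure time that the experimenter never sees directly."""
    return {y: ("ok" if y < fail_year else "failed") for y in check_years}

# Invented "true" failure times, purely for illustration:
worn_fails, fresh_fails = 1.1, 1.9

# Staggered schedule: worn checked in years 1 and 3, fresh in years 2 and 4.
staggered_worn = observe(worn_fails, [1, 3])     # {1: 'ok', 3: 'failed'}
staggered_fresh = observe(fresh_fails, [2, 4])   # {2: 'failed', 4: 'failed'}
# The staggered data cannot distinguish fresh dying at 1.9 from dying at 0.5,
# so "worn ok in year 1, fresh failed in year 2" supports no comparison.

# Aligned schedule: both batches checked every year.
aligned_worn = observe(worn_fails, [1, 2, 3, 4])
aligned_fresh = observe(fresh_fails, [1, 2, 3, 4])
# Both batches are now bounded by the same checkpoints (each failed somewhere
# between years 1 and 2), so whatever the data shows is directly comparable.
```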

I wonder if storing SSDs in a faraday bag or a thick steel box could prevent degradation caused by cosmic rays / errant charged particles passing through?

Curiously, acrylic plastic is one of the best materials to absorb highly energetic particles https://www.space.com/21561-space-exploration-radiation-prot...

  • The most practical highly-effective radiation shielding would be putting your drives in a waterproof container and storing them at the bottom of a swimming pool. That will also do a pretty good job of keeping the temperature stable and not too high.

  • I don't think a faraday cage would help at all to stop any actual particles with mass? It seems like even a steel box would have to be rather thick to make a meaningful dent. Astronauts on the international space station are essentially in a solid metal faraday cage, and they see speckles in their vision each time their orbit passes through weak spots in the Earth's magnetic field.

    If you're really worried about cosmic rays, maybe you could try to figure out their predominant direction of travel for your location, then store your SSD in an orientation that minimizes its cross-sectional area. I naively assume they're coming from straight up?

  • That's like saying "let's put bulletproof vests on the ballistic gel dummies for our test on how survivable this particular bullet round is on the human body."

I might be able to test this, I have a 4790k that I haven't touched in years...

  • I'm interested, but will OP deliver? Not sure I want to get my hopes up just yet :(

    Related, I was wondering how I'm going to learn the results of the submitted article. A calendar reminder with a link to the blog, perhaps? I'd much prefer to leave my email address somewhere, tbh, but I didn't see any sign-up form.

I've had magnetic drives with excellent data integrity that spent the last 20 years untouched in self-storage units.

I have read CD-RWs of approximately the same age with no data loss.

SSDs sacrifice data durability for I/O speed.

  • I have HDDs from around 20 years ago, hardly touched in the last 10, which are still good (able to read everything out without issues; random checksums are good, but not all files validated).

    But my optical backups were a complete disaster. At just a little over 5 years, over 50% could not be read at all, and ~30% could be read but had corrupted contents. There might be ~10% still good, but it was too time-consuming to check so I dumped them all. I still have optical drives, but I can't really remember when I last used one.

    For SSDs, I have a couple that were left cold for around 3 years; I checked them a couple of days ago and they seem to be good. I'm not sure how much longer they can hold, as there were known issues with the Samsung 840 series.

  • Given the recent debacle with the Samsung SSD firmware, I'd love to read internal hardware and firmware engineer notes and concerns. I think there are quite a few bodies in the closet that is called the consumer SSD market.

    You'd at least hope that enterprise SSDs with a Dell sticker on them are better.

    • The enterprise stuff is almost always longer lasting, but the only one that truly lasts is (was?) Optane. You shouldn't trust an SSD long term, especially modern ones. I've probably seen 100 drive failures in total (HDD and SSD), and COVID-era SSDs are garbage for longevity. The big downside of enterprise SSDs (besides price) is performance. You can literally double your speed by buying consumer grade (and it's roughly the same price to buy 2 drives for every 1 enterprise-grade one).

  • The article doesn't really support this claim. It argues SSDs sacrifice speed and durability for capacity and cost and then posits an unanswered question about absolute durability with an ongoing test to find out.

    I do wish the test had more than $13 TLC drives though.

  • That's a trade-off I'm fine making, because time is in short supply - a snappier experience using my computer is worth it, and I have everything backed up in three places (Backblaze, a 6-drive RAIDZ2 array in my home server, and from there to JottaCloud via Restic).