Comment by ClumsyPilot

4 years ago

"Clearly vendors and users are at odds with each other here; vendors want the best benchmarks (so you can sort by speed descending and pick the first one), but users want their files to exist after their power goes out."

Clearly the vendors are at odds with the law, selling a storage device that doesn't store.

I think they are selling snake oil, otherwise known as committing fraud. Maybe they made a mistake in design, and at the very least they should be forced to recall faulty products. If they know about the problem and this behaviour continues, it is basically fraud.

We allow this to continue, and the manufacturers that actually do fulfill their obligations to the customer suffer financially, while unscrupulous ones laugh all the way to the bank.

I agree, all the way up to entire generations of SDRAM being unable to store data at their advertised speeds and refresh timings. (Rowhammer.) This is nothing short of fraud; they backed the refresh off WAY below what's necessary to correctly store and retrieve data accurately regardless of adjacent row access patterns, because refreshing more often would hurt performance, and they all want to advertise high performance.

And as a result, we have an entire generation of machines that cannot ever be trusted. And an awful lot of people seem fine with that, or just haven't fully considered what it implies.

I don't know if a legal angle is the most helpful, but we probably need a Kyle Kingsbury type to step into this space and shame vendors who make inaccurate claims.

Which is currently all of them, but that was also the case in the distributed systems space when he first started working on Jepsen.

  • Fraud is fraud, though.

    • Sure, of course. But even if you did want to seek a legal remedy, someone would have to do the work to clearly document the issue for the purposes of making it clear to a non-technical courtroom.

      And at the point where that documentation had been done, that on its own might be enough to right the ship without anyone actually having to get sued.

This isn't fraud.

The tester is running the device out of spec.

The manufacturers warrant these devices to behave on a motherboard with proper power hold up times, not in whatever enclosures.

If the enclosure vendor suggests that behavior on cable pull will fully mimic motherboard ATX power loss, then that is fraud. But they probably have fine print about that, I'd hope.

  • "The manufacturers warrant these devices to behave on a motherboard with proper power hold up times"

    That's an interesting point. Doesn't 'power failure' also include potential failure of the power supply, in which case you might not get that time?

    Or what if a new write command is issued within the hold-up time? Does the motherboard/OS know about the power loss during those 16 milliseconds that the power is still holding?

    • 'Power loss' or 'power failure' for a part designed to operate at ATX specs does not mean supply failure. Supply failure can cause anything up to and including destruction of all components and even death of operator.

      Anyway, let's firm up how an SSD works and what the OS knows.

      SSDs have volatile DRAM buffers as a staging area to use before writing to the flash.

      Flush (OS ioctl) means the data is successfully residing in the volatile DRAM of the SSD.

      This is all the OS knows and usually ever knows in the ioctl cycle.

      If power is lost, the power-good signal on the motherboard is dropped at some point before the >16ms hold-up time is up. The voltage on the 3.3V rail will probably also drop enough from nominal to let the SSD controller know it had better get its housekeeping in order. In other words, dump the DRAM somewhere permanent and deal with it on the next power up.

      Anything the OS is doing in the interim will not likely be acknowledged as flushed so that's not a concern. The OS userspace write will never complete. That loop works fine.

      The thing that gets people up in arms is that flush means the SSD has the data only in volatile memory and not necessarily in non-volatile storage.

      All performant SSDs seem to work this way. They need buffers.

      The larger form factor enterprise drives, which are maybe 25% more expensive, have PLP capacitor banks. These supply a solid 50ms of power. Some manufacturers supply oscilloscope screenshots and such.

      Anything else seems to be variable in its approach to power loss, particularly the smaller, hotter M.2 parts.

      Capacitor banks have issues like taking up space, causing inrush currents, gaining impedance over time, and mediocre reliability at the high temperatures that the latest M.2 sticks experience.
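
For anyone who wants to see where the flush in the comment above sits from userspace, here is a minimal sketch in Python. The helper name `durable_write` and the file name in the usage lines are made up for illustration. `os.fsync` asks the OS to push the file's data to the device and waits for the acknowledgement; whether that acknowledgement means the data is in flash or still in the drive's DRAM buffer is exactly the point being argued in this thread, and nothing in this code can tell the difference.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and request a flush to the device before returning.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Blocks until the OS reports the flush as complete. This is the
        # last thing userspace sees; what the drive did internally with
        # the data (DRAM buffer vs. flash) is not visible from here.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.bin")
        print(durable_write(target, b"hello"))
```

If power drops before `os.fsync` returns, the caller never sees the write complete, which matches the "that loop works fine" case above. The disputed case is power dropping after it returns.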
