Comment by lazerl0rd

4 years ago

The funny thing here is that battery-backed enterprise systems are worse off in that manner, because you're much more likely to notice a dying battery that your entire device relies on than the little battery pack hooked up to your RAID array.

Sure, you could write a program that periodically checks the battery status (you'd have to poll since there's no ACPI notification like with a "device battery") and sends an email to the admin or something. However, that's a tool that doesn't "exist" (as in, there isn't a notable program that does so), which possibly hints that this isn't something system admins often do.
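For illustration, here's a rough sketch of what such a poller could look like. Everything in it is an assumption: `read_status` is a hypothetical callable standing in for whatever vendor-specific way you have of reading the controller battery state, the status strings ("ok" and so on) are made up, and `notify` is whatever alerting hook you prefer.

```python
import time
from typing import Callable, Optional


def poll_battery(
    read_status: Callable[[], str],   # hypothetical: returns e.g. "ok", "degraded", "failed"
    notify: Callable[[str], None],    # hypothetical: sends the alert (email, chat, ...)
    interval_s: float = 300.0,
    max_polls: Optional[int] = None,  # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the battery status and notify whenever it changes to a non-"ok" state.

    Returns the number of notifications sent.
    """
    sent = 0
    last = "ok"
    polls = 0
    while max_polls is None or polls < max_polls:
        status = read_status()
        # Only alert on a transition, so a dead battery doesn't spam the admin
        # on every poll.
        if status != "ok" and status != last:
            notify(f"RAID battery status changed: {last} -> {status}")
            sent += 1
        last = status
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
    return sent
```

The `notify` hook could wrap Python's standard `smtplib` to send the email. The hard part remains `read_status`, since that depends on the controller exposing its battery state to userland at all.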

The above also requires there to be an interface available from userland, not only in the management firmware or BIOS/UEFI. That exists for HP, but I'm not sure all other OEMs provide one.

To emulate a flushing SSD, the signal really needs to go directly to the SSD firmware so it can decide which is the last OS write it can accept while still having enough power to persist all write and flush requests it has already accepted.

Getting all that right sounds so hard that it is probably better to just have enterprise SSDs include a built-in supercap that gives 5 seconds or so of power for all the necessary flushing, while laptop/desktop-grade SSDs only need to offer barriers for data consistency. Laptop and desktop users don't care if they lose the last 1 second of data before a crash, as long as what is on the drive is self-consistent.

  • I should've been a little clearer; by "enterprise systems" I was referring to RAID controllers and the like. Though yes, I believe enterprise SSDs/NVMes likely have a capacitor or, as one friend put it, an "overkill battery" to use for flushing data.

    To be fair though, I sidetracked from the discussion at hand. The issue Marcan described was regarding the OS -> Disk rather than a "power loss situation". The latter does play in with the former, but solving the latter doesn't necessarily solve the former.

Enterprise systems have monitoring through the BIOS, which will send an email and expose the status via SNMP and other monitoring methods (the same as for a faulty fan).

  • Correct me if I'm wrong, but I wouldn't call the management engine (e.g. HP iLO) the BIOS. Whilst those may support such warnings:

    1) Not everyone wants to use iLO or whatever equivalent another OEM provides.

    2) Whilst such systems do support sending warnings about system components via email, dashboards, etc., that doesn't mean they'll necessarily warn about a RAID controller's battery being depleted. If I remember correctly, iLO4 doesn't.

    3) What about RAID cards like the P420 (*not* the P420i) that either aren't hooked up to a management engine or are from an entirely separate OEM?

    • >1) Not everyone wants to use iLO or whatever equivalent another OEM provides.

      Then you aren't an enterprise because they're absurdly useful for managing dozens/hundreds/thousands of systems.

      >3) What about RAID cards like the P420 (not the P420i) that either aren't hooked up to a management engine or are from an entirely separate OEM?

      There's a reason enterprises standardize on a common infrastructure from an OEM that supports everything in the box even though you could go on Newegg and build your own systems for thousands of dollars less.

External batteries can often be connected to via serial (most common), via USB or via IP, so that is definitely one.

  • That's the first time I've heard of batteries [for RAID controllers] having an entirely separate port from the one that hooks them up to the controller. Is this a "there are some of X" or have I just been out of the loop?