Comment by digitallyfree
3 years ago
By second disk failure do they mean that the disks on both the primary and fallback servers failed? Or do they mean that two disks (of a RAID1 or similar setup) in the fallback server failed?
The latter is understandable, the former would be quite a surprise for such a popular site. That means that the machines have no disk redundancy and the server is going down immediately on disk failure. The fallback server would be the only backup.
14 hours ago HN failed over to the standby due to a disk failure on the primary. 8 hours ago the standby's disk also failed.
Primary failure: https://news.ycombinator.com/item?id=32024036 Standby failure: https://twitter.com/HNStatus/status/1545409429113229312
The disks on both the primary and fallback servers definitely failed. Each was in a RAID setup, but those failed too in both cases.
Ouch! I'm assuming the disks were from the same batch and installed at the same time, but having at least four fail like that is just crazy unlucky.
Veteran programmer and HN user kabdib has proposed a striking theory that could explain everything: https://news.ycombinator.com/item?id=32028511.