Comment by belorn

9 days ago

As a sysadmin at a company that provides fairly sensitive services, I find online cloud backups to be way too slow for the purpose of protecting against something like the server room being destroyed by a fire. Even something like spinning disks at a remote location feels like a risk, as files would need to be copied onto faster disks before services could be restored, and that copying would take precious time during an emergency. When downtime means massive losses of revenue for customers, being down for hours or even days while waiting for the download to finish is not going to be accepted.

Restoring from cloud backups is one of those war stories that I occasionally hear, including the occasional FedEx solution of sending the backup disks by courier.

Many organizations are willing to accept the drawbacks of cloud backup storage because it’s the tertiary backup in the event of physical catastrophe. In my experience those tertiary backups are there to prevent the total loss of company IP should an entire site be lost. If you only have one office and it burns down, work will be severely impacted anyway.

Obviously the calculus changes with maximally critical systems where lives are lost if the systems are down or you are losing millions per hour of downtime.

For truly colossal amounts of data, FedEx has more bandwidth than fiber. I don’t know if any cloud providers will send you your stuff on physical storage, but most will allow you to send your stuff to them on physical storage, e.g. AWS Snowball.
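The "FedEx has more bandwidth than fiber" claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below uses made-up but plausible numbers (half a petabyte of backups, a 10 Gbit/s link, a 24-hour courier); all figures are illustrative assumptions, not vendor specs.

```python
TB = 10**12  # bytes in a terabyte (decimal, as drive vendors count)

def transfer_hours(data_bytes: float, link_gbps: float) -> float:
    """Hours to move data_bytes over a link of link_gbps (gigabits/second)."""
    return data_bytes * 8 / (link_gbps * 10**9) / 3600

def sneakernet_gbps(data_bytes: float, shipping_hours: float) -> float:
    """Effective bandwidth of putting the data in a box and shipping it."""
    return data_bytes * 8 / (shipping_hours * 3600) / 10**9

data = 500 * TB  # assumed backup set: half a petabyte

# Pulling it down a dedicated 10 Gbit/s fiber link:
print(f"10 Gbit/s fiber: {transfer_hours(data, 10):.0f} hours")   # ~111 hours

# Overnight courier, ~24 h door to door:
print(f"Courier: {sneakernet_gbps(data, 24):.0f} Gbit/s effective")  # ~46 Gbit/s
```

At these sizes the box of disks beats the 10 Gbit/s link by more than 4x, and the courier's "bandwidth" scales with how many disks fit in the box, while the fiber's does not.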

There are two main reasons why people struggle with cloud restore:

1. Not enough incoming bandwidth. The cloud’s pipe is almost certainly big enough to send your data to you. Yours may not be big enough to receive it.

2. Cheaping out on storage in the cloud. If you want fast restores, you can’t use the discount reduced-redundancy low-performance Glacier tier. You will save $$$ right until the emergency where you need it. Pay for the flagship storage tier (standard AWS S3, for example) or splurge on whatever cross-region redundancy offering they have. Then you only need to worry about problem #1.
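The two problems above can be put into one rough restore-time estimate: an archive tier adds a staging delay before the first byte, and then your own downlink bounds the transfer. The numbers below (50 TB of backups, a 1 Gbit/s office downlink, a ~12-hour archive staging delay) are assumptions for illustration only.

```python
def restore_hours(data_tb: float, downlink_gbps: float,
                  retrieval_lag_hours: float = 0.0) -> float:
    """Rough wall-clock restore time: tier staging delay + transfer time."""
    transfer = data_tb * 10**12 * 8 / (downlink_gbps * 10**9) / 3600
    return retrieval_lag_hours + transfer

# 50 TB over a 1 Gbit/s downlink, hot-tier storage (no staging delay):
hot = restore_hours(50, 1.0)                              # ~111 hours
# Same data on an archive tier that takes ~12 h to stage (assumed figure):
cold = restore_hours(50, 1.0, retrieval_lag_hours=12)     # ~123 hours

print(f"hot tier:  {hot:.0f} h")
print(f"cold tier: {cold:.0f} h")
```

Note that in this example the archive tier's staging delay is dwarfed by the transfer time, which is problem #1 in action: past a certain data size, your incoming pipe, not the storage tier, dominates the outage.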

  • If you are willing to let it cost a bit, which is likely the right choice given the problem, then there are several solutions available. It is important to think through the scenario and, if possible, do a dry run of the solution. A remote physical server can work quite well and be cost-effective compared to a flagship storage tier, and if data security is important, you can access the files on your own server directly rather than downloading an encrypted blob from a cloud located outside the country.

In one scenario, with offsite backups ("in the clown" or otherwise): "We had a fire at our datacenter, and there will be some downtime while we get things rolling again."

In the other scenario, without offsite backups ("in the clown" or otherwise): "We had a fire at our datacenter, and that shit's just gone."

Neither of these is a particularly good thing to announce, and both can come with very severe costs, but one of them is clearly worse than the other.

SK would be totally fine with that though because that means there would eventually be recovery!

You're not designing to protect from data loss, you're designing to protect from downtime.