
Comment by rsync

9 months ago

"Fortunately, UniSuper had backups at another cloud provider. Otherwise, a bad situation could have been oh so much worse."

Years ago we ran ad campaigns on reddit that said something like:

"Your data is stored on AWS and your backups are stored on AWS ... you're doing it wrong."

... and they got almost zero traction.

In fact, many people were angered by the suggestion that data at a major cloud provider could be at risk in any way.

I call it “Cloud 3-2-1” backup. You really should replicate your backups to a separate commercial provider, or even to a local replica, depending on context. Most often, it’s to protect yourself from yourself.
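The replication leg of that “Cloud 3-2-1” setup could be sketched with rclone, assuming a remote (here named `offsite`, a hypothetical name, as is the bucket) configured against a second, independent provider:

```shell
# Hypothetical names: "offsite" would be created via `rclone config`
# against a provider independent of the one holding the primary data.
rclone sync /srv/backups offsite:backup-bucket --checksum
```

`--checksum` makes rclone compare file hashes rather than size and modification time, at the cost of extra reads.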

I’ve given up on trying to convince other people, though. Fortunately for me, unlike you, it’s not my bread and butter to do so.

  • When we migrated Netflix to AWS in 2009-2011 we set up a separate archive account on AWS for backups and also made an extra copy on GCP as our “off prem” equivalent. We also did a weekly restore from archive to refresh the test account data and make sure backups were working. I’ve documented that pattern many times, some people have even implemented it…
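The weekly restore check in that pattern could be sketched as a checksum comparison between the original tree and a scratch restore. A minimal local sketch — `make_manifest` and `verify_restore` are hypothetical helper names, and the actual restore-from-archive step is elided:

```shell
# Sketch of a restore check: after restoring a backup into a scratch
# directory, compare per-file checksums against the original tree.
make_manifest() {
    # Emit one "checksum  path" line per file, sorted for stable comparison.
    (cd "$1" && find . -type f -exec sha256sum {} +) | sort
}

verify_restore() {
    # $1 = original tree, $2 = restored tree; succeeds only if identical.
    [ "$(make_manifest "$1")" = "$(make_manifest "$2")" ]
}
```

Usage might look like `verify_restore /srv/data /srv/restore-test && echo "backups are working"` — the point being that a backup you never restore is only a hope.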

Valid thing to raise in the campaign, but also... AWS is not Google. There will often be several attempts at communication before an account is disabled, and I'm not even sure what protections need to be lifted to actually delete an account.

Having worked with both clouds for several years, I'm intrigued by Google's services but struggle with trusting them enough to use for production.

  • It’s not only about AWS vs Google.

    It’s about insider and external threats. Operator error. System design failures.

    There’s a lot of ways to mess up your own account.

  • But doesn’t the problem only occur when these safeguards fail?

    I mean - I get that you’re saying Google has fewer checks and balances than AWS, but at some point it must be possible for the customer contact process to go wrong.

    It’s an extra slice of Swiss cheese, but it only makes it less likely, not impossible.

A bit of an unrelated question, and please excuse me if this is going the wrong way: any plans for some sort of multithreaded support in rsync? Being limited to a single thread is painfully slow. I'm concerned about data reads (disks do much better when read in parallel) and checksumming: doing a second pass of rsync over a medium-sized MySQL db of around 2 TB is literally slower than just something like `tar .. | zstd | ssh "zstd -d | tar -x"`.

Doing the copy with rclone can be much, much faster as well.
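The streaming pipeline mentioned above can be sketched as a local stand-in, dropping the ssh hop so the shape of the tar/zstd plumbing is visible (`zstd_copy` and the paths are hypothetical; the real thing would pipe through `ssh host "zstd -d | tar -x"`):

```shell
# Local stand-in for: tar -cf - src | zstd | ssh host "zstd -d | tar -xf -"
# Streams the whole tree once, compressed, with no per-file round trips.
zstd_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    tar -C "$src" -cf - . | zstd -q | (cd "$dst" && zstd -dq | tar -xf -)
}
```

For example, `zstd_copy ./dbdump ./restore` copies the contents of `./dbdump` into `./restore`. It wins on bulk transfer precisely because it skips the per-file checksumming rsync does, so it can't do incremental updates.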

As always, there’s a trade-off. Native backups are typically cheaper and easier, while external backups carry their own risks. The risk of the cloud provider making a stupid mistake is so small that there are usually many other risks worth mitigating first.

  • It’s usually quite easy to replicate your backups to another completely independent AWS account (not in the same organization, different payment method, etc).

    You’re taking the backups anyway, why not at least store them somewhere that can’t be deleted by the same red button as the original data?
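A minimal sketch of that cross-account copy — assuming an AWS CLI profile (here called `backup-account`, a hypothetical name, as are the bucket names) whose credentials belong to the independent account and which has been granted read access to the source bucket:

```shell
# Hypothetical bucket/profile names. The "backup-account" profile holds
# credentials for the separate account (not in the same organization,
# different payment method) and can read the source bucket.
aws s3 sync s3://prod-backups s3://offsite-backups --profile backup-account
```

Enabling S3 Object Lock on the destination bucket goes a step further: with a retention period set, even those credentials can’t delete the objects until it expires.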