Comment by matthewaveryusa

6 years ago

Looks like their backups only consisted of in-region backups on systems that were homogeneous. Common pitfall. While technically a 3-node distributed system may provide disaster recovery from one node failing, in practice, an accidental rm -rf from an ansible script targeting all three machines, or a bug in the software that's doing the replication, will leave you without a backup plan.

If you're in such a situation, The easiest is to do filesystem level backups with something like zfs and ship the backups to a third-party system that only has write/append-only semantics (better yet, use a write-once-read-many (WORM) disk to really guarantee it.).While there will still be _some_ data loss, it'll let you recover since the last snapshot.

If you don't have zfs, a database backup that runs the db dump script and scp/sftps it to a server running as a cronjob can also be an immediate remedy while you get your shit together (and by that I mean buy yourself a product with an immaculate reputation like aurora or cockroachdb to manage the db for you)

Harder but better would be to tee the log of the changestream (all distributed systems have such a log) to a third-party system. This is ideal because if it's done synchronously it'll let you recover since the last committed transaction.

And of course, test your backups, because backups are subject to code rot as well.

1 comment

matthewaveryusa

namibj 6 years ago

What backup strategy are you implying for the case of cockroachdb? Streaming the changefeed (including timestamps) to an external append-only system while slowly and incrementally iterating through all tables using as of system time to reduce impact on active transactions and know how late this shard of a "full backup" can be inserted into the "agumented" changefeed you'd generate by interleaving these shards into the changefeed. For replay you'd use the stream from the oldest shard up to the select min(a) from (select max(timestamp_resolved) as a from changefeeds group by table) newest timestamp you know you have the transactions complete changesets for (the resolved timestamp can be periodically emitted to confirm that no further records in the same feed(/table) could have a transaction timestamp earlier than it, inducing a partial ordering).

You could replay the (combined,sorted,agumented) changefeed in-order, or shard it on the table's primary key to ensure per-key monotonicity when applying the streams in parallel threads/transactions/nodes.