Comment by Thaxll

6 days ago

We need more details on 6. This is the hard part: you swap connections from A to B, but if B is not fully synced and you write to it, the two start to diverge and there is no way back.

Say B is slightly out of date (replication-wise) and the service modifies something on it; then a change arrives from A that modifies the same data you just wrote.

How do you ensure that B is up to date without stopping writes to A (no downtime)?

Not sure how they do it, but I would do it like so:

Have the old database be the master and the new one a slave. Load the latest db dump into the new one; that may take as long as it wants.

Then start replication and catch up on the delay.

You would need, depending on the db type, a load balancer/failover manager. PgBouncer and Pgpool-II come to mind, but MySQL has some as well. Let that connect to the master and slave, and connect the application to the database through that layer.

Then trigger a failover. That should be it.
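
To make the last step concrete, here is a toy Python simulation of a lag-gated cutover. It is only a sketch: `Node`, `Proxy`, and `max_lag` are hypothetical names, not the API of PgBouncer, Pgpool-II, or any real tool, and a database is reduced to an ordered list of writes. The assumption is that the proxy refuses to switch while the new node is far behind, then holds writes for a short window while the last entries are applied.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy database node: an ordered log of applied writes."""
    log: list = field(default_factory=list)

    @property
    def position(self) -> int:
        return len(self.log)


class Proxy:
    """Hypothetical stand-in for the pooling/failover layer."""

    def __init__(self, old: Node, new: Node):
        self.old, self.new = old, new
        self.primary = old
        self.paused = False
        self.buffer = []

    def write(self, value):
        if self.paused:
            self.buffer.append(value)  # hold writes during the cutover window
        else:
            self.primary.log.append(value)

    def replicate(self, batch: int = 100):
        """Apply up to `batch` pending log entries from old to new."""
        start = self.new.position
        self.new.log.extend(self.old.log[start:start + batch])

    def cutover(self, max_lag: int = 10) -> bool:
        """Switch primaries only once the replica is nearly caught up."""
        if self.old.position - self.new.position > max_lag:
            return False  # too far behind; keep replicating and retry later
        self.paused = True
        while self.new.position < self.old.position:
            self.replicate()  # drain the last few entries
        self.primary = self.new
        self.paused = False
        pending, self.buffer = self.buffer, []
        for value in pending:  # replay writes held during the window
            self.primary.log.append(value)
        return True
```

Note that this still stalls writes for the short drain window, so it is "no visible downtime" rather than zero pause, and it does not cover rolling back to the old node after the switch.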

  • > Load in latest db dump, may take as long as it wants.

    400TB, that's about a week+?

    > Then start replication and catch up on the delay.

    Then you have the changes that built up during the delay, roughly 1TB. That means syncing those changes takes a few more days, while new changes are still coming in.

    They said "current requests are buffered", which is impossible, especially for long-running (optionally distributed) transactions that are in progress (those can take hours, or days for analytics).

    Overall this article is BS, or some super custom case that is irrelevant for common systems. You can't migrate w/o downtime; it's physically impossible.

    • Feels the same to me as well.

      "Take snapshot and begin streaming replication"... like to where? The snapshot isn't even prepared fully yet and definitely hasn't reached the target. Where are you dumping/keeping those replication logs for the time being?

      Secondly, how are you managing database state changes due to realtime update queries? They are definitely still going into the source table at this point.

      I don't get this. I'm still stuck on point 1... have read it twice already.

    • So you don't understand how something works. That's fine. But to then say the article and/or tech are BS is... a choice.

      This work has been and is being used by some of the largest sites / apps in the world including Uber, Slack, GitHub, Square... But sure, "it's BS, super custom, and irrelevant". Gee, yer super smart! Thank you for the amazing insights. 5 stars.