Comment by willquack

5 days ago

> you can run an initial VDiff, and then resume that one as you get closer to the cutover point.

VDiff (v2) only compares the source and destination at a specific point in time, and resume only compares rows with a PK higher than the last one compared before the VDiff was paused. I assume this means:

1. VDiff doesn't catch updates to rows with a PK lower than the point where it was paused, even though those rows could have become corrupt, and

2. VDiff doesn't continuously validate CDC changes, meaning that (unless you enforce extra downtime to run or resume a VDiff) you can never be 100% sure your data is valid before SwitchTraffic.

I'm curious whether this is something customers even care about, or is point-in-time data validation sufficient to catch any issues that could occur during migrations?

You are correct about resuming. If you do an initial VDiff and then resume that same VDiff, say, one month later, it would only diff rows with a higher PK value.

But there's also nothing stopping you from doing a new VDiff to cover all data at that later point in time.
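
As a toy illustration of the difference between resuming and starting over (a sketch only, not Vitess code): the two tables are modeled as dicts keyed by PK, and `diff_rows` is a hypothetical helper that compares only rows with a PK greater than `after_pk`. Under those assumptions, a resumed diff misses a divergence in a row it already passed, while a new diff from the beginning finds it.

```python
def diff_rows(source, target, after_pk=None):
    """Compare rows with pk > after_pk; return (mismatched PKs, last PK compared)."""
    mismatches = []
    last_pk = after_pk
    for pk in sorted(set(source) | set(target)):
        if after_pk is not None and pk <= after_pk:
            continue  # already covered by the earlier run, so skipped on resume
        if source.get(pk) != target.get(pk):
            mismatches.append(pk)
        last_pk = pk
    return mismatches, last_pk


# Hypothetical data: both sides start out identical.
source = {1: "a", 2: "b", 3: "c"}
target = {1: "a", 2: "b", 3: "c"}

# Initial diff covers PKs 1..3 and finds nothing.
initial_mismatches, last_pk = diff_rows(source, target)

# Later, row 2 diverges on the target and a new row 4 lands on both sides.
target[2] = "CORRUPT"
source[4] = "d"
target[4] = "d"

# Resuming only looks at PKs above last_pk, so it never revisits row 2.
resumed_mismatches, _ = diff_rows(source, target, after_pk=last_pk)

# A new diff starts from the beginning and sees the divergence.
fresh_mismatches, _ = diff_rows(source, target)

print(initial_mismatches, resumed_mismatches, fresh_mismatches)
```

The trade-off in this model is cost: the resumed run touches one row, while the new run touches all four.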

  • "But there's also nothing stopping you from doing a new VDiff to cover all data at that later point in time." --- isn't this just pushing the same issue forward in time? How is data consistency maintained if a customer reverts to the original after the new one has already served a few requests?

    • It's open source. If you really want to know these things, I would encourage you to look at the code and read the documentation. As noted in the blog post, reverse vreplication is set up when you switch. You can switch back and forth and nothing is lost.

      https://github.com/vitessio/vitess

      https://vitess.io/docs/reference/vreplication/

      "isn't this just pushing the same issue forward in time?" I don't understand what you are trying to say here. You can only compare the two sides / databases at the same logical point in time. While you are doing this comparison at that point in time, the timeline continues to progress. Unless you want to stop the world and prevent writes for the full duration of the diff (which can be days or even weeks).

  • Thanks for responding!!

    I think it's still the same issue, where data modified after the VDiff point in time isn't validated before SwitchTraffic. I'm mostly curious how Vitess users handle this case, or whether any users even care about this case in the first place.
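
To make the concern concrete, here is a toy model (an assumption-laden sketch, not how Vitess is implemented): changes are a list of `(position, pk, value)` events, integer positions stand in for a replication position, and `state_at` is a hypothetical helper that rebuilds a table by replaying events up to a position. Comparing both sides at the same position is consistent, but any write after that position falls outside the comparison.

```python
def state_at(log, position):
    """Rebuild table contents by replaying events up to `position` (inclusive)."""
    table = {}
    for pos, pk, value in log:  # log is assumed to be ordered by position
        if pos > position:
            break
        table[pk] = value
    return table


# Hypothetical change log shared by source and target.
log = [(1, 1, "a"), (2, 2, "b"), (3, 1, "a2")]

snapshot_pos = 3
source_snapshot = state_at(log, snapshot_pos)

# A target that has only applied events up to position 2 looks different,
# even though nothing is wrong: the two sides are at different logical points.
lagging_target = state_at(log, 2)
false_alarm = source_snapshot != lagging_target

# Once the target reaches the same position, the comparison is meaningful.
caught_up_target = state_at(log, snapshot_pos)
consistent = source_snapshot == caught_up_target

# Writes keep arriving while the diff runs; event 4 is after the snapshot,
# so the live source no longer matches what the diff actually compared.
log.append((4, 2, "b2"))
live_source = state_at(log, 4)
covered_by_diff = live_source == source_snapshot

print(false_alarm, consistent, covered_by_diff)
```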

    Is there no demand for continuous data validation similar to what TiDB offers?

    Do people who care about 100% correct data validation just accept the downtime required to run a full VDiff before SwitchTraffic?