← Back to context

Comment by drothlis

3 years ago

I don't know about Django but in Rails† the migrations are numbered with a timestamp (in the filename itself), instead of a sequential number, precisely to avoid (git) merge conflicts. Then they're run in alphabetical order. There's a dedicated table to record which migrations have been run already.

† 10 years ago, anyway.

Using a timestamp in a migration system doesn't really solve the problem at edge cases though: the time-based ordering of migration file creation may differ from the time-based ordering of migration application, especially accounting for different git branches, and different DB environments (prod, staging, dev DBs for each developer).

The canonical problem is two engineers with branches that each add a new column to the end of the same table. It's easy to end up with schema drift between environments, or cases where executing the migrations in alphabetical order yields a different result than executing them in git history order. You can add various guard rails and band-aids to catch this, but it's ugly.

In contrast, declarative schema management systems inherently avoid this situation. For example, if two engineers try to add a new column to the end of the same table, one of them will get a git merge conflict -- exactly like if two branches in an application code repo tried to edit the same line of a function.

  • > Using a timestamp in a migration system doesn't really solve the problem at edge cases though: the time-based ordering of migration file creation may differ from the time-based ordering of migration application, especially accounting for different git branches, and different DB environments (prod, staging, dev DBs for each developer).

    In many cases this might be "good enough", depending on change frequency.

    For schema drift, I've given up on persisted databases outside of production. we have a docker image that gets made on each commit to our migrations. Staging environments have stopped making sense to me. It may or may not be representative of production. At that point, what does it do to serve me?

    I'm sure there's some team scale where this doesn't hold up, especially around things like load testing. If that were a concern of mine, I'd still want to treat the environments as ephemeral, and would try my best to automatically setup/teardown the environment.

    • One of the many benefits of declarative tooling is the ability to easily see if an environment's state differs from expectations, and then bring it back to the desired state automatically in a minimal number of steps. Generally this is much faster than tearing down and re-creating the environment from scratch.

      You can still use declarative tooling to replace an environment, or spin up a new environment, whether persisted or ephemeral. Declarative tools are faster at this as well, relative to a traditional migration system which has to apply a full history of iterative DDL (e.g. multiple ALTERs vs a declarative tool's single CREATE).

      And of course declarative tools may also be used on production, allowing use of the exact same tool chain in prod as in non-prod.