Comment by prmph

2 days ago

But if you have data to migrate, it is not always possible to infer the diff, no?

Say I have a users table with a name column. Then I alter the table and split the name column into two new columns: first name and last name.

How is it possible to infer this change, just from seeing the new and old columns?

The best practice way to swap fullname for firstname, lastname would be to:

  1. Migration that adds firstname and lastname columns will all nulls
  2. Deploy application code change to start populating firstname and lastname alongside fullname, still reading fullname in the code.
  3. backfill the firstname and lastname values with a script/command/migration
  4. change app code to read firstname and lastname and stop writing fullname
  5. drop the fullname column

I don't think there's a safe way to do all that in a single migration unless all your app code also lives in the database so it can be atomically deployed. If you have multiple app servers and do rolling deploys with no downtime I think it has to be done in these 5 steps.

  •   6. ensure there are no nulls in firstname and lastname
      7. alter the columns to be NOT NULL
    

    Because no non-statistician uses nullable columns, right?

    Of course, some dbs (SQLServer?) infer NULL from the empty string, or am I misremembering?

    Always having the columns be NOT NULL is a fundamental cheat, after always having a PK, or is that too old school for 2025?

    • There's nothing wrong with nullable fields when it's appropriate. When kids are born they don't have names. Not all users want to tell you their names. A null value is data too.

      1 reply →

With large tables, you can't safely make data changes using migration tools either anyway. If you run a single UPDATE against a large table, you end up with a very long transaction with substantial MVCC implications (old row versions that the db needs to clean up) and can basically break production easily.

Side note, but why do folks always bring up this "firstname lastname" example? It is not ever possible to implement that correctly in an automated fashion: some people have spaces in their first names, and some people have spaces in their last names. (I don't mean to single you out, as this exact example comes up multiple times in every declarative vs imperative discussion!)

  • Oh, I am aware of the problems with this scheme for names. I just used it as a simple example. I wouldn't design my users table this way.

    If I can, I even avoid storing actual names of users at all. If I have to, I use a simple free form "display_name" column; what is put there is up to the user.