Comment by xyzzy_plugh

2 days ago

This is exactly backwards. You should have declarative schemas but inferring migrations is crazy. Only pain will follow.

Instead, I am a fan of doing both: either committing the resulting schema of a migration, or hand writing it aside the migration. Then have tests to ensure the database schema matches the expected schema after a migration.

Generating these artifacts is fine, but in TFA's case there is no chance I wouldn't inspect and possibly modify the generated "diff" migration. It's significantly easier to go the other way: write the migration and show me the resulting schema diff.

Completely agree.

When writing a migration, the resulting schema is usually much, much less important than the characteristics of the migration itself. When I review a migration, my first question isn’t “is this the right schema” but “is this migration going to bring downtime”. I’d much rather a smooth migration to an incorrect schema than having a critical table locked for minutes/hours.

I think that updates of stateful components should be imperative (explicit migrations), not declarative (implicit migrations). For example I don’t think Terraform is great tool to manage RDS: it doesn’t tell you the consequences of changing an attribute (database restart or other downtime-inducing stuff), I’d much rather I had to explicitly say how to get from state A to state B.

Similarly, I don’t think SQL migrations are perfect: they’re still declarative, you still need implicit knowledge to know if a migration will take a lock and what will be the consequences. I’d much rather have to code “take explicit lock; alter table xxx;”.

This tool probably allows editing migrations, but I don’t think it’s a step in the right direction. Maybe it’s a step towards databases being much better at migrations (so that we can rely on never having downtime), but even then I think it’ll get worse before it gets better

  • > it doesn’t tell you the consequences of changing an attribute (database restart or other downtime-inducing stuff)

    Modern diff tools are designed to provide better guardrails in these situations. For eg, pg-schema-diff [0] tries to generate zero downtime migrations by using lock-free migrations and warns you about potentially hazardous migrations.

    I think it's good direction to bake these best practices into the tooling itself, rather than relying purely on the experiences of engineers.

    [0] https://github.com/stripe/pg-schema-diff

    • This is exactly the key point. Declarative schema management is typically better at catching hazardous operations, because declarative schema management tools inherently require the ability to introspect your desired CREATE definitions and also introspect the current database state.

      Once a tool has those abilities, adding linters and destructive-action guardrails is much easier. Especially when compared to a traditional migration tool, which often has no underlying understanding of the ALTER TABLE operations being requested.

As a concept, declarative schema management isn't crazy at all. Several thousand companies use this approach, including some huge names like Google, Meta, and GitHub, but many smaller companies too.

When implemented well, with appropriate guardrails and linters, it's perfectly safe. And it has many benefits over imperative schema migrations, such as a substantially better Git versioning flow, and ability to synchronize environments / solve schema drift natively.

The only major conceptual downsides are the inability to handle row data migrations, and inability to handle renames. These can be major hurdles for some smaller companies, but are more irrelevant at companies with large DB infrastructure, who require special-case handling for those operations anyway. In other words, if you have large tables, row data migrations are already substantially more complex than running a single UPDATE statement, and you can't do them using a traditional imperative migration tool anyway.

> there is no chance I wouldn't inspect and possibly modify the generated "diff" migration

Of course, you're generally supposed to do that with these tools. Database changes should always be reviewed carefully. Same with non-database-related infrastructure-as-code tools.

  • > there is no chance I wouldn't inspect and possibly modify the generated "diff" migration

    > Of course, you're generally supposed to do that with these tools

    this seems to be the crux of the comments - we'll try to make it much clearer than declarative schemas don't skip any migrations/review process, they just provide another way of generating the migrations (note: you can also generate them directly from the database using the CLI)

Yes. In particular, migrations expressed as SQL statements are strictly more powerful than a diff of two schemas, so there are situations where you can’t infer the former from the latter (but you can always infer the latter from the former).

I also object to the use of “declarative” here. Either we are talking about the schema as present in the database, then it’s neither declarative nor non-declarative, it’s just whatever is in the database. Or we are talking about a schema definition, and then I really don’t know what a non-declarative schema definition would look like, in contrast to what is called “declarative” here. Thirdly, the traditional “declarative” SQL schema definition is really a series of imperative SQL statements, so arguably not declarative.

What they seem to mean is a minimal sequence of statements that results in the desired schema, as opposed to a longer history of schema-altering statements. However, the minimal version is technically still a series of schema-altering statements, starting from (presumably) an empty schema.

  • "Declarative" is the correct word, as the input to the tool is a desired-state set of CREATE statements, and the tool figures out what DDL is actually necessary to make the target DB reach that desired state.

    In other words, the engineer is declaring "make my schema look like this" and the tool makes that happen. That's the key definition of a declarative system, consistent with how this is described in CS classes.

    Meanwhile traditional migration tools are "imperative", as the engineer must tell the tool exactly what operations (e.g. ALTER) to be run.

It seems like generating the diffs from the schema's version history is equivalent to doing it the opposite way, provided that each diff is tested to make sure the database upgrade works. Not all diffs will correspond to feasible database upgrades, so some patches would have to be rejected.

> You should have declarative schemas but inferring migrations is crazy. Only pain will follow.

Inferring migrations isn't crazy. Automagically applying those migrations without review is crazy.