Comment by ameliaquining
2 days ago
I suspect the biggest reason those aren't more popular is that they usually have to be stored as state in the database, which isn't what you want when developing an application. You want all of your query logic to be versioned with your application code.
> You want all of your query logic to be versioned with your application code.
SQL can be stored in version control just as well as any other code. This can include application-level queries as well as the DDL SQL which defines the actual structure of your database.
It's sad that tooling for this kind of workflow doesn't seem to be particularly good across the board, Visual Studio being somewhat of an exception.
The problem isn't version-controlling the SQL; it's making sure that, when a particular build of your app executes a query that calls out to a user-defined function, it uses the SQL function code from the same commit that the app itself was built from. Otherwise you have a potential source of version skew, and those are always really annoying and unpleasant to deal with.
I think Postgres sort of supports this but it's really clunky, and also I think you'd have to run the function-creation code on every connection; I don't know whether that would create any performance problems.
What does Visual Studio do?
You just mentioned PostgreSQL, which, like pretty much every RDMBS sans MySQL, DB2/400, and maybe DB2/ZOS (never worked with the ZOS variant) supports schemas. If you need to worry about this, keep your tables that actually contain your data in one schema, then keep views, functions/stored procedures, etc. in separate schemas every time you make incompatible changes.
The database is a separate component, the biggest mistake nearly every developer makes is trying to make a single application own it. Let me tell you, from experience, the chances that this one single application will be the only thing that every connects to your database past initial development is slim. "Oh, but we're doing microservices and everything only ever talks over HTTP or gRPC" - yeah, sure, and I don't have a backlog of tech debt a mile long.
> when a particular build of your app executes a query that calls out to a user-defined function, it uses the SQL function code from the same commit that the app itself was built from
I don't really see a "problem" here. Having everything in the same repo is probably the easiest way to ensure that the client cannot go out of sync with the database. When making a release, create a Git tag and deploy both client and database from there.
But you must make sure you know which tag is deployed at each customer. We use naming conventions for that (the name of the customer is part of the Git tag name), or you could hold that mapping externally if necessary.
Once you have that, making a client-level hotfix for a specific customer is (relatively) easy - just branch from the customer's release tag, do the changes you need, run the tests, and there is high probability everything will work properly. Once you release the hotfix, you create another tag and remember that this tag is now installed at the customer, and so on...
If you make changes to the database, then of course you still need to have an upgrade procedure from one version of the database to another, but you can be confident that the client will never query the "wrong" database version. And since both old and new database structure are just SQL files under different Git tags, you can discover exactly what changed by just by diffing.
> What does Visual Studio do?
Visual Studio has a special SQL project type, where you can keep you base table definitions, as well as all the SQL code (stored procedures, functions, views, user-defined types, indexes etc.). You can group the SQL project together with client projects, tools, automated tests and benchmarks in the same "solution" (kind of a project-of-projects). When working on the product, you load that solution, and you have all dependencies in one place, right in the Solution Explorer. This eases navigation and minimizes dependencies whose source code you cannot (easily) see.
After you make your changes, you deploy to the local database (we use the free SQL Server Developer Edition, other databases have their own free editions), run/debug tests, run/debug clients as needed. You can even start debugging a C# test or client application and step into T-SQL stored procedure seamlessly, which can be a very powerful tool for ferreting-out difficult bugs. When you are done with your changes, commit to Git, let the CI double-check the tests and make the build for the the last-row-of-defense manual QA testing. Then deploy, associate the new Git tag to the customer, rinse-and-repeat...
Basically, we treat SQL like every other code, and manage SQL dependencies not fundamentally unlike any other dependencies.
In most organizations a database is broader than any individual application - both in lifecycle and scope. So it makes sense that this state exists in a different way.
I suspect it’s because people never learned to use them, but they did learn to use the ORM.
The textbooks all say that, but is it really true in practice? I suspect it's a bit of holdover conventional wisdom from an earlier era of software. Usually, when I've created a database for a particular app, it was never used by anything else besides that app (and manual queries, for which versioning is mostly not relevant).
You might choose to have a set of views and functions that are versioned separately from your app, for the same reasons you might choose to set up a microservice. But as with microservices, it should be a deliberate decision that you make because you're getting something out of it that outweighs the complexity costs and version-skew headaches; it should not be the default way to do composition.
It’s absolutely true. Nobody should be building a microservice architecture for a product without a proven market fit, monolithic development is just faster at this stage, and when you get around to scaling it once you outgrow it (which should be much later than conventional microservice wisdom agrees) the database isn’t the bottleneck anyway and hence stays monolithic (don’t confuse with adding shards).
Making two views and deprecating one is actually the simplest possible compatibility strategy - just like adding a new version of a function to a header.