Comment by sanderjd
3 years ago
I think this hits the nail right on the head, and it's the same criticism I have of the article itself: the framing is that you split up a database or use small VMs or containers for performance reasons, but that's not the primary reason these things are useful; they are useful for people scaling first and foremost, and for technical scaling only secondarily.
The tragedy of the commons with one big shared database is real and paralyzing. Teams not having the flexibility to evolve their own schemas because they have no idea who depends on them in the giant shared schema is paralyzing. Defining service boundaries and APIs with clarity around backwards compatibility is a good solution. Sometimes this is taken too far, into services that are too small, but the service boundaries and explicit APIs are nonetheless good, mostly for people scaling.
> Defining service boundaries and APIs with clarity around backwards compatibility is a good solution.
Can't you do that with one big database? Every application gets an account that only gives it access to what it needs. Treat database tables as APIs: if you want access to someone else's, you have to negotiate to get it, so it's known who uses what. You don't have to have one account with access to everything that everyone shares.
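Concretely, a rough sketch in Postgres terms (the role and table names here are made up):

```sql
-- Per-application accounts instead of one shared login.
CREATE ROLE billing_app LOGIN;
CREATE ROLE reports_app LOGIN;

-- Nothing is visible by default.
REVOKE ALL ON invoices FROM PUBLIC;

-- The owning team's application gets full access to its own table.
GRANT SELECT, INSERT, UPDATE, DELETE ON invoices TO billing_app;

-- Another team negotiated read-only access, and that grant is itself
-- the record of who uses what.
GRANT SELECT ON invoices TO reports_app;
```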
It would be easier to create different databases to achieve the same thing. Those could be in the same database server, but clear boundaries are the key.
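For example, on one Postgres server (assuming per-team roles already exist; the names are invented):

```sql
-- One database per team on the same server: a hard boundary by default.
CREATE DATABASE billing OWNER billing_team;
CREATE DATABASE reporting OWNER reporting_team;
```

A connection to one database can't see the other's tables at all, so any cross-team access has to go through an explicit, visible interface.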
Schemas can be useful in this regard.
Indeed! And functions with security definers can be useful here too. With those, one can define a very strict and narrow API: functions that write to or query tables the calling users don't have any direct access to.
Look at it as an API written in DB functions, rather than in HTTP request handlers. One can even have neat API versioning through, indeed, the schema, and give different users (or application accounts) access to different (combinations of) APIs.
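A minimal sketch of that idea, assuming Postgres; the `internal` schema, the `orders` table, and the `reports_app` role are all hypothetical:

```sql
-- The underlying table lives in a schema applications can't touch directly.
REVOKE ALL ON internal.orders FROM PUBLIC;

-- api_v1 is the published interface. SECURITY DEFINER means the function
-- runs with its owner's privileges, not the caller's.
CREATE SCHEMA api_v1;

CREATE FUNCTION api_v1.order_total(p_order_id bigint)
RETURNS numeric
LANGUAGE sql
SECURITY DEFINER
SET search_path = internal, pg_temp  -- standard hardening for SECURITY DEFINER
AS $$
  SELECT sum(amount) FROM internal.orders WHERE order_id = p_order_id;
$$;

-- Callers only ever get EXECUTE on the API schema they were granted.
GRANT USAGE ON SCHEMA api_v1 TO reports_app;
GRANT EXECUTE ON FUNCTION api_v1.order_total(bigint) TO reports_app;
```

An `api_v2` schema with changed signatures can then live alongside `api_v1` while consumers migrate at their own pace.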
The rest is "just" a matter of organizational discipline, and a matter of teams to internalize externalities so that it doesn't devolve into a tragedy of the commons — a phenomenon that occurs in many shapes, not exclusively in shared databases; we can picture how it can happen for unfettered access to cloud resources just as easily.
But here's the common difference: in the cloud, there's clear accounting per IOPS, per TB, per CPU-hour, so the incentive to use resources efficiently can be applied on a per-team basis, often through budgeting. "Explain to me why your team uses 100x more resources than this other team" / "Explain to me why your team's usage has increased 10-fold in three months".
Yet there's no reason to think that you can only get accounting for cloud stuff. You could have usage accounting on your shared DB. Does anyone here have experience with any kind of usage accounting system for, say, PostgreSQL?
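The building blocks exist, at least: pg_stat_statements records cumulative statistics per user and per query, so something like the following gives a crude per-account bill (assumes the extension is loaded via shared_preload_libraries; the column is `total_exec_time` on Postgres 13+, `total_time` on older versions):

```sql
-- Crude per-account resource accounting on a shared Postgres instance.
SELECT r.rolname AS account,
       sum(s.calls) AS queries,
       round(sum(s.total_exec_time)::numeric) AS total_ms,
       sum(s.shared_blks_hit + s.shared_blks_read) AS blocks_touched
FROM pg_stat_statements s
JOIN pg_roles r ON r.oid = s.userid
GROUP BY r.rolname
ORDER BY total_ms DESC;
```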
These are real problems, but there can also be mitigations, particularly when it comes to people scaling. In many orgs, engineering teams are divided by feature mandate, and management calls it good enough. In the beginning, the teams are empowered and feel productive by their focused mandates - it feels good to focus on your own work and largely ignore other teams. Before long, the Tragedy of the Commons effect develops.
I've had better success when feature-focused teams have tech-domain-focused "guilds" overlaid. Guilds aren't teams per se, but they provide a level of coordination and, more importantly, permanency to communication among technical stakeholders. Teams don't make important decisions within their own bubble, and everything notable is written down. It's important for management to be bought in and value participation in these non-team activities when it comes to career advancement (not just pushing features).
In the end, you pick your poison, but I have certainly felt more empowered and productive in an org where there was effective collaboration on a smaller set of shared applications than the typical application soup that develops with full team ownership.
In uni we learnt about federated databases, i.e. multiple autonomous, distributed, possibly heterogeneous databases joined together by some middleware to service user queries. I wonder how that would work for this situation, in place of one single large database.
Federated databases are hardly ever mentioned in these kinds of discussions involving 'web scale'. Maybe because of latency? I don't know
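For concreteness, PostgreSQL's postgres_fdw is one off-the-shelf building block for this kind of federation: it exposes another database's tables as local foreign tables (sketch only; the server, schema, and account names are invented):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SCHEMA IF NOT EXISTS billing;

-- Point at the other team's database.
CREATE SERVER billing_srv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'billing-db.internal', dbname 'billing');

CREATE USER MAPPING FOR reports_app
  SERVER billing_srv
  OPTIONS (user 'reports_ro', password 'changeme');

-- Import only the tables that were agreed on.
IMPORT FOREIGN SCHEMA public LIMIT TO (invoices)
  FROM SERVER billing_srv INTO billing;
```

The latency worry seems right, though: simple predicates get pushed down to the remote server, but cross-server joins can be painfully slow.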
> Teams not having the flexibility to evolve their own schemas because they have no idea who depends on them
This sounds like a problem of testing and organization to me, not a problem with single big databases.
Sure. My point is that the organization problems are more difficult and interesting than the technical problems being discussed in the article and in most of the threads.
Introducing an enormous amount of overhead instead of just training your software engineers to use acceptable amounts of resources, so that they can accidentally crash a node and not care, is a little ridiculous.