Comment by lmm
3 years ago
I've never understood this logic for webapps. If you're building a web application, congratulations, you're building a distributed system, you don't get a choice. You can't actually use transactional integrity or ACID compliance because you've got to send everything to and from your users via HTTP request/response. So you end up paying all the performance, scalability, flexibility, and especially reliability costs of an RDBMS, being careful about how much data you're storing, and getting zilch for it, because you end up building a system that's still last-write-wins and still loses user data whenever two users do anything at the same time (or you build your own transactional logic to solve that - exactly the same way as you would if you were using a distributed datastore).
Distributed systems can also make efficient use of cache, in fact they can do more of it because they have more of it by having more nodes. If you get your dataflow right then you'll have performance that's as good as a monolith on a tiny dataset but keep that performance as you scale up. Not only that, but you can perform a lot better than an ACID system ever could, because you can do things like asynchronously updating secondary indices after the data is committed. But most importantly you have easy failover from day 1, you have easy scaling from day 1, and you can just not worry about that and focus on your actual business problem.
Relational databases are largely a solution in search of a problem, at least for web systems. (They make sense as a reporting datastore to support ad-hoc exploratory queries, but there's never a good reason to use them for your live/"OLTP" data).
I really don't understand how anything of what you wrote follows from the fact that you're building a web-app. Why do you lose user data when two users do anything at the same time? That has never happened to me with any RDBMS.
And why would HTTP requests prevent me from using transactional logic? If a user issues a command such as "copy this data (a forum thread, or a Confluence page, or whatever) to a different place" and that copy operation might actually involve a number of different tables, I can use a transaction and make sure that the action either succeeds fully or is rolled back in case of an error; no extra logic required.
I couldn't disagree more with your conclusion even if I wanted to. Relational databases are great. We should use more of them.
> I really don't understand how anything of what you wrote follows from the fact that you're building a web-app. Why do you lose user data when two users do anything at the same time? That has never happened to me with any RDBMS.
> And why would HTTP requests prevent me from using transactional logic? If a user issues a command such as "copy this data (a forum thread, or a Confluence page, or whatever) to a different place" and that copy operation might actually involve a number of different tables, I can use a transaction and make sure that the action either succeeds fully or is rolled back in case of an error; no extra logic required.
Sure, if you can represent what the user wants to do as a "command" like that, that doesn't rely on a particular state of the world, then you're fine. Note that this is also exactly the case that an eventually consistent event-sourcing style system will handle fine.
The case where transactions would actually be useful is the case where a user wants to read something and modify something based on what they read. But you can't possibly do that over the web, because they read the data in one request and write it in another request that may never come. If two people try to edit the same wiki page at the same time, either one of them loses their data, or you implement some kind of "userspace" reconciliation logic - but database transactions can't help you with that. If one user tries to make a new post in a forum thread at the same time as another user deletes that thread, probably they get an error that throws away all their data, because storing it would break referential integrity.
> Sure, if you can represent what the user wants to do as a "command" like that, that doesn't rely on a particular state of the world, then you're fine. Note that this is also exactly the case that an eventually consistent event-sourcing style system will handle fine.
Yes, but the event-sourcing system (or similar variants, such as CRDTs) is much more complex. It's true that it buys you some things (like the ability to roll back to specific versions), but you have to ask yourself whether you really need that for a specific piece of data.
(And even if you use event sourcing, if you have many events, you probably won't want to replay all of them, so you'll maybe want to store the result in a database, in which case you can choose a relational one.)
> If two people try to edit the same wiki page at the same time, either one of them loses their data, or you implement some kind of "userspace" reconciliation logic - but database transactions can't help you with that.
Yes, but
a) that's simply not a problem in all situations. People will generally not update their user profile concurrently with other users, for example. So it only applies to situations where data is truly shared across multiple users, and it doesn't make sense to build a complex system only for these use cases,
b) the problem of users overwriting other users' data is inherent to the problem domain; you will, in the end, have to decide which version is the most recent regardless of which technology you use. The one thing that evens etc. buy you is a version history (which btw can also be implemented with a RDBMS), but if you want to expose that in the UI so the user can go back, you have to do additional work anyway - it doesn't come for free.
c) Meanwhile, the RDBMS will at least guarantee that the data is always in a consistent state. Users overwriting other users' data is unfortunate, but corrupted data is worse.
d) You can solve the "concurrent modification" issue in a variety of ways, depending on the frequency of the problem, without having to implement a complex event-sourced system. For example, a lock mechanism is fairly easy to implement and useful in many cases. You could also, for example, hash the contents of what the user is seeing and reject the change if there is a mismatch with the current state (I've never tried it, but it should work in theory).
I don't wish to claim that a relational database solves all transactionality (and consistency) problems, but they certainly solve some of them - so throwing them out because of that is a bit like "tests don't find all bugs, so we don't write them anymore".
2 replies →
Http requests work great with relational dbs. This is not UDP. If the TCP connection is broken, an operation will either have finished or stopped and rolledback atomically and unless you've placed unneeded queues in there, you should know of success immediately.
When you get the http response, you will know the data is fully committed, data that uses it can be refreshed immediately and is accessible to all other systems immediately so you can perform next steps relying on those hard guarantees. Behind the http request, a transaction can be opened to do a bunch of stuff including API calls to other systems if needed and commit the results as an atomic transaction. There are tons of benefit using it with http.
But you can't do interaction between the two ends of a HTTP request. The caller makes an inert request, whatever processing happens downstream of that might as well be offline because it's not and can never be interactive within a single transaction.
Now you're shifting the goalposts. You started out by claiming that web apps can't be transactional, now you've switched to saying they can't be transactional if they're "interactive" (by which you presumably mean transactions that span multiple HTTP requests).
Of course, that's a very particular demand, one that doesn't necessarily apply to many applications.
And even then, depending on the use case, there are relatively straightforward ways of implementing that too: For example, if you build up all the data on the client (potentially by querying the server, with some of the partial data, for the next form page, or whatever) and submit it all in one single final request.