Comment by londons_explore
13 hours ago
When sharded, anything crossing a shard boundary becomes non-transactional.
Ie. if you shard by userId, then a "share" feature which allows a user to share data with another user by having a "SharedDocuments" table cannot be consistent.
That in turn means you're probably going to have to rewrite the application to handle cases like a shared document having one or other user attached to it disappear or reappear. There are loads of bugs that can happen with weak consistency like this, and at scale every very rare bug is going to happen and need dealing with.
> When sharded, anything crossing a shard boundary becomes non-transactional.
Not necessarily? You can have two-phase commit for cross-shard writes, which ought to be rare anyway.
Two-phase commit provides an eventual consistency guarantee only....
Other clients (readers) have to be able to deal with inconsistencies in the meantime.
Also, 2PC in postgres is incompatible with temporary tables, which rules out use with longrunning batch analysis jobs which might use temporary tables for intermediate work and then save results. Eg. "We want to send this marketing campaign to the top 10% of users" doesn't work with the naive approach.
Sorry, what am I missing here, this complaint is true for all architectures, because the readers are always going to be out of sync with the state in the database until they do another read.
The nanosecond that the system has the concept of readers and writers being different processes/people/whatever it has multiple copies, the one held by the database, and the copies held by the readers when they last read.
It does not matter if there is a single DB lock, or a multi shared distributed lock.
These are limitations in the current PostgreSQL implementation. It's quite possible to have consistent commits and snapshots across sharded databases. Hopefully some day in PostgreSQL too.