Comment by EB66

1 month ago

Sharding can be made mostly transparent, but it's not purely a DB-level concern in practice. Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, certain keys hit with a lot of traffic, etc matter a lot. Even if partitioning handles routing, the application's query patterns and its consistency/latency requirements can still force application-level changes.

7 comments

EB66

awesome_dude 1 month ago

> Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, certain keys hit with a lot of traffic

If you're having trouble there then a proxy "layer" between your application and the sharded database makes sense, meaning your application still keeps its naieve understanding of the data (as it should) and the proxy/database access layer handles that messiness... shirley

zozbot234 1 month ago

> mostly transparent, but it's not purely a DB-level concern in practice ...

But how would any of that change by going outside Postgres itself to begin with? That's the part that doesn't make much sense to me.

londons_explore 1 month ago
When sharded, anything crossing a shard boundary becomes non-transactional.
Ie. if you shard by userId, then a "share" feature which allows a user to share data with another user by having a "SharedDocuments" table cannot be consistent.
That in turn means you're probably going to have to rewrite the application to handle cases like a shared document having one or other user attached to it disappear or reappear. There are loads of bugs that can happen with weak consistency like this, and at scale every very rare bug is going to happen and need dealing with.
- zozbot234 1 month ago
  
  > When sharded, anything crossing a shard boundary becomes non-transactional.
  Not necessarily? You can have two-phase commit for cross-shard writes, which ought to be rare anyway.
  
  3 replies →