Comment by danpalmer
5 days ago
pgdog looks interesting, but I read the docs wondering how it handles foreign keys between tables, and the docs don't seem to cover it. It was the first question I had, and I'd assume would be an important question for most users. The project states "Data relationships are not sacrificed to go faster" and that it operates "transparently to the application", but it's not clear how it does that in regard to foreign keys.
Additionally, maybe this is just a pet peeve with Postgres documentation in general, but the docs use "database" to mean both a single database and a Postgres server interchangeably. On some pages this makes the content harder to understand. I think it's good practice to distinguish between databases and "clusters" (no, not those clusters, actually servers), and perhaps between clusters and groups of servers. While some of the naming is unfortunate for historical reasons, re-using the same names for different concepts just causes more confusion.
There is a bit documentation about configuring foreign keys here: https://docs.pgdog.dev/configuration/pgdog.toml/sharded_tabl...
Medium term I can see detecting foreign keys automatically (if constraints exist).
I saw some docs around this but it doesn’t seem to answer the main problem.
If you have users and posts, with posts having a userid field referencing users.id, and you shard the users table on id, how does it behave?
Now I request posts for a user id, is it smart enough to map the posts field to the users field? Where are posts stored? What if one user has all of the posts, now my data will be unbalanced across shards. Can I shard on post.id? If so what are the implications for querying joined to user?
The naive answer is to break foreign key integrity in the database. But it specifically says that pgdog doesn’t require sacrificing data safety.
You shard posts on posts.userid as well, same hashing function, same number of shards.