Comment by sastraxi
5 hours ago
I’ve done experiments using BTRFS and ZFS for local Postgres copy-on-write. You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.
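For anyone wanting to try this, a minimal sketch of the template-clone approach (database names `app` and `app_branch` are placeholders; the `file_copy_method = clone` setting is a Postgres 18 feature and needs a filesystem with clone/reflink support such as Btrfs, XFS, ZFS, or APFS):

```shell
# Clone a database via Postgres's template mechanism.
# With STRATEGY FILE_COPY plus file_copy_method = clone, Postgres asks
# the filesystem for a copy-on-write clone instead of a byte-by-byte copy.
psql -c "SET file_copy_method = clone;" \
     -c "CREATE DATABASE app_branch TEMPLATE app STRATEGY FILE_COPY;"
```

The source database can't have other open connections while the clone runs, which is usually fine for local development.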
Looking at Xata’s technical deep dive, the site claims that we need an additional Postgres instance per replica and proposes a network file system to work around that. But I don’t really understand why that’s needed. Can someone explain what I’m misunderstanding here?
For context for the others, I think you are referring to this blog post: https://xata.io/blog/open-source-postgres-branching-copy-on-... (in particular the "The key is in the storage system" section) right?
What I'm saying there is that if you run Postgres on top of a local ZFS volume, the Postgres instances for the child branches need to be on the same server. So you're limited in how many branches you can have. One or two are fine, but if you want a branch per PR, that will likely not work.
If you separate the compute from storage via the network, this problem goes away.
Yes, that's what I'm referring to.
You're still making that assumption in this comment: why does my 2nd (cloned) database need a separate Postgres instance? One Postgres server can host multiple databases.
Got it, yes, I've seen in the other comment that you're referring to the new Postgres 18 feature. If that works for you in local dev, so much the better :)
ZFS snapshots can be transmitted over the network, with diff-only and deduplication gains if the remote destination already has an earlier snapshot of the same ZFS filesystem. It’s not perfect, and the worst case is still a full copy, but the tooling and efficiency wins for the ordinary case are battle-tested and capable.
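For example (pool and host names here are hypothetical), an incremental send ships only the blocks that changed since the snapshot the destination already holds:

```shell
# Initial full send: the destination gets a complete copy of the filesystem.
zfs snapshot tank/pgdata@base
zfs send tank/pgdata@base | ssh storage2 zfs receive backup/pgdata

# Later: incremental send transmits only blocks changed since @base.
zfs snapshot tank/pgdata@mon
zfs send -i tank/pgdata@base tank/pgdata@mon | ssh storage2 zfs receive backup/pgdata
```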
Yes, for sure, and stuff like this is really useful when rebalancing storage nodes, for example.
My point is that for the use case of offering a Postgres service with CoW branching as a key feature, you can't really escape some form of separation of storage and compute.
Btw, don't really want to talk too much about it yet, but our proprietary storage engine (Xatastor) is basically ZFS exposed over NVMe-oF. We'll announce it in a couple of weeks, and we'll have a detailed technical blog post then on pros/cons.
> You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.
Are you referring to `file_copy_method = clone` from Postgres 18? For example: https://boringsql.com/posts/instant-database-clones/
I think the key limitation is:
> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one.
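A hedged sketch of working around that limitation for a one-off clone (database names `app` and `app_snapshot` are placeholders): terminate the source's sessions just before cloning.

```shell
# Kick every other session off the source database, then clone it.
# Note the race window: a new connection can still sneak in between
# the terminate and the CREATE DATABASE.
psql -d postgres \
  -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
      WHERE datname = 'app' AND pid <> pg_backend_pid();" \
  -c "SET file_copy_method = clone;" \
  -c "CREATE DATABASE app_snapshot TEMPLATE app STRATEGY FILE_COPY;"
```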
Yeah, that's the one. My use case is largely for local development, so the active connections thing isn't a limiter for me.
I also don't really understand how correctness under physical branching with ZFS, or physical backups of a filesystem, is different from crash safety in general. As long as you replay the WAL from the point where you branch (or take the physical backup), you should not lose data?
At the same time, Postgres people don't seem comfortable with the idea in practice, so I'm not sure if this is actually OK to do.
Crash safety does mean rolling back everything in progress, but yes: if your database cannot safely do that (even if it is yucky), then you do not have a database that is safe in any crash situation.
You can't have any other connections while the copy-on-write clone is happening, not even a logical replication slot. So you keep a read replica whose connections get briefly cut for the CoW, to avoid locking the master instance. Then you re-enable the logical replication slots on both the new, copied instance and the "copyable" read replica to get both back up to date with the master.
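If I follow, the workflow is roughly this (subscription and database names are hypothetical; the new branch would also need its own subscription set up the same way, which I've omitted):

```shell
# 1. Pause the logical replication feeding the "copyable" replica database.
psql -d copyable -c "ALTER SUBSCRIPTION copyable_sub DISABLE;"

# 2. Cut the replica's remaining connections so CREATE DATABASE can run.
psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
                     WHERE datname = 'copyable' AND pid <> pg_backend_pid();"

# 3. Take the CoW copy from the replica, not the master.
psql -d postgres -c "CREATE DATABASE branch_pr TEMPLATE copyable STRATEGY FILE_COPY;"

# 4. Resume replication so the replica catches back up with the master.
psql -d copyable -c "ALTER SUBSCRIPTION copyable_sub ENABLE;"
```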
Thanks for sharing your workflow. My question is about why two databases on the same server would need two separate postgres instances.