
Comment by eugercek

18 hours ago

If you use xfs (+`file_copy_method=CLONE`) you can do this with Postgres 18.

`CREATE DATABASE clankerdb TEMPLATE sourcedb STRATEGY = FILE_COPY;`

But Ardent can be useful for many, because cloud providers offer heavily restricted Postgres. And many use Aurora, which doesn't even let you configure `log_line_prefix`.

Though if cloud providers add `file_copy_method=CLONE`-compatible managed Postgres ...

ref: https://boringsql.com/posts/instant-database-clones/
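End to end, the Postgres 18 path above might look like this (a sketch; assumes the data directory sits on XFS and that `file_copy_method` is settable at the session level — otherwise set it in `postgresql.conf`):

```shell
# Sketch: reflink-based clone on Postgres 18 with an XFS data directory.
# Both -c statements run in the same psql session, so the SET applies to the CREATE.
psql -U postgres \
  -c "SET file_copy_method = clone;" \
  -c "CREATE DATABASE clankerdb TEMPLATE sourcedb STRATEGY = FILE_COPY;"
```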

Here's how I do it with Heroku. Are there some cloud services that don't have an equivalent?

  heroku pg:backups:capture --app x
  heroku pg:backups:download --app x
  pg_restore --verbose --clean --no-acl --no-owner -h localhost -U postgres -d y local_db_for_robots_etc.dump

This takes more than 6 seconds. I'm curious how they achieved that for arbitrary DBs!
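The three steps above can be wrapped into a small refresh script (a sketch; `x` and `y` are the placeholder app and database names from the commands above, and the dump filename here is just the script's own choice):

```shell
#!/bin/sh
# Sketch: refresh a local Postgres clone from a Heroku app's primary DB.
set -eu
APP=x         # Heroku app name (placeholder from above)
LOCAL_DB=y    # local target database (placeholder from above)

heroku pg:backups:capture --app "$APP"
heroku pg:backups:download --app "$APP" --output latest.dump
pg_restore --verbose --clean --no-acl --no-owner \
  -h localhost -U postgres -d "$LOCAL_DB" latest.dump
```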

  • We've got docs on how we did it :)

    https://docs.tryardent.com/architecture

    But essentially we get around the restrictions of the original DB by replicating into a different Postgres-compatible DB that essentially serves as a read replica. That DB is the one that branches, but since it mirrors the original DB you get effective clones.

    By doing this we get a lot more control over how we create the clones. The read replica clones using copy-on-write plus isolated autoscaling compute, which is how it clones in 6s. We use Neon for this since we think they've implemented those two properties well.

    Since it's default Postgres logical replication + DDL triggers, you can technically point it at any "branching enabled" DB on the other end to achieve the same effect.
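    A minimal sketch of that replication leg with stock Postgres (hypothetical names `branch_pub`/`branch_sub`; assumes `wal_level = logical` on the source and sufficient privileges on both ends — note plain logical replication doesn't carry DDL, hence the triggers mentioned above):

    ```shell
    # On the source: publish all tables.
    psql "$SOURCE_URL" -c "CREATE PUBLICATION branch_pub FOR ALL TABLES;"

    # On the branching-enabled replica: subscribe, then branch that DB instead.
    psql "$REPLICA_URL" -c "CREATE SUBSCRIPTION branch_sub CONNECTION '$SOURCE_URL' PUBLICATION branch_pub;"
    ```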

A little slow, but on Aurora you can attach and then promote read replicas. IIRC that takes around 20 minutes, but I haven't tested recently.

I'd think you could also set up logical replication to a VM, then snapshot and clone the storage, which is generally pretty fast.
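That VM route might be sketched with LVM snapshots (hypothetical volume and path names; a crash-consistent snapshot is fine for Postgres, which replays WAL on startup):

```shell
# Sketch: CoW snapshot of the volume holding the replica's data directory,
# then a second Postgres started on the snapshot.
lvcreate --snapshot --name pgdata_branch --size 10G /dev/vg0/pgdata
mount /dev/vg0/pgdata_branch /mnt/pgdata_branch
pg_ctl -D /mnt/pgdata_branch -o "-p 5433" start
```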

  • You can create a new cluster directly on AWS Aurora. Takes less than 20 minutes!

      aws rds restore-db-cluster-to-point-in-time \
          --source-db-cluster-identifier <source-cluster> \
          --db-cluster-identifier <new-cluster> \
          --restore-type copy-on-write \
          --use-latest-restorable-time \
          --db-subnet-group-name <sub group> \
          --vpc-security-group-ids <security group> \
          --serverless-v2-scaling-configuration MinCapacity=0,MaxCapacity=16

I wanted to try doing something similar to this in our dev environment (think shared dev database but per branch clones), but this limitation seemed tricky to accept:

> The source database can't have any active connections during cloning.

I wouldn't mind some lock contention, but having to kill all connections seemed a bit harsh.
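If you do decide to accept it, the standard way to clear connections right before cloning is `pg_terminate_backend` (run with superuser rights; `sourcedb` is a placeholder name):

```shell
# Kick every session connected to the source DB, except our own.
psql -U postgres -c "
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE datname = 'sourcedb' AND pid <> pg_backend_pid();"
```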

Oh nice, the `/var` part of my workstation is a dedicated nvme drive and it's coincidentally formatted as xfs.
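If you want to confirm reflinks actually work there before relying on `file_copy_method=CLONE`, `cp --reflink=always` is a quick probe, since it refuses to fall back to a byte-for-byte copy (point `mktemp -p` at the filesystem you care about, e.g. somewhere under `/var`):

```shell
# Probe reflink (copy-on-write) support on the filesystem holding the temp dir.
tmp=$(mktemp -d)
echo hello > "$tmp/src"
if cp --reflink=always "$tmp/src" "$tmp/dst" 2>/dev/null; then
  echo "reflink supported"
else
  echo "reflink not supported"
fi
rm -rf "$tmp"
```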