Comment by joshmn

1 day ago

I have a moderately sized 2TB production database I have enjoyed using pgBackRest on, and was—this week—going to set it up on another 8TB database we have.

What's the next-closest thing? wal-g? barman? databasus? I only get to cosplay as a DBA.

I've used Barman on fairly large DBs (30+ TB) and had no complaints with it. I am a DBRE, if that carries any weight.

  • We recently moved from Barman to pgBackRest. Our main complaint with Barman was that its incremental backups relied on hardlinks. Locally that was great: we could back up our 7TB database and the next day's backup would add only ~20GB of changes. But cloud storage has no concept of hardlinks, so replicating those backups meant pushing the full 14TB. Also, at least when we last looked a while back, file compression applied only to the WAL files, unless you used the newer barman-cloud-backup tool, which we did not.

    Also, pgBackRest lets you take the majority of the backup from a physical standby, which is VERY nice for keeping load off production.

    None of these seemed like issues until we looked at pgBackRest and suddenly realized how nice that would be.

    • We just piped the backups through pigz for compression; rapidgzip also exists for parallelized decompression (or any other compression algorithm you’d like to use, of course).

  • barman seems to cover "Natural disaster" in their docs. Seems good.

    I'll take a look. Thanks!
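The hardlink ballooning described in the Barman comment above is easy to reproduce in miniature: a second backup directory that hardlinks an unchanged file costs almost nothing on disk, but a plain copy (which is effectively what an object-store upload is) materializes every link as a full file. A minimal sketch, using scratch paths:

```shell
set -eu
work=$(mktemp -d)
mkdir -p "$work/full" "$work/incr"

# "Full" backup: one 10 MiB data file.
dd if=/dev/urandom of="$work/full/base.dat" bs=1M count=10 2>/dev/null

# "Incremental" backup: the unchanged file is just a hardlink -- no new blocks.
ln "$work/full/base.dat" "$work/incr/base.dat"
du -sh "$work"          # ~10M on disk across both backups

# A plain recursive copy cannot carry hardlinks, so the same tree
# costs roughly double at the destination.
cp -r "$work" "$work.copy"
du -sh "$work.copy"     # ~20M
```

This is the 7TB-backed-up-but-14TB-pushed effect in small: `du` counts the shared inode once locally, while the copy duplicates it.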
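The pigz route mentioned above is just standard streaming tools around `pg_basebackup`. A sketch, with the target paths as assumptions rather than a recommendation:

```shell
# Backup: stream a tar of the cluster to stdout and compress across all
# cores with pigz. -X fetch collects the required WAL into the tar
# (WAL streaming can't be combined with tar-to-stdout).
pg_basebackup -D - -Ft -X fetch | pigz -p "$(nproc)" > /backup/base.tar.gz

# Restore: rapidgzip decompresses the gzip stream in parallel
# (plain gunzip works too, single-threaded).
rapidgzip -d -c /backup/base.tar.gz | tar -xf - -C /var/lib/postgresql/data
```

The nice property is that gzip output stays compatible with every tool downstream; pigz and rapidgzip only parallelize the compression and decompression ends of the pipe.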

Backing up multi-terabyte production Postgres databases is not merely cosplaying, ha ha

I can beat you on the timing: I'd never used pgBackRest before, but started setting it up on a project about 2 hours ago; by the time I'd finished, the README had been updated.

Anyone put the standby on ZFS or other filesystems that can take snapshots for backup?

  • Not for PostgreSQL, but for MariaDB we run replicas in FreeBSD jails on a server with lots of ZFS space. The jailed Maria instances just stop every hour (so the DB flushes everything to disk), the host snapshots all of their data volumes, and then starts the jails back up. Within a minute or so they're fully caught up to the primaries again. Gives us months and months of recovery checkpoints.

    It's great because it's a completely clean save from a shutdown state, so when we need a scratch copy of a database it only takes as long as cloning whichever snapshot we want (depending on how far back we need to go), then starting a scratch jail that runs from those cloned filesystems. When finished, just shut down the scratch jail and delete the clones; it's like it never happened.

  • A previous company I was at did this on the primary. It always seemed to work, but no one was really comfortable with it, largely because there wasn't much ZFS experience on the team at the time, and also because the process did not quiesce the database before snapshotting. I think it's still a valid strategy, but not one I have had time to verify thoroughly.
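The stop → snapshot → start cycle described above maps onto only a handful of commands. A sketch with hypothetical jail and dataset names (this mirrors the MariaDB/FreeBSD jail setup, not Postgres):

```shell
# Hourly cycle: clean shutdown so everything is flushed to disk,
# snapshot the jail's dataset, start it again; the replica then
# catches back up from its primary within a minute or so.
jexec mariadb1 service mysql-server stop
zfs snapshot tank/jails/mariadb1/data@"$(date +%Y-%m-%d_%H)"
jexec mariadb1 service mysql-server start

# Scratch copy later: clone any snapshot (instant, copy-on-write)
# and run a throwaway jail on top of it.
zfs clone tank/jails/mariadb1/data@2024-06-01_12 tank/scratch/data
# ...use the scratch jail, then:
zfs destroy tank/scratch/data    # like it never happened
```

The clone is what makes the scratch-copy trick cheap: it shares blocks with the snapshot, so "cloning" a multi-TB database allocates almost nothing until you write to it.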

databasus does not do PITR.

  • Is that info up to date? Their README states:

      **Backup types**
      
      - **Logical** — Native dump of the database in its engine-specific binary format. Compressed and streamed directly to storage with no intermediate files
      - **Physical** — File-level copy of the entire database cluster. Faster backup and restore for large datasets compared to logical dumps
      - **Incremental** — Physical base backup combined with continuous WAL segment archiving. **Enables Point-in-time recovery (PITR)** — restore to any second between backups. Designed for disaster recovery and near-zero data loss requirements
    

    EDIT: It seems PITR was added this March (for PostgreSQL):

    https://github.com/databasus/databasus/issues/411
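For reference, the "Incremental" mode quoted from the README is the standard Postgres PITR recipe: a physical base backup plus continuous WAL archiving. The generic server-side mechanism (not necessarily how databasus drives it) boils down to a couple of settings; the archive destination here is a hypothetical path:

```
# postgresql.conf fragment
wal_level = replica
archive_mode = on
# Copy each finished WAL segment somewhere durable; fail (and retry)
# rather than overwrite if the file already exists.
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
```

Restore-to-any-second then works by restoring the base backup and replaying archived WAL up to a `recovery_target_time`.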