Comment by spapas82

1 month ago

> External volume for the database - it does not write to the local file system (we use DigitalOcean Block Storage)

Is this common? Why not use the local filesystem? I thought putting a database on anything other than the local filesystem was a no-no. Am I missing something?

On cloud providers, databases usually don't live on the instance's local file system, because instances are designed to be disposable and can fail at any time, taking their local disks with them.

Block storage is meant to be reliable, so databases go there. Yes, it's slower, but you don't lose data when an instance dies.

Generally, the only time you want a local database in the cloud is if it's being used for short-lived data meaningful only to that particular instance in time.

Or it can work if your database rarely changes and you make regular backups that are easy to revert to, like for a blog.
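For the blog case, here is a minimal sketch of "regular backups that are easy to revert to", assuming the blog uses SQLite on the instance's local disk (an assumption; the comment doesn't name a database). It relies on Python's standard `sqlite3.Connection.backup`, which copies a live database; the function names and the `blog-<timestamp>.sqlite3` file naming are hypothetical. `backup_dir` should be somewhere that survives the instance, such as a mounted volume or a directory you sync to object storage.

```python
import sqlite3
import time
from pathlib import Path


def backup_blog_db(live_db: str, backup_dir: str) -> Path:
    """Copy the live SQLite database to a timestamped file in backup_dir (hypothetical helper)."""
    dest = Path(backup_dir) / f"blog-{time.strftime('%Y%m%d-%H%M%S')}.sqlite3"
    dest.parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(dest)
    try:
        # Online backup: produces a consistent copy while the database is in use.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return dest


def restore_blog_db(backup_file: str, live_db: str) -> None:
    """Revert the live database by copying a backup over it (hypothetical helper)."""
    src = sqlite3.connect(backup_file)
    dst = sqlite3.connect(live_db)
    try:
        # backup() replaces the destination's contents with the source's.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Run `backup_blog_db` from cron or a timer; reverting after a bad change is a single `restore_blog_db` call with the chosen backup file.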

  • Databases already have tools for dealing with storage or servers that can fail: you would use replication between multiple database servers plus backups to some other storage.

    Databases with high availability and robust storage were possible before the cloud.

    • Sure, but replication and automatic failover are a huge pain to configure. They're a gigantic step up in architectural complexity, requiring multiple database servers.

      I'm not saying it can't be done. But block storage is built for reliability in a way that ephemeral instances are not. There's a good reason why every guide tells you to put your database on block storage rather than an instance's local disk: if your instance fails, you just spin up another one and reattach the same block storage volume.

      Pre-cloud, the equivalent would have been using redundant RAID storage to handle disk failures (easy), before upgrading to replication with an always-running replica (harder).

Yeah, I wouldn't even consider running an RDBMS on network storage, for fsync and mmap reasons alone.
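To check the fsync side of this yourself, here is a rough probe, a sketch rather than a proper benchmark tool such as fio. It times a `write` plus `os.fsync` of one 8 KiB block, roughly the pattern a database uses to make a commit durable. The function name, block size, and iteration count are arbitrary assumptions, and it says nothing about the mmap half of the argument.

```python
import os
import statistics
import tempfile
import time


def fsync_latency_ms(directory: str, iterations: int = 200, block_size: int = 8192) -> dict:
    """Time write+fsync of one block per iteration in `directory` (hypothetical helper)."""
    payload = os.urandom(block_size)
    samples = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the kernel has flushed the data to the device
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
    }
```

Call it once with a directory on the local disk and once with a directory on the network-attached volume (for example a hypothetical mount point like `/mnt/volume`) and compare the medians and tails.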

  • Isn't that how most managed Postgres services work? Or databases in Kubernetes, etc.?

    • No, it's important to use local disk. Network disk means I/O latency an order of magnitude higher, at best. Even Kubernetes has special machinery for managing databases.
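A back-of-the-envelope way to see why that latency matters: if every commit waits for its own flush to disk, a single client's commit rate is capped at roughly 1000 / (flush latency in ms). The sketch below uses made-up latencies chosen only to show a 10x gap; they are not measurements of any real disk.

```python
def max_serial_commits_per_sec(flush_latency_ms: float) -> float:
    """Upper bound on commits/s for one client that waits for each commit's flush."""
    if flush_latency_ms <= 0:
        raise ValueError("flush latency must be positive")
    return 1000.0 / flush_latency_ms


# Hypothetical latencies (not measurements), one order of magnitude apart.
for label, latency_ms in [("local disk", 0.2), ("network disk", 2.0)]:
    print(f"{label}: <= {max_serial_commits_per_sec(latency_ms):.0f} commits/s")
# local disk: <= 5000 commits/s
# network disk: <= 500 commits/s
```

Databases soften this with group commit (flushing several transactions' commits together), so the figure bounds one sequential client rather than total throughput.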