Comment by Ozzie_osman

10 hours ago

  We sharded over 20 TB that we know about.

This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.

If you think 20TB "isn't that big" I want to know what size of DBs you're working with 0_0

If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.

You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.

For a vast majority of use cases 20TB is positively enormous.

  • Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.

  • This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.

  • Sure, but 20TB in “the only database you need” is mere hours or minutes worth of data for many workflows.