Comment by Ozzie_osman

10 hours ago

  We sharded over 20 TB that we know about.

This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.

9 comments

dujuku 5 hours ago

If you think 20TB "isn't that big" I want to know what size of DBs you're working with 0_0

singron 8 hours ago

If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.

rbranson 10 hours ago

You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.

GiorgioG 10 hours ago

For a vast majority of use cases 20TB is positively enormous.

mplanchard 9 hours ago

RDS caps out at 64 TB unless you use Aurora, so 20 TB is totally manageable without sharding.

jeltz 9 hours ago

Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.

returningfory2 10 hours ago

This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.

happyopossum 10 hours ago

Sure, but 20TB in “the only database you need” is mere hours or minutes worth of data for many workflows.

tingletech 10 hours ago

That article seems to suggest 20TB total over the dozen deployments in prod.