← Back to context

Comment by kedean

3 years ago

Many databases can be distributed horizontally if you put in the extra work, would that not solve the problems you're describing? MariaDB supports at least two forms of replication (one master/replica and one multi-master), for example, and if you're willing to shell out for a MaxScale license it's a breeze to load balance it and have automatic failover.

I worked at a mobile game company for years and years, and our #1 biggest scaling concern was DB write throughput. We used Percona's MySQL fork/patch/whatever, we tuned as best we could, but when it comes down to it, gaming is a write-heavy application rather than the read-heavy applications I'm used to from ecommerce etc.

Sharding things out and replicating worked for us, but only because we were microservices-y and we were able to split our schemas up between different services. Still, there was one service that required the most disk space, the most write throughput, the most everything.

(IIRC it was the 'property' service, which recorded everything anyone owned in our games and was updated every time someone gained, lost, or used any item, building, ally, etc).

We did have two read replicas and the service didn't do reads from the primary so that it could focus on writes, but it was still a heavy load that was only solved by adding hardware, improving disks, adding RAM, and so on.

Not without big compromises and a lot of extra work. If you want a truly horizontally scaling database, and not just multi-master for the purpose of availability, a good example solution is Spanner. You have to lay your data out differently, you're very restricted in what kinds of queries you can make, etc.

  • Clarification, you can make unoptimized queries on Spanner with a great degree of freedom when you're doing offline analysis, but even then it's easy to hit something that's too slow to work at all, whereas in Postgres I know it'd not be a problem.

For what it's worth, I think distributing horizontally is also much easier if you're already limited your database to specific concerns by splitting it up in different ways. Sharding a very large database with lots of data deeply linked sounds like much more of a pain than something with a limited scope that isn't too deeply linked with data because it's already in other databases.

To some degree, sharding brings in a lot of the same complexities as different microservices with their own data store, in that you sometimes have to query across multiple sources and combine in the client.

We've done that (MSSQL Always on) and that's what keeps the lights on today. It's not, however, something that'll remain sustainable.