
Comment by sbstp

15 hours ago

This article has very little useful information...

There's nothing novel about optimizing queries, sharding and using read replicas.

It has one piece of useful info: their main data store even for 800M users is a single instance of postgres (for writes) without sharding.

  • The post tells you there is a single point of failure: if you wanted to DDoS OpenAI, you'd target write-heavy operations.

    For that reason, I find it actually bold that they disclosed it, and I appreciate it.

    The article reminded me of a similar post about MySQL use at Facebook from the Meta team, which had the same message: big database servers are powerful workhorses that scale and are very cost-effective (and simpler to manage than distributed setups where writes need to be carefully orchestrated - a very hard task).

    The two core messages of both articles combined could be read as: 1. big DB servers are your friend and 2. keep it simple, unless you can't avoid the extra complexity any more.

    • What Facebook post are you referring to? Generally speaking, Facebook's MySQL infra has been heavily sharded for a very long time, and doesn't rely on abnormally-beefy servers. It's basically the complete opposite approach of what OpenAI is describing here.

  • When I joined Twitter in 2011 there was a single MySQL master for the user database (not tweets) and a few dozen read replicas. It was handling about 7,000 updates per second, and during bursts the write rate would exceed what MySQL's single-threaded replication (at the time) could keep up with, causing replication lag and all kinds of annoying things in the app. You just have to pick the right time to make the switch, before it's an emergency.
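
    The burst-driven lag mechanism described here can be sketched numerically. This is a toy simulation, not Twitter's actual setup: the replica apply rate and workload numbers are illustrative assumptions.

    ```python
    # Toy model of a single-threaded replica falling behind during write bursts.
    # The master absorbs bursts with many concurrent writers, but the replica
    # applies the replication stream serially, capped at a fixed rate.

    REPLICA_APPLY_RATE = 8000  # updates/sec the replica can apply (assumed)

    def replication_backlog(master_rates):
        """Return the backlog (pending updates) after each 1-second interval."""
        backlog, history = 0, []
        for rate in master_rates:
            backlog += rate                               # writes arriving from master
            backlog -= min(backlog, REPLICA_APPLY_RATE)   # serial apply, capped
            history.append(backlog)
        return history

    # A steady 7000/s fits under the cap, but a 3-second burst to 12000/s
    # builds a backlog that takes several seconds to drain afterwards.
    workload = [7000] * 3 + [12000] * 3 + [7000] * 6
    print(replication_backlog(workload))
    # → [0, 0, 0, 4000, 8000, 12000, 11000, 10000, 9000, 8000, 7000, 6000]
    ```

    The point of the toy model: lag is invisible while average load is below the replica's serial apply rate, then grows linearly during a burst and drains only slowly afterwards, which is why the switch has to happen before bursts regularly exceed the cap.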

    • Postgres setups are typically based on physical replication, which is not an option on MySQL. My testing shows the limit to be about 177k tps with each transaction consisting of 3 updates and 1 insert.

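
As a back-of-the-envelope check on the benchmark figure above (numbers taken from the comment; the arithmetic is the only thing added here):

```python
# The comment cites ~177k transactions/sec, each transaction doing
# 3 UPDATEs and 1 INSERT, i.e. 4 row writes per transaction.
TPS = 177_000
WRITES_PER_TX = 3 + 1

writes_per_sec = TPS * WRITES_PER_TX
print(writes_per_sec)           # 708000 row writes/sec

# Compare with the ~7000 updates/sec Twitter anecdote above:
print(writes_per_sec // 7000)   # roughly 101x more write throughput
```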