Stonebraker on CAP theorem and Databases (2010)

6 hours ago (perspectives.mvdirona.com)

Normally, I'm not a fan of putting the date on a post. However, in this case, the fact that Stonebraker's article was published in 2010 makes it more impressive given the developments over the last 15 years - in which we've relearned the value of consistency (and the fact that it can scale more than people were imagining).

In short: eventual consistency is insufficient in many real-world error scenarios that fall outside the CAP theorem. Go for full consistency where possible, which is practical in more cases than normally assumed.

  • But full consistency isn't web scale! There are a lot of times, though, where full consistency with some kind of cache in front of it has the same client quirks as eventual consistency.

    As always, the answer is "it depends".
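To make the cache point concrete, here is a minimal sketch of a read-through cache with a time-to-live sitting in front of a store that is itself fully consistent. `Store`, `TTLCache`, and the fake clock are hypothetical names made up for illustration, not any real database or cache API. The store never returns a stale value, yet a client reading through the cache still sees one until the entry expires.

```python
class Store:
    """Stands in for a strongly consistent database (hypothetical)."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class TTLCache:
    """Read-through cache: serves a cached value until it is ttl_s old."""

    def __init__(self, store, ttl_s, clock):
        self._store = store
        self._ttl_s = ttl_s
        self._clock = clock      # injected so the example is deterministic
        self._entries = {}       # key -> (value, time it was fetched)

    def read(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0]      # possibly stale
        value = self._store.read(key)
        self._entries[key] = (value, now)
        return value


# Fake clock: a one-element list we advance by hand.
now = [0.0]
store = Store()
cache = TTLCache(store, ttl_s=5.0, clock=lambda: now[0])

store.write("price", 100)
first = cache.read("price")    # fetched from the store and cached
store.write("price", 120)      # the store is updated immediately
stale = cache.read("price")    # the cache still answers with the old value
now[0] = 6.0                   # let the entry expire
fresh = cache.read("price")    # the client converges on the new value
```

From the client's side this is the same quirk an eventually consistent store would show: a read after a write can return the old value for a bounded window.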

I think we try too hard to solve problems that we do not even have yet. It is much better to build a simple system that is correct than a messy one that never stops. I see people writing bad code because they are afraid of the network breaking. We should just let the database do its job.

A lot of these kinds of discussions tend to wipe away all the nuance around why you would or wouldn't care about consistency. Most of the answer has to do with software architecture and some of it has to do with use cases.

FYI. This was written in 2010 although it feels relevant even now. Didn't catch it until the mention of Amazon SimpleDB.

The 2010 is really important here. Stonebraker is thinking about local database systems and was a bit upset by the NoSQL movement's push at the time.

And he is making a mistake in claiming the partitions are "exceedingly rare". Again he is not thinking about a global distributed cloud across continents.

The real world works with Eventual Consistency. Embrace it; for about 90% of business scenarios it's the best option: https://i.ibb.co/DtxrRH3/eventual-consistency.png

  • > And he is making a mistake in claiming the partitions are "exceedingly rare". Again he is not thinking about a global distributed cloud across continents.

    Any time an AWS region or AZ goes down we see a lot of popular services go nearly-completely-down. And it's generally fine.

    One thing I appreciate about AWS is that (operating "live" in just a single AZ or even single region) I've seen far fewer partition-causing networking hiccups than when my coworkers and I were responsible for wiring and tuning our own networks for our own hardware in datacenters.

  • Remember also that "partition" is not "yes or no" but rather a latency threshold. If the network is connected but a call now takes 30 seconds instead of milliseconds, that is probably a partition.
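The latency-threshold view of a partition can be sketched roughly like this. `call_with_deadline`, `fast_replica`, and `slow_replica` are hypothetical helpers, and the deadline values are arbitrary assumptions; the point is only that the caller cannot tell a very slow peer from an unreachable one, so it picks a deadline and treats everything past it as partitioned.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, deadline_s):
    """Run fn in a worker thread; no answer within deadline_s counts as a partition."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=deadline_s))
    except FutureTimeout:
        # The peer may still answer later, but the caller has already moved on.
        return ("partitioned", None)
    finally:
        pool.shutdown(wait=False)  # do not block on the slow call


def fast_replica():
    return 42


def slow_replica():
    time.sleep(0.3)  # connected, but far slower than the caller tolerates
    return 42


fast_result = call_with_deadline(fast_replica, deadline_s=1.0)
slow_result = call_with_deadline(slow_replica, deadline_s=0.05)
```

Both replicas are "connected" here; only the chosen deadline decides which one the caller treats as being on the other side of a partition.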

  • I would say quite the opposite: most businesses have little need for eventual consistency, and at a small scale it's not even a requirement for any database you would reasonably use. Way more than 90% of companies don't need eventual consistency.

    • No. The real world is full of eventual consistency, and we simply operate around it. :-)

      Think about a supermarket: If the store is open 24/7, prices change constantly, and some items still have the old price tag until shelves get refreshed. The system converges over time.

      Or airlines: They must overbook, because if they wait for perfect certainty, planes fly half empty. They accept inconsistency and correct later with compensation.

      Even banking works this way. All database books have the usual “you can’t debit twice, so you need transactions”…bullshit. But think of a money transfer across banks and possibly across countries? Not globally atomic...

      What if you transfer money to an account that was closed an hour ago in another system? The transfer doesn’t instantly fail everywhere. It’s posted as credit/debit, then reconciliation runs later, and you eventually get a reversal.

      Same with stock markets: Trades happen continuously, but final clearing and settlement occur after the fact.

      And technically DNS is eventual consistency by design. You update a record, but the world sees it gradually as caches expire. Yet the internet works.

      Distributed systems aren’t broken when they’re eventually consistent. They’re mirroring how real systems work: commit locally, reconcile globally, compensate when needed.
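The "commit locally, reconcile globally, compensate when needed" pattern from the bank transfer example above could look something like the following toy sketch. `Bank`, `Transfer`, and `reconcile` are made-up names, amounts are integer cents, and a closed account is modeled simply as one that is absent from the receiving bank. Real interbank settlement is far more involved; this only shows the shape of the idea.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    src: str
    dst: str
    amount: int  # cents


class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)  # open accounts only
        self.outbox = []                # transfers not yet reconciled

    def send(self, src, dst, amount):
        """Commit locally: debit now, without asking the other bank."""
        if self.balances[src] < amount:
            raise ValueError("insufficient funds")
        self.balances[src] -= amount
        self.outbox.append(Transfer(src, dst, amount))


def reconcile(sender, receiver):
    """Later batch run: apply credits, reverse what cannot be delivered."""
    reversals = []
    for t in sender.outbox:
        if t.dst in receiver.balances:
            receiver.balances[t.dst] += t.amount
        else:
            # Compensate: the destination is closed, so refund the sender.
            sender.balances[t.src] += t.amount
            reversals.append(t)
    sender.outbox.clear()
    return reversals


bank_a = Bank({"alice": 100})
bank_b = Bank({"bob": 0})            # "carol" closed her account earlier

bank_a.send("alice", "bob", 30)
bank_a.send("alice", "carol", 50)    # succeeds locally anyway
before = bank_a.balances["alice"]    # 20: both debits are already visible

reversed_transfers = reconcile(bank_a, bank_b)
after = bank_a.balances["alice"]     # 70: the transfer to carol was reversed
```

Between `send` and `reconcile` the two banks disagree about the state of the world, and nothing is globally atomic, yet the books balance once reconciliation has run.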


This is why the winning distributed systems optimize for CP. It's worth preserving consistency at the expense of rare availability losses, particularly on cloud infrastructure.