Comment by upghost

6 days ago

Whoa, a leaderless consensus protocol sounds pretty revolutionary!! So many questions -- do you have any resources on this you could share?

Revolutionary may be an overstatement; it just affords different system characteristics. There's plenty of literature on the topic though, generally starting with EPaxos[1]. The protocol we are developing for Apache Cassandra is called Accord[2], and it forms the basis of our new distributed transaction feature[3]. I will note that the whitepaper linked in [3] is a bit out of date, and there was a bug in the protocol specification at that time. We hope to publish an updated paper in a proper venue in the near future.

[1] https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
[2] https://github.com/apache/cassandra-accord
[3] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15...

  • https://www.vldb.org/pvldb/vol15/p1337-lee.pdf

    Is this you also or total coincidence?

    • Not even a coincidence really, it's a very different kind of system. It's an implementation of Hermes with network layer integration. Hermes is designed with very different goals in mind, specifically within-DC consensus with minimal failures (with the caveat I am not intimately familiar):

      - Every replica must acknowledge a write, which is undesirable in a WAN setting, due to having to wait for replies from the furthest region

      - At most one concurrent "read-modify-write" operation may succeed, so peak throughput is limited by request latency

      - Failure of any replica requires reconfiguration (equivalent to leader election) before any request can succeed, so the leaderless property here does not improve tail latencies; if anything, it likely harms them by exposing your workload to more required reconfigurations

      Cassandra is designed for multiple (usually quite far apart) DC deployments that want to maximise availability and minimise latency, and where failure is expected. Here a quorum system is typically preferable for request latency.
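To make the latency argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Hermes, Accord, or Cassandra: the function names and the round-trip times are made up for illustration, the model ignores processing time and retries, and it assumes a simple-majority quorum (real protocols may need larger quorums on some paths).

```python
def all_ack_latency(rtts_ms):
    """Write completes only once every replica has acknowledged."""
    return max(rtts_ms)


def quorum_latency(rtts_ms):
    """Write completes once a simple majority has acknowledged (assumed quorum size)."""
    majority = len(rtts_ms) // 2 + 1
    # The majority-th fastest reply determines when the quorum is reached.
    return sorted(rtts_ms)[majority - 1]


def serial_rmw_throughput(latency_ms):
    """Upper bound in ops/sec when only one read-modify-write can be in flight at a time."""
    return 1000.0 / latency_ms


# Hypothetical coordinator-to-replica round-trip times in milliseconds:
# two replicas nearby, three in progressively more distant regions.
rtts_ms = [1, 2, 40, 80, 150]

print("all-ack write latency (ms):", all_ack_latency(rtts_ms))
print("majority quorum latency (ms):", quorum_latency(rtts_ms))
print("serial RMW bound at all-ack latency (ops/s):",
      round(serial_rmw_throughput(all_ack_latency(rtts_ms)), 2))
```

With these invented numbers the all-ack write waits on the furthest replica, while the majority quorum only waits on the third-fastest reply, and the one-at-a-time read-modify-write bound falls straight out of the per-request latency.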