Comment by eatonphil
18 hours ago
Great educational project! I'm curious why you are using Raft and also 2PC unless you're sharding data and doing cross-shard transactions? Or is Raft only for cluster membership but 2PC is for replicating data? If that's the case it kind of seems like overkill but I'm not sure.
Few distributed filesystems/object stores seem to use Raft (or consensus at all) for replicating data because it's unnecessary overhead. Chain replication is one popular way for replicating data (which uses consensus to manage membership but the data path is outside of consensus).
Thank you for this sharp and detailed question! In minikv, both Raft and 2PC are implemented on purpose. That may look like overkill in some contexts, but each has a distinct role, serving both educational goals and production-grade guarantees:
- Raft provides strong consistency within a shard: each of the 256 "virtual shards" replicates its data and metadata via Raft (leader election and log replication), so Raft is not used only for cluster membership;
- 2PC (Two-Phase Commit) is only used when a transaction spans multiple shards, which allows atomic, distributed writes across partitions. Each Raft group only orders writes within its own shard, so Raft alone cannot make a write atomic across shards, hence the 2PC overlay;
- The design aims to illustrate real-world distributed transaction tradeoffs, not just basic data replication. It shows what you gain and lose with a layered model versus simpler replication such as chain replication (which, as you noted, is more common for the data path in some object stores).
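To make the layering concrete, here is a minimal Python sketch of the routing decision described above. It is not minikv's actual code: `shard_for`, `ShardGroup`, `write`, and the CRC32-based key routing are all hypothetical names and choices made for illustration, and the Raft group is reduced to a single in-memory log with no real replication or failure handling.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 256  # the 256 virtual shards mentioned above


def shard_for(key: str) -> int:
    # Assumed routing rule: stable hash of the key modulo the shard count.
    return zlib.crc32(key.encode()) % NUM_SHARDS


class ShardGroup:
    """Stand-in for one Raft group. A real group would replicate every log
    entry to a quorum before acknowledging it; here the log is a plain list."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.fail_prepare = False  # hook to simulate a "no" vote in phase 1
        self.log = []              # would be the replicated Raft log
        self.data = {}             # committed key/value state
        self.staged = {}           # txn_id -> writes prepared, not yet committed

    def apply(self, writes: dict) -> None:
        # Single-shard write: one log entry, no 2PC involved.
        self.log.append(("put", dict(writes)))
        self.data.update(writes)

    def prepare(self, txn_id: str, writes: dict) -> bool:
        if self.fail_prepare:
            return False
        self.log.append(("prepare", txn_id))
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id: str) -> None:
        self.log.append(("commit", txn_id))
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id: str) -> None:
        self.log.append(("abort", txn_id))
        self.staged.pop(txn_id, None)


def write(shards: dict, txn_id: str, writes: dict) -> bool:
    """Coordinator: group writes by shard; run 2PC only across several shards."""
    by_shard = defaultdict(dict)
    for key, value in writes.items():
        by_shard[shard_for(key)][key] = value

    if len(by_shard) == 1:
        sid, kv = next(iter(by_shard.items()))
        shards[sid].apply(kv)
        return True

    # Phase 1: every participating shard must vote yes.
    prepared = []
    for sid, kv in by_shard.items():
        if shards[sid].prepare(txn_id, kv):
            prepared.append(sid)
        else:
            for p in prepared:
                shards[p].abort(txn_id)
            return False

    # Phase 2: all voted yes, so commit everywhere.
    for sid in prepared:
        shards[sid].commit(txn_id)
    return True
```

With `shards = {i: ShardGroup(i) for i in range(NUM_SHARDS)}`, a call like `write(shards, "t1", {"a": 1})` touches one shard and skips 2PC entirely, while a write whose keys hash to different shards goes through prepare and commit on each of them. A real coordinator would also need to persist its own decision and handle timeouts, which this sketch leaves out.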
So yes, in a pure object store, consensus for data replication is often skipped in favor of lighter-weight methods. Here, the explicit Raft+2PC combo is an architectural choice for anyone learning, experimenting, or wanting strong, multi-shard atomicity. In a production system focused only on throughput or simple durability, some of this could absolutely be streamlined.