Comment by jeffbee

2 days ago

The argument seems to rely on the point that the replicas are only valuable if you can send reads to them, which I don't think is true. Eventually-consistent replicated databases are valuable on their own terms even if you can only send traffic to the leader.

Can you spell this out for a newb like myself? It looks like you’re saying that a read replica that can’t be read from is still useful, but that doesn’t sound right. What is its usefulness if I can’t read from it?

  • I assume he is referring to the master holding all the primary data, while you treat replicas as backups. So if the master goes down, a replica becomes the master that handles all the data. You never get inconsistencies between written and read data, as it's all on the same node.

    The issue is that you give up 1x, 2x ... of extra read capacity, depending on your replication factor (RF=x). But you gain strong consistency without the overhead of dealing with eventual consistency, or full consistency across multiple nodes (with protocols like Raft).

    1.

    If you use something like Raft to ensure that you get full consistency, your writes slow down, as it takes multiple rounds of network communication per write (insert/update/delete): send data, receive acknowledgement; send confirmation to commit, receive confirmation it committed. That's two network round trips per write.

    2.

    If you use eventual consistency you have fast writes, and only one hop (send data, receive acknowledgement). But like the article talked about, if you write something and then also try to read that data, it can happen that the server doing the writing is slower than the next part of your code that is reading. So you write X = X + 1, it goes to server A, but your connection now reads from server B and still sees the old X, because the change has not yet been replicated to server B.

    3.

    The above-mentioned comment simply assumes there is one master over your data, say server A. You write AND read from server A all the time. Any changes to the data are sent to server B, but only as a hot standby. The advantage is that writes are fast and strongly consistent. The disadvantage is that you cannot scale your reads beyond server A.
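The read-after-write anomaly from point 2 can be sketched with a toy model of asynchronous replication. This is a minimal illustration, not any real database's API: the `Primary`/`Replica` classes and the explicit `catch_up()` step stand in for the replication lag that normally happens in the background.

```python
class Primary:
    """Leader node (server A): writes are acknowledged immediately (one hop)."""
    def __init__(self):
        self.data = {}
        self.pending = []  # writes acknowledged but not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # replicated asynchronously, later

class Replica:
    """Follower node (server B): only sees writes once replication catches up."""
    def __init__(self, primary):
        self.primary = primary
        self.data = {}

    def catch_up(self):
        # In a real system this runs continuously in the background;
        # here it is explicit so the stale-read window is visible.
        for key, value in self.primary.pending:
            self.data[key] = value
        self.primary.pending.clear()

primary = Primary()
replica = Replica(primary)
primary.data["x"] = 0
replica.data["x"] = 0

primary.write("x", primary.data["x"] + 1)  # X = X + 1 goes to server A
stale = replica.data["x"]                  # read routed to server B: still 0
replica.catch_up()
fresh = replica.data["x"]                  # after replication: 1
```

Routing all reads and writes to `primary` (point 3's single-master setup) makes the stale read impossible, at the cost of server B contributing no read capacity.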

    ----

    Now imagine that your data is cut into smaller pieces, or you do not write sequentially like 1, 2, 3 ... but write with hash distribution, where a goes to server 2, b goes to server 3, f goes to server 2 ...
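The hash-distribution idea can be sketched like this. The shard count and the modulo-over-hash scheme are illustrative assumptions; real systems often use consistent hashing instead, so that adding a node doesn't remap nearly every key.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard by hashing it, so load spreads across
    servers instead of hot-spotting the way sequential keys (1, 2, 3 ...)
    would on a range-partitioned setup."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always routes to the same shard, but different keys
# scatter across the cluster rather than all landing on one server:
shards = {k: shard_for(k, 3) for k in ["a", "b", "f"]}
```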

    What works better really depends on your workload in combination with the server setup you're running: single node, sharded, replicated sharded, distributed seq, ...

    The DB field is complex ...