
Comment by zinodaur

2 years ago

If there's just one Sequencer, and every ReadVersion request to the Proxy eventually hits the Sequencer 1:1, how does the Sequencer not get crushed? Or is the scaling limit just "the number of ReadVersion requests a Sequencer machine can handle per second", which admittedly is a cheap request to respond to?

Requests to the sequencer are batched heavily. If the sequencer fails, the cluster goes through a recovery: it is unavailable for 2-3 seconds and then resumes.
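
Roughly, the batching idea looks like this (a toy sketch in Python, not FDB's actual code; the Sequencer/Proxy names and the 2 ms window are illustrative assumptions): each proxy parks incoming GetReadVersion requests and answers a whole batch of them with a single round trip to the sequencer.

    import asyncio

    # Toy model of read-version batching at a proxy. Not FDB's real API:
    # the class names and the 2 ms batching window are made up.

    class Sequencer:
        def __init__(self):
            self._version = 0

        async def latest_version(self):
            # In the real system this is one network round trip per *batch*.
            self._version += 1
            return self._version

    class Proxy:
        def __init__(self, sequencer, batch_interval=0.002):
            self.sequencer = sequencer
            self.batch_interval = batch_interval
            self._waiters = []

        async def get_read_version(self):
            # Each client request parks on a future instead of calling the
            # sequencer directly.
            fut = asyncio.get_running_loop().create_future()
            self._waiters.append(fut)
            return await fut

        async def run(self):
            while True:
                await asyncio.sleep(self.batch_interval)
                if not self._waiters:
                    continue
                waiters, self._waiters = self._waiters, []
                version = await self.sequencer.latest_version()
                for fut in waiters:   # one sequencer call answers them all
                    fut.set_result(version)

    async def main():
        proxy = Proxy(Sequencer())
        batcher = asyncio.create_task(proxy.run())
        versions = await asyncio.gather(*(proxy.get_read_version() for _ in range(1000)))
        print("sequencer round trips for 1000 requests:", len(set(versions)))
        batcher.cancel()

    asyncio.run(main())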

  • Good point about the batching! Any idea what kind of ReadVersion throughput (QPS) you can get this way? And yeah, 2-3s of unavailability seems fine.

I haven't used FDB, but reading the article and considering the semantics, it shouldn't matter too much as long as the Sequencer "just" distributes recent read versions to the proxies frequently enough (or a proxy simply reuses a read version it received recently).

Worst case, if there is heavy contention on the same keys, the resolvers will eventually fail more transaction writes, but for read-only transactions most applications should be fine with a slightly "old" version.

(Yes, all of this will start to break down if there is high key contention and many conflicts.)
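
For concreteness, the idea above would look something like this (a hypothetical sketch, not what FDB actually does; the 5 ms staleness bound is made up): the proxy keeps handing out its cached read version until it is older than some bound, and only then asks the sequencer again.

    import time

    # Hypothetical sketch of the idea in this comment (not FDB's actual
    # behavior): a proxy reuses a recently fetched read version instead of
    # asking the sequencer on every request, trading bounded staleness for load.

    class Sequencer:
        def __init__(self):
            self.version = 0

        def latest_version(self):
            self.version += 1
            return self.version

    class CachingProxy:
        def __init__(self, sequencer, max_staleness=0.005):  # 5 ms bound, made up
            self.sequencer = sequencer
            self.max_staleness = max_staleness
            self._cached_version = None
            self._cached_at = float('-inf')

        def get_read_version(self):
            now = time.monotonic()
            if now - self._cached_at > self.max_staleness:
                self._cached_version = self.sequencer.latest_version()
                self._cached_at = now
            return self._cached_version

    proxy = CachingProxy(Sequencer())
    # Many requests in a tight loop hit the sequencer only once.
    print({proxy.get_read_version() for _ in range(10_000)})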

  • My understanding of ReadVersion is that the only point of calling it is to be able to read your own writes - so staleness wouldn't be good. There was another sibling comment saying the ReadVersion requests are batched up before hitting the sequencer; I could definitely believe that would work.

    • Yeah, read versions are just there to make what's known as "external consistency" work. That is, if transaction B starts after transaction A commits, then B will see the effects of A.

      The reason external consistency is nice is that if you change the database, as soon as you get a commit signal, you can tell any other client "hey, go check the database and see what I did" and they will see the changes. No worries about whether the changes have sync'ed yet or anything like that.
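
      In FDB terms, that guarantee looks like this (a minimal sketch using the Python bindings, assuming a reachable cluster via the default cluster file): once the write below returns, any transaction started afterwards gets a read version at least as large as that commit's version, so it sees the write.

          import fdb

          fdb.api_version(630)
          db = fdb.open()

          # Writer: this returns only after the transaction has committed.
          db[b'greeting'] = b'hello'

          # Reader: any transaction started after the line above will see b'hello'.
          # Getting its read version from a proxy (which batches requests to the
          # sequencer) is exactly the step discussed in this thread.
          print(db[b'greeting'])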


Yeah, that seems like an untenable design choice. Was quite interested until I read that. Max TPS? And MTTR when the sequencer inevitably shits itself?

  • You can trivially scale fdb to tens of millions of tx/sec for write-heavy workloads of reasonable transaction complexity, without a hardcore cluster (though with careful design, on my part and others', to make collisions unlikely; see the key-design sketch at the end of this comment).

    MTTR on failure is seconds. Really, there's no system I've used that is as robust and performant as fdb, and I include s3 in that list - s3, for example, _routinely_ has operations with orders-of-magnitude latency variance and huge, correlated spikes.
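
    As one example of that kind of key design (a hypothetical schema, assuming the FDB Python bindings and a reachable cluster; the shard count and key names are made up): split a hot counter across N keys so concurrent writers rarely touch the same key and therefore rarely conflict at the resolvers.

        import random

        import fdb

        fdb.api_version(630)
        db = fdb.open()
        N_SHARDS = 16  # made-up shard count

        @fdb.transactional
        def increment(tr, amount=1):
            # Writers pick a random shard, so two concurrent increments
            # conflict only if they happen to choose the same key.
            key = fdb.tuple.pack(('counter', random.randrange(N_SHARDS)))
            existing = tr[key]
            current = fdb.tuple.unpack(existing)[0] if existing.present() else 0
            tr[key] = fdb.tuple.pack((current + amount,))

        @fdb.transactional
        def read_total(tr):
            # One range read sums every shard; read-only transactions don't
            # generate write conflicts.
            return sum(fdb.tuple.unpack(v)[0]
                       for _, v in tr[fdb.tuple.range(('counter',))])

        increment(db)
        print(read_total(db))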