
Comment by tptacek

3 years ago

In fairness, I don't know if we kept the default. I'm responding to two independent things at this point: first, there are definitely systems where 200ms delays have ripple effects, and second, leader elections aren't always benign.

(Consul would, I'm sure, converge eventually regardless of the election frequency, but that doesn't mean everything that relies on Consul will tolerate those delays).

I don't have much of a take here, beyond that I don't think you can extrapolate as much from what's on the 6.824 pages as you might have done here. Certainly, in a system where 200ms is the difference between "healthy" and "not healthy" status on a peer relationship, I'd think you'd want Nagle disabled. But I haven't thought carefully about this, or looked that closely at the typical packet flow between Consul nodes. I could be wrong about all of this; more reason not to give me any money.
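For reference, "disabling Nagle" on a TCP connection comes down to setting the `TCP_NODELAY` socket option. Below is a minimal Python sketch of that option on a plain socket. It is illustrative only and says nothing about how Consul actually configures its own connections. The helper names `make_low_latency_socket` and `nagle_disabled` are hypothetical.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the kernel to send small writes immediately instead of
    # holding them back while an earlier segment is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is set on the socket (hypothetical helper)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_low_latency_socket()
    print(nagle_disabled(s))
    s.close()
```

The trade-off is more small packets on the wire in exchange for not waiting on coalescing. That is why the option tends to matter for chatty request/response traffic like heartbeats.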

Later

Per the comment upthread, I haven't even bothered to check which parts of this packet flow are TCP to begin with.

I've never directly used Consul's internals, but I'm guessing it uses Stubby, which is built on top of TCP.

  • It does Serf over UDP, but I get fuzzy on the integration of Serf and Consul.

    • Raft and the Consul RPC API use TCP, Serf uses both TCP and UDP.

      While the Consul RPC API may have grown options to use gRPC (I forget now), Raft uses length-prefixed msgpack PDUs.

    • Whoops, I thought this was a Google product, given the discussion. Stubby is basically Google's internal gRPC.
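Length-prefixed framing is what lets a receiver split a TCP byte stream back into discrete messages. Here is a minimal Python sketch of the general technique. It assumes a 4-byte big-endian length header, and that header width and byte order are illustrative assumptions rather than Raft's or Consul's actual wire format. The payload is treated as opaque bytes, where a real implementation would carry the msgpack-encoded PDU. The function names are hypothetical.

```python
import io
import struct

# Hypothetical header: unsigned 32-bit big-endian payload length.
HEADER = struct.Struct(">I")


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: a length header followed by the payload bytes."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because reads may return short."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-frame")
        buf += chunk
    return buf


def read_frame(stream) -> bytes:
    """Read one frame: decode the length header, then read that many bytes."""
    (length,) = HEADER.unpack(read_exact(stream, HEADER.size))
    return read_exact(stream, length)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frame(buf, b"hello")
    write_frame(buf, b"")
    buf.seek(0)
    print(read_frame(buf), read_frame(buf))
```

Note that each frame here goes out as two small writes. On a socket with Nagle enabled, that write pattern is the kind that can end up waiting on coalescing.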