
Comment by ekr____

9 hours ago

Can you elaborate a bit more on what you think the unnecessary complexity is here?

A basic source of concern here is whether it's safe for the server to use an initial congestion window large enough to handle the entire PQ certificate chain without an unacceptable risk of congestion collapse or other negative consequences. This is a fairly complicated question of network dynamics and the interaction of a bunch of different machines potentially sharing the same network resources, and it is largely independent of the network protocol in use (QUIC versus TCP). It's possible that IW20 (or whatever) is fine, but it may well not be.

There are two secondary issues:

1. Whether the certificate chain is consuming an unacceptable fraction of total bandwidth. I agree that this is less likely for many network flows, but as noted above, there are some flows where it is a large fraction of the total.

2. Potential additional latency introduced by packet loss and the extra round trip it necessitates. Every additional packet increases the chance that one of them is lost, and you need the entire certificate chain.

It seems you disagree about the importance of these issues, which is an understandable position, but where you're losing me is that you seem to be attributing this to the design of the protocols we're using. Can you explain further how you think (for instance) QUIC could be designed differently in a way that would ameliorate these issues?

For point 1, as I noted here [1], total bandwidth and resources are dominated by large flows. Endpoints are powerful enough to handle these large flows. The primary problems would lie with poor intervening networks and setup overhead.

For point 2, that is a valid concern in any case where you simply have more data. This dovetails into my actual point.

The problem of going from a 4 KB certificate chain to a 16 KB certificate chain, a 160 KB certificate chain, or an arbitrarily sized certificate chain should be equivalent to the problem of "server sends N byte response like normal". To simplify the problem a little, it is just: the client sends an R-byte request message, the server responds with a Q-byte response message (which happens to be a certificate chain), the client sends the P-byte actual request, and the server responds with a K-byte response message. So, at the risk of over-simplification, the problem should only be marginally harder than any generic "time to Q + K bytes".

Of course, if you previously had a 4 KB actual response and a 4 KB certificate chain, and now it is a 160 KB certificate chain, you are going from "time to 8 KB" to "time to 164 KB". That is the essential complexity of the problem. But as I noted in my response to your point 1, the amount of server and client resources actually expended on "small" requests is small; the real problem is poor networks, where you are now consuming significantly more bandwidth.
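To make the "time to N bytes" comparison concrete, here is a rough sketch under assumptions not stated in the thread (classic slow start, a 10-segment initial window, 1460-byte segments, no loss):

```python
# Rough estimate of "time to N bytes" under slow start, in round trips.
# Assumptions (mine, not the thread's): IW10, 1460-byte segments, no loss,
# receive window never the bottleneck, window doubles once per RTT.

def rtts_to_deliver(total_bytes, iw_segments=10, mss=1460):
    """Count round trips until total_bytes have been sent, with the
    congestion window doubling each RTT (slow start, no loss)."""
    cwnd = iw_segments * mss
    sent, rtts = 0, 0
    while sent < total_bytes:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

small = rtts_to_deliver(8 * 1024)    # 4 KB chain + 4 KB response
large = rtts_to_deliver(164 * 1024)  # 160 KB chain + 4 KB response
print(small, large)  # 1 4
```

Under these assumptions the 8 KB case fits in the first flight while the 164 KB case needs four round trips; that multiplier, not the raw byte count, is the "dramatic difference" being discussed.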

This then leads into the question of why "time to 8 KB" versus "time to 164 KB" is viewed as such a dramatic difference. This is an artifact of poor protocol design.

From a network perspective, the things that mostly matter are end-to-end bandwidth, end-to-end latency, endpoint receive buffer size, and per-hop bandwidth/buffering. You have a transport channel with unknown, dynamic bandwidth and unknown latency, and your protocol attempts to discover the true transport channel parameters. Furthermore, excessive usage degrades overall network performance, so you want to avoid over-saturating the network during your discovery. In an ideal world, you would infer the transport parameters of every hop along your path to determine your holistic end-to-end transport channel parameters. This is problematic due to paths shifting or just plain dynamic throttling, so you will probably limit yourself to the "client to common bottleneck (e.g. your router) path" and the "common bottleneck to server path". The "client to common bottleneck path" is likely client controlled and can be safely divided and allocated by the client. The "common bottleneck to server path" is not efficiently controllable by the client, so it requires safe discovery/inference.
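As a toy illustration of how the holistic end-to-end parameters compose from the per-hop ones (all hop numbers here are made up):

```python
# Toy model: the end-to-end channel is a composition of the hops.
# Bandwidth is the minimum across hops; latency is the sum. The split
# described above groups hops into a client-controlled leg and a leg
# that must be discovered/inferred. All numbers are illustrative.

def end_to_end(hops):
    """hops: list of (bandwidth_bps, latency_s) tuples, one per hop."""
    bandwidth = min(bw for bw, _ in hops)
    latency = sum(lat for _, lat in hops)
    return bandwidth, latency

client_leg = [(100_000_000, 0.002)]  # client to its own router
internet_leg = [(40_000_000, 0.030), (1_000_000_000, 0.010)]

bw, lat = end_to_end(client_leg + internet_leg)
print(bw, round(lat, 3))  # 40000000 0.042
```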

The "initial congestion window" is an initial bandwidth-delay product meant to avoid over-saturating the network. This does not directly map to the transport parameters that matter. What you actually want is an initial safe "end-to-end bandwidth", which you refine via the discovery process. The latency of your round trip then only matters if the endpoint receive buffer size is too small, and it only affects how quickly you can refine/increase the computed safe "end-to-end" bandwidth.

Under the assumption that a 16 KB "initial congestion window" is fine, and assuming a default RTT of ~100 ms (somewhat reasonable for geographically distributed servers that want to minimize latency), that is actually an initial safe "end-to-end bandwidth" assumption of (16 KB / 0.1 s × 8 b/B) ≈ 1.3 Mb/s. Assuming the client advertises a receive buffer large enough for the entire certificate chain (which it absolutely should) and there are no packet losses, the client would get the entire certificate chain in ~(1 s + RTT) in the worst case. Note that this has only a minor dependency on the end-to-end latency. Of course it could get the data sooner if the bandwidth estimate gets refined to a higher number, and a lower RTT gives more opportunities for it to be refined upward, but that bounds our worst case (assuming no packet loss) to something that is not really that bad, especially given the poor network throughput we are assuming.
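The arithmetic can be checked directly; the 16 KB window, 100 ms RTT, and 160 KB chain size are the assumptions stated above:

```python
# Check the back-of-the-envelope numbers: a 16 KB initial window with an
# assumed 100 ms RTT implies an initial safe end-to-end bandwidth, which
# bounds the worst-case time to deliver a 160 KB certificate chain even
# if the estimate is never refined upward (and no packets are lost).

iw_bytes = 16 * 1024      # assumed initial congestion window
rtt_s = 0.1               # assumed default RTT
chain_bytes = 160 * 1024  # PQ certificate chain size

bandwidth_bps = iw_bytes / rtt_s * 8          # bits per second
transfer_s = chain_bytes * 8 / bandwidth_bps  # seconds, excluding the RTT

print(round(bandwidth_bps / 1e6, 2))  # 1.31  (≈1.3 Mb/s)
print(round(transfer_s, 2))           # 1.0   (the "~(1 s + RTT)" bound)
```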

This then makes it obvious how to improve this scheme: choose better initial estimates of "end-to-end" bandwidth, or actively communicate that information back and forth. The "client to common bottleneck path" can be "controlled" by the client, so it can allocate bandwidth amongst all of its connections and set aside bandwidth on that leg for receiving. This allows higher initial "end-to-end" bandwidth assumptions that can be safely clipped when the client realizes it is in bad network conditions such as airplane wifi. If the server determines "I have set aside N b/s to the 'internet' for this client" and the client determines "I have set aside M b/s from the 'internet' for this server", then your only remaining problem is a bottleneck in the broader backbone connections between the server and client. You would almost certainly be able to support better initial bandwidth assumptions, or at least faster convergence after the first RTT, if you communicated that information both ways. This is just an example of what could be improved, and how, with fairly minimal changes.
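A minimal sketch of that exchange, with made-up function names and reservation numbers purely for illustration (no real protocol does this):

```python
# Illustrative only: if the server advertises "N b/s set aside to the
# internet for this client" and the client advertises "M b/s set aside
# from the internet for this server", the initial end-to-end estimate
# can start at the smaller of the two instead of a fixed conservative
# window. Only a bottleneck in the backbone between them can then make
# that starting point unsafe.

def initial_bandwidth_bps(server_reserved_bps, client_reserved_bps):
    """Start from the tighter of the two advertised reservations."""
    return min(server_reserved_bps, client_reserved_bps)

def clipped(estimate_bps, observed_ceiling_bps):
    """Client-side clip when it realizes it is on a bad network."""
    return min(estimate_bps, observed_ceiling_bps)

est = initial_bandwidth_bps(50_000_000, 20_000_000)
print(est)                      # 20000000 (20 Mb/s starting estimate)
print(clipped(est, 2_000_000))  # 2000000  (clipped on airplane wifi)
```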

And this all assumes that we are even trying to tackle this fairly fundamental root issue, rather than the heaps of other accidental complexity that probably exist, like middleboxes just giving up if the certificates are too large, or whatever other nonsense is out there. That, I am pretty sure, is the real impetus behind wanting the networking equivalent of a 737 MAX that handles the same as a 737.

[1] https://news.ycombinator.com/item?id=47210252