Comment by ialad
3 years ago
I'm using Go's default HTTP client to make a few requests per second. I set a context timeout of a few seconds for each request. There are random 16 minute intervals where I only get the error `context deadline exceeded`.
From what I found, Go's default client uses HTTP/2 whenever the server supports it. When the underlying TCP connection stops working, the HTTP/2 transport relies on the OS to decide when to time out the connection. Over HTTP/1.1, the client closes the connection itself [1] on timeout and makes a new one.
On Linux, I guess the timeout for a TCP connection depends on `tcp_retries2`, which defaults to 15 and works out to roughly 15m40s [2].
This can be simulated by creating a client, making some requests, and then blocking traffic with an `iptables` rule [3]. My solution for now is to use a client that only uses HTTP/1.1.
[1] https://github.com/golang/go/issues/36026#issuecomment-56902...
[2] https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
You can configure the HTTP/2 client to use a timeout + heartbeat.
https://go.googlesource.com/net/+/master/http2/transport.go
That's a big file. Mind pointing to a specific line number?
https://go.googlesource.com/net/+/master/http2/transport.go#...
Looks like it got cut off when I originally pasted it.
That sounds like there is pooling going on, and the pooled connection is not being invalidated when a timeout happens. I've actually seen a lot of libraries in other languages do a similar thing (in my experience, some of the Elixir libraries don't have good pool invalidation for HTTP connections). A default invalidation policy that handles every situation is a bit difficult, but I think a default that invalidates on any timeout is much better than one that never invalidates on a timeout, as long as invalidation just means evicting the connection from the pool and not tearing down other channels (streams, in HTTP/2 terms) on the same HTTP/2 connection. For example, a timeout could hit a single channel while data is still flowing through the others.
Wow. Can you easily change the TCP connection timeout?
You can. It’s trivial once you know it’s possible. Not sure why it’s not set by default. https://go.googlesource.com/net/+/master/http2/transport.go
To be clear, this is for HTTP/2, not TCP. You can very easily set read and write deadlines on TCP conns, but you can’t detect that a peer has disappeared unless data is flowing. You can set keepalive, but it’s not reliable and varies wildly between OSes.
You need a heartbeat or ping message together with an advancing deadline to detect dead peers reliably.