Comment by voxic11
2 years ago
keepalives are an optional TCP feature, so they are not necessarily supported by all TCP implementations and therefore default to off even when supported.
2 years ago
> keepalives are an optional TCP feature, so they are not necessarily supported by all TCP implementations and therefore default to off even when supported.
Where is it off? Most Linux distros have it on; it's just that the default kickoff timer is ridiculously long (like 2 hours, IIRC). Besides, TCP keepalives won't help with the issue at hand and were put in for a totally different purpose (gc'ing idle connections). Most of the time you don't even need them, because the other side will send an RST packet if it has already closed the socket.
AFAIK, all Linux distros plus Windows and macOS have TCP keepalives off by default, as mandated by RFC 1122 [0]. Even when they are turned on with SO_KEEPALIVE, the idle time before the first probe defaults to two hours, because that is the minimum default interval the spec allows. That can then be reduced with something like /proc/sys/net/ipv4/tcp_keepalive_time (system-wide) or TCP_KEEPIDLE (per socket).
By default, completely idle TCP connections will stay alive indefinitely from the perspective of both peers even if their physical connection is severed.
[0]: https://datatracker.ietf.org/doc/html/rfc1122#page-101
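For illustration, a minimal Python sketch of the per-socket route described above. The helper name `enable_keepalive` and the 60/10/5 values are made up for the example, not recommendations, and it assumes Linux-style option names; options the platform does not expose are simply skipped.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and shorten its timers where the platform allows it."""
    # SO_KEEPALIVE is the portable on/off switch; it is off on a fresh socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific, so only set the
    # ones this Python build actually defines.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    before = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    enable_keepalive(s)
    after = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    print("SO_KEEPALIVE before:", before, "after:", after)
```

Without the per-socket options, a socket with only SO_KEEPALIVE set falls back to the system-wide values such as tcp_keepalive_time.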
OK you're right - it's coming back to me now. I've been spoiled by software that enables keep-alive on sockets.
So we need a protocol with some kind of non-optional default-enabled keepalive.
Now your connections start to randomly fail in production because the implementation defaults to 20ms and your local tests never caught that.
I'm sure there's some middle ground between "never time out" and "time out after 20ms" that works reasonably well for most use cases.