Comment by justinfrankel

6 hours ago

ah reading their analysis, there are errors that explain this. Particularly this:

  tcp_now   = 4,294,960,000  (frozen at pre-overflow value)
  timer     = 4,294,960,000 + 30,000 = 4,294,990,000
              (exceeds uint32 max → wraps to a small number)

timer wraps to a small number, they say

  TSTMP_GEQ(4294960000, 4294990000)

they forgot to wrap it there, it should be TSTMP_GEQ(4294960000, small_number)

  = (int)(4294960000 - 4294990000)
  = (int)(-30000)
  = -30000 >= 0 ?  → false!

wrong!

There may be a short time period where this bug occurs, and if you get enough TCP connections to TIME_WAIT in that period, they could stick around, maybe. But I think the original post is completely overreacting and was probably written by a LLM, lol.

There does appear to be a bug, but it's not what the blog describes.

If tcp_now stops updating at <= 2^32 - 30000 milliseconds, then TSTMP_GEQ(tcp_now, timer) will always fail since timer is tcp_now + 30000 which won't wrap.

This does look like it is possible since calculate_tcp_clock() which updates tcp_now only runs when there's TCP traffic. So if at 49 days uptime you halted all TCP traffic and waited about a day, tcp_now would be stuck at the value before you halted TCP traffic.

In cases where tcp_now gets stuck at > 2^32 - 30000, it looks like TCP sockets in the TIME_WAIT will end up being closed immediately instead of waiting 30 seconds, which isn't great either.