← Back to context

Comment by threecheese

10 hours ago

> What I had missed is that we deployed a new internal service last week that sent less than three GetPostRecord requests per second, but it did sometimes send batches of 15-20 thousand URIs at a time. Typically, we'd probably be doing between 1-50 post lookups per request.

That’ll do it.

The incredible part about this is because their backend is all TCP/IP they were literally exhausting the ports by leaving all 65k of them in TIME_WAIT, and the workaround was to start randomizing the localhost address to give them another trillion ports or so.

And then they fix the issue by using multiple localhost IPs rather than, perhaps, not sending 15-20 thousand URIs at a time

  • They mentioned it was a temporary fix that they removed after finding and fixing the true root cause, though.