Comment by toast0

2 days ago

At least in networks I've used, it's better for buffers to overflow than to use PAUSE.

Too many switches will take a PAUSE frame received from port X and send PAUSE out to all the ports that are sending packets destined for port X. Those ports then stop sending all of their traffic for a while, not just the traffic headed for X.
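
A toy sketch of why that fan-out hurts (hypothetical names, not modeled on any real switch): once an ingress port is paused on behalf of one congested egress port, everything else it was sending is stuck behind it too.

```python
from collections import defaultdict

class ToySwitch:
    def __init__(self, egress_capacity=4):
        self.egress_capacity = egress_capacity
        self.queues = defaultdict(list)   # egress port -> frames waiting to go out
        self.feeders = defaultdict(set)   # egress port -> ingress ports seen sending to it
        self.paused = set()               # ingress ports we've told to stop sending

    def receive(self, ingress, egress, frame):
        # A paused ingress port can't send us *anything*, not just traffic for the
        # congested egress port -- that's the head-of-line blocking.
        if ingress in self.paused:
            return f"{ingress} -> {egress}: delayed ({ingress} is paused)"
        self.feeders[egress].add(ingress)
        self.queues[egress].append(frame)
        if len(self.queues[egress]) >= self.egress_capacity:
            # Naive reaction: pause every ingress port that feeds the full egress port.
            self.paused |= self.feeders[egress]
            return f"{ingress} -> {egress}: queued, PAUSE sent to {sorted(self.feeders[egress])}"
        return f"{ingress} -> {egress}: queued"

switch = ToySwitch(egress_capacity=2)
print(switch.receive("A", "X", "frame1"))   # X is a slow host backing up its port
print(switch.receive("B", "X", "frame2"))   # queue for X fills -> A and B both get paused
print(switch.receive("B", "Y", "frame3"))   # B's traffic for healthy port Y is now stuck too
```

The last frame is delayed solely because B shares a pause decision with the congested port X, even though its destination Y is perfectly healthy.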

About the only useful thing is that if you can see PAUSE counters on your switch, you can tell from the switch that a host is unhealthy, whereas inbound packet overflows on the host itself might not be monitored... or whatever is making the host slow to handle packets might also be delaying its monitoring.
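
For the host side of that, here's a rough sketch of pulling whatever pause counters the NIC driver exposes (counter names vary by driver, and `eth0` is just a placeholder); the same idea applies to polling the switch's per-port counters over SNMP or a vendor API.

```python
# Sketch: scan a Linux NIC's driver statistics for PAUSE-related counters.
# Counter names are driver-specific (rx_pause, tx_pause, ...), so we just
# look for "pause" in whatever `ethtool -S` reports.
import subprocess
import sys

def pause_counters(interface: str) -> dict[str, int]:
    out = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    counters = {}
    for line in out.splitlines():
        if ":" not in line:
            continue  # skip the "NIC statistics:" header
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if "pause" in name.lower() and value.isdigit():
            counters[name] = int(value)
    return counters

if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    for name, value in pause_counters(iface).items():
        print(f"{name}: {value}")
```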

Sadly, I'm not too surprised to hear that. I wish we had more rapid iteration to improve such capabilities for real-world use cases.

Things like back pressure and flow control are very powerful systems concepts, but they intrinsically need an identifiable flow to control! Our systems abstractions that multiplex and obfuscate flows are going to be unable to differentiate which application flow is the one that needs back pressure, and end up painting with too broad a brush.
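
A toy illustration of that point (made-up names, nothing to do with any real stack): once two flows are multiplexed into one bounded buffer, back pressure can only say "everybody stop", even when a single flow caused the congestion; per-flow buffers let it land on just the offender.

```python
from collections import deque

def offer(buffer: deque, capacity: int, item) -> bool:
    """Try to enqueue; returning False is our stand-in for back pressure."""
    if len(buffer) >= capacity:
        return False
    buffer.append(item)
    return True

# One shared buffer: a chatty "bulk" flow fills it, so the quiet "control"
# flow gets back-pressured too, even though it did nothing wrong.
shared = deque()
for i in range(8):
    offer(shared, capacity=4, item=("bulk", i))
print("shared buffer, control flow accepted? ->", offer(shared, 4, ("control", 0)))

# Per-flow buffers: back pressure lands only on the flow that is actually full.
per_flow = {"bulk": deque(), "control": deque()}
for i in range(8):
    offer(per_flow["bulk"], capacity=4, item=i)
print("per-flow buffers, control flow accepted? ->", offer(per_flow["control"], 4, 0))
```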

In my view, the fundamental problem is we're all trying to "have our cake and eat it". We expect our network core to be unaware of the edge device and application goals. We expect to be able to saturate an imaginary channel between two edge devices without any prearrangement, as if we're the only network users. We also expect our sparse and async background traffic to somehow get through promptly. We expect fault tolerance and graceful degradation. We expect fairness.

We don't really define or agree on what counts as saturation, what counts as prompt, what counts as graceful, or what counts as fair... I think we often have selfish answers to these questions, and that yields a tragedy of the commons.

At the same time, we have so many layers of abstraction where useful flow information is effectively hidden from the layers beneath. That is even before you consider adversarial situations where the application is trying to confuse the issue.