
Comment by francoisLabonte

10 years ago

Ah... The problems of crappy consumer ethernet equipment ( I work at an ethernet switch vendor so excuse the rant )

What is likely happening is that your switch is configured by default to implement both rx and tx pause. Your TV, which is also erroneously ( in my opinion ) configured to transmit pause, goes bonkers and starts sending pause frames to your switch. Your switch then starts buffering packets for your TV until the buffers are full, and then starts transmitting pause to everyone else on the other ports. The switch must have some horrible buffering policy where one port ( the tv port ) can hog all the buffers and deprive every other port of being able to send...

Now the kicker is the way every endstation implements pause. Notice that the pause quanta in the Pause packet is in units of 512 bit times, and in the packet you captured it is set to the maximal value of 65535. On a 100Mbps port ( presuming 100Mbps since the Mediatek has 4x100Mbps ( Fast Ethernet ) and 2x1Gbps ( Gigabit Ethernet ) ports ) that computes to >>> 512*65535/100.e6 = 0.3355392 seconds
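The arithmetic above generalizes to other link speeds. A minimal sketch, assuming only that one quantum is 512 bit times; the function name `pause_seconds` is made up for illustration:

```python
# One pause quantum is 512 bit times, so the wall-clock pause depends on link speed.
PAUSE_QUANTUM_BITS = 512


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Return how long a PAUSE with the given quanta holds off the sender."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause quanta is a 16-bit field")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


if __name__ == "__main__":
    for name, bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
        print(f"{name}: {pause_seconds(0xFFFF, bps) * 1e3:.3f} ms")
```

The same maximal quanta value is ten times shorter in wall-clock terms on a gigabit port, so a stuck device has to refresh its pause far more often there to keep the link silenced.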

A normal Pause sends this packet periodically and once it has buffers to receive will send a pause with a quanta of 0 meaning cancel previous timer... but if it's malfunctioning who knows if it ever will...
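To check what a capture actually contains, the quanta can be pulled out of the raw bytes. This is a rough sketch, assuming an untagged frame (no VLAN header) where the MAC Control EtherType 0x8808 is followed by the PAUSE opcode 0x0001 and a 16-bit quanta; `parse_pause` is a hypothetical helper, not part of any library:

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def parse_pause(frame: bytes):
    """Return the pause quanta if frame is an untagged PAUSE frame, else None."""
    if len(frame) < 18:
        return None
    # Bytes 0-5: destination MAC, 6-11: source MAC, then three 16-bit fields.
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta
```

A result of 65535 means "stay quiet for the maximum time", and 0 means "cancel the previous timer".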

The sad part is that I don't even know what to recommend for a good consumer level switch that has good defaults or configurable defaults and sane buffer config... Mine is a dinky one probably vulnerable to this problem as well... Need to do some research.

> A normal Pause sends this packet periodically and once it has buffers to receive will send a pause with a quanta of 0 meaning cancel previous timer... but if it's malfunctioning who knows if it ever will...

As with most protocols, it doesn't work if it's not implemented properly.

Many hardware implementations have no knowledge of when the buffer can be emptied, so it's understandable they treat it as an on/off switch. Screaming, my buffer is full, don't send me anything for 65535*512 bit-times! Which is perfectly good, because otherwise all those incoming frames would need to be dropped anyways.

Remember, small embedded devices can't often guarantee ethernet DMA slot time to DRAM, and definitely can't afford to have a dedicated DRAM channel, so those buffers are on a 2-8 kB on-chip SRAM block or equivalent.

When the buffer is "full" (above high water mark), an interrupt gets generated and device firmware will set up an appropriate DMA transfer to empty it. Once that is done, the device should of course send a PAUSE 0, and all is good.
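The on/off behaviour described above can be modelled in a few lines. This is a toy sketch, not any real driver: the class name, the watermarks and the buffer size are all made-up values, and a real device would also refresh its PAUSE periodically while it stays full.

```python
class RxFifo:
    """Toy receive FIFO that signals pause/resume around two watermarks."""

    def __init__(self, capacity=4096, high=2048, low=512):
        # All sizes are hypothetical, chosen to resemble a small on-chip SRAM.
        self.capacity, self.high, self.low = capacity, high, low
        self.used = 0
        self.paused = False   # True after we asked the peer to stop
        self.sent = []        # quanta values of the PAUSE frames we emitted

    def receive(self, nbytes):
        """Store an incoming frame; return False if it had to be dropped."""
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        if not self.paused and self.used >= self.high:
            self.paused = True
            self.sent.append(0xFFFF)   # "my buffer is full"
        return True

    def drain(self, nbytes):
        """Model the DMA transfer that empties the FIFO into DRAM."""
        self.used = max(0, self.used - nbytes)
        if self.paused and self.used <= self.low:
            self.paused = False
            self.sent.append(0)        # PAUSE 0: cancel the previous timer
```

If the firmware crashes and `drain` is never called, the model stays in the paused state forever, which is the failure mode discussed in this thread.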

> The sad part is that I don't even know what to recommend for a good consumer level switch that has good defaults or configurable defaults and sane buffer config...

As you must know, you can turn it off entirely in most managed switches and see what happens to data transfer speed.

Most consumer level gigabit switches seem to have maybe a 16 kB buffer. So they don't really have many buffers (or anything) to configure.

  • Like you said the host might have small buffers and without Pause it would drop, but who's supposed to buffer the packets, the cheap switch with 16kB of buffers and super idiotic buffer configuration such that everyone else on that switch gets paused?

    You seem to think that it's bad to drop packets in the nic. Some nics might have buffers that are too small, but in general you should drop. If you use TCP the window will adjust to whatever your bad nic and embedded system can handle. At least you won't affect the others by spreading pause like a cancer ( can you tell I am cynical on pause )

    On a switch you can usually drop packets based on the number of packets destined to a port and the packets buffered per input port. This is how you can avoid head of line blocking, but again, if you are right about 16kB, that's barely enough for a jumbo packet (~9200B)... geez that's depressing.

    • > If you use TCP the window will adjust to whatever your bad nic and embedded system can handle.

      TCP window, sigh... It can't deal with the situation where, say, every second frame is lost, because someone thought 2 kB is enough buffer. TCP congestion control mechanisms are great for actual congestion, but when packet loss is due to other causes, it's actually pretty bad.

      Again, TCP is no substitute for flow control in this case.

      Doesn't matter how nice a NIC you have. The problem usually happens before the packets reach your nice NIC.
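The per-port drop policy mentioned earlier in this subthread can be sketched as a shared buffer with a cap per egress port. This is only an illustration under assumed numbers: the 16 kB total comes from the thread, while the 4 kB per-port cap and the class itself are made up.

```python
from collections import defaultdict


class SharedBuffer:
    """Toy shared packet buffer with a per-egress-port cap."""

    def __init__(self, total_bytes=16 * 1024, per_port_cap=4 * 1024):
        self.total_bytes = total_bytes
        self.per_port_cap = per_port_cap     # hypothetical limit
        self.queued = defaultdict(int)       # egress port -> bytes queued

    def enqueue(self, egress_port, nbytes):
        """Admit a frame or drop it; returns True if the frame was buffered."""
        if self.queued[egress_port] + nbytes > self.per_port_cap:
            return False                     # this port already holds its share
        if sum(self.queued.values()) + nbytes > self.total_bytes:
            return False                     # whole buffer exhausted
        self.queued[egress_port] += nbytes
        return True

    def dequeue(self, egress_port, nbytes):
        """Account for a frame that was transmitted on egress_port."""
        self.queued[egress_port] = max(0, self.queued[egress_port] - nbytes)
```

With a cap like this, a paused port can only ever hold its own share, so traffic to the other ports keeps flowing and frames for the stuck port get dropped instead.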


> What is likely happening is that your switch is configured by default to implement both rx and tx pause.

Yep, this is much more likely than that it's accidentally forwarding the pause frames. The magic phrase to google here is "head of line blocking": https://en.wikipedia.org/wiki/Ethernet_flow_control#Issues

For that reason you should basically never enable ethernet flow control except for on a Fibre Channel over Ethernet SAN, and even then they had to invent Priority-based Flow Control to make it sane. If this is a managed switch then you should be able to disable it.

(I used to also work at an ethernet switch vendor.)

I've had the exact same thing happen when a host crashed. Relatively modern Intel network chip on the host, Netgear GS108 switch (BCM5398 I believe). Presumably when the host stops servicing interrupts, the card's buffer fills up and then generates pause frames.

I don't think it requires the switch to have a bad buffer policy - all the switch ports didn't die at once, just one-by-one as each connected device tried to send a broadcast packet. I don't see a way of avoiding this logical situation if pause frames are sacrosanct - it seems that a switch would need a heuristic to forget pausing and start silently dropping packets to only the affected ports.

(I've since disabled pause frames on those cards, since I don't really need them.)
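The "forget pausing and start dropping" heuristic suggested above could look roughly like this. It is a sketch of one possible design, not something any particular switch is known to implement; the class name, the 2 second threshold and the default link speed are assumptions.

```python
class PauseWatchdog:
    """Per-port timer: stop honoring PAUSE once a peer holds us off too long."""

    def __init__(self, link_bps=100e6, max_paused_s=2.0):
        self.link_bps = link_bps
        self.max_paused_s = max_paused_s   # hypothetical threshold
        self.paused_since = None           # start of the current pause streak
        self.expires = 0.0                 # when the latest PAUSE lapses

    def on_pause(self, now, quanta):
        """Call for every PAUSE frame received on this port."""
        if quanta == 0:                    # peer resumed: streak is over
            self.paused_since = None
            return
        if self.paused_since is None or now > self.expires:
            self.paused_since = now        # a new streak begins
        self.expires = now + quanta * 512 / self.link_bps

    def should_drop(self, now):
        """True once the peer has kept us paused longer than max_paused_s."""
        if self.paused_since is None or now > self.expires:
            self.paused_since = None       # the pause lapsed on its own
            return False
        return now - self.paused_since > self.max_paused_s
```

A healthy device that pauses briefly never trips the threshold, while a crashed host that keeps refreshing its pause does, and only that port starts dropping.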

  • This same root issue - trying to implement "reliable multicast" - is why DaveM rejected the AF_BUS IPC implementation a few years ago. In any multicast or broadcast system you can't allow one stoned endpoint to wedge the bus for everyone.

> The switch must have some horrible buffering policy where one port ( the tv port ) can hog all the buffers and deprive every other port of being able to send...

Or could it be that the switch is oblivious to STP ethernet addresses and PAUSE frames? The frames shown (presumably originating from the TV) have a destination address of 01:80:C2:00:00:00, and if the dumb switch doesn't know that this is kind of a special address, it'll just do what the multicast bit in that address tells it to do - copy the frame out to every port.
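The difference between a switch that knows about these addresses and one that doesn't can be sketched as follows. It assumes the IEEE 802.1D convention that 01:80:C2:00:00:00 through 01:80:C2:00:00:0F are link-local and not forwarded by a compliant bridge; the function names and the three action labels are made up for illustration.

```python
def parse_mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or '-' separated) into 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address has 6 bytes")
    return raw


def is_group_address(mac: bytes) -> bool:
    """The least significant bit of the first octet marks multicast/broadcast."""
    return bool(mac[0] & 0x01)


def is_bridge_reserved(mac: bytes) -> bool:
    """True for 01:80:C2:00:00:00 through 01:80:C2:00:00:0F."""
    return mac[:5] == bytes.fromhex("0180c20000") and mac[5] <= 0x0F


def switch_action(dst: str) -> str:
    """What an address-aware switch would do with a frame sent to dst."""
    mac = parse_mac(dst)
    if is_bridge_reserved(mac):
        return "consume"   # handled locally, never forwarded
    if is_group_address(mac):
        return "flood"     # copy to every other port
    return "lookup"        # normal unicast forwarding-table lookup
```

A dumb switch effectively skips the `is_bridge_reserved` check, so the address falls through to the multicast case and the frame is flooded.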

Yeah. IMO most of the blame here falls on the TV rather than the switch. Even when it's implemented well, Ethernet pause frame generation should not be enabled - certainly, not by default - on a consumer product, because it's really unreasonable to expect the average $17 consumer switch to handle it nicely. Furthermore, there's usually little need & little benefit to trying to make a home network lossless via L2 pause.