Comment by TacticalCoder
5 hours ago
> Though only 5gig Ethernet? Can’t they do usb-c / thunderbolt 40 Gb/s connections like Macs?
Does the network speed matter that much when TFA talks about outputting a few tens of tokens per second? Ain't 5 Gbit/s plenty for that? (I understand the need to load the model but that'd be local already right?)
Running inference requires sharing intermediate matrix results between nodes. Faster networking speeds that up.
I read (but cannot find this anymore) that the information sent from layer to layer is minimal. The actual matrix work happens within a layer. They are not doing matrix multiplication over the netwerk (that would be insane latency wise).