Comment by elcritch

10 hours ago

That’s pretty awesome!

Though only 5gig Ethernet? Can’t they do usb-c / thunderbolt 40 Gb/s connections like Macs?

It's sad that NDA fetishist Broadcom has a de facto monopoly on PCIe fabric switches. Otherwise we would have had functional open source drivers for at least the simpler topologies for a while now, and could set up cheap FNN topologies by using bifurcation support on hosts (usually targeted at NVMe) to get several x4 ports, each routed through only a comparatively cheap retimer out to "mini SAS HD" (the square-shaped 4-lane connectors) or QSFP+ ports. Generic DAC cables from those standards would then give a few meters of reach; even Skylake-era SAS ones (nominally 12 GT/s, while PCIe 4.0 is 16 GT/s) should typically manage PCIe 4.0. That's just under 64 Gbit/s from each link, with typical desktop/gaming systems delivering 3~5 links without complaints next to a dGPU (which then runs at fewer than full lanes).
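For anyone checking the "just under 64 Gbit/s" figure, here is a rough back-of-the-envelope sketch. It assumes only the 128b/130b line encoding that PCIe 3.0 and later use, and ignores packet and protocol overhead, so real throughput would be somewhat lower. The function name is made up for illustration.

```python
def pcie_link_gbit_per_s(gt_per_s: float, lanes: int,
                         payload_bits: int = 128, coded_bits: int = 130) -> float:
    """Raw link rate after line encoding, in Gbit/s.

    Ignores TLP/DLLP protocol overhead (assumption), so this is an upper bound.
    """
    return gt_per_s * lanes * payload_bits / coded_bits


# PCIe 4.0 runs at 16 GT/s per lane; a bifurcated port here is x4.
pcie4_x4 = pcie_link_gbit_per_s(16.0, 4)
print(f"PCIe 4.0 x4: about {pcie4_x4:.1f} Gbit/s per link")
```

So each x4 link lands at roughly 63 Gbit/s before protocol overhead, which matches the estimate above.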

> Though only 5gig Ethernet? Can’t they do usb-c / thunderbolt 40 Gb/s connections like Macs?

Does the network speed matter that much when TFA talks about outputting a few tens of tokens per second? Ain't 5 Gbit/s plenty for that? (I understand the need to load the model but that'd be local already right?)

  • Running inference requires sharing intermediate matrix results between nodes. Faster networking speeds that up.

    • I read (but can no longer find the source) that the information sent from layer to layer is minimal. The actual matrix work happens within a layer. They are not doing matrix multiplication over the network (that would be insane latency-wise).
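To put a rough number on the "minimal" claim, here is a small sketch. It assumes the model is split layer-wise across nodes (pipeline style), so only one hidden-state vector per token crosses each node boundary. The hidden size of 8192, fp16 values, and 30 tokens/s are hypothetical figures picked for illustration, not taken from the article; a tensor-parallel split would move far more data and is not covered here.

```python
def activation_bytes_per_token(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes crossing one node boundary per generated token (one hidden-state vector)."""
    return hidden_size * bytes_per_value


def required_gbit_per_s(hidden_size: int, tokens_per_s: float,
                        boundaries: int = 1, bytes_per_value: int = 2) -> float:
    """Sustained bandwidth needed for activations alone, in Gbit/s."""
    bits_per_token = activation_bytes_per_token(hidden_size, bytes_per_value) * 8
    return bits_per_token * tokens_per_s * boundaries / 1e9


# Hypothetical example values (assumptions, not measurements):
hidden_size = 8192      # model hidden dimension
tokens_per_s = 30       # "a few tens of tokens per second"

per_token = activation_bytes_per_token(hidden_size)
needed = required_gbit_per_s(hidden_size, tokens_per_s)
print(f"{per_token} bytes per token per boundary, about {needed * 1000:.2f} Mbit/s sustained")
```

Under those assumptions the sustained traffic is a few Mbit/s, far below 5 Gbit/s, so in that kind of setup per-hop latency would likely matter more than raw bandwidth.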

I really wonder if AMD is going to keep getting walloped on the interconnect or if they'll start upping what's available to consumers, at some point.