Comment by iwontberude
2 months ago
And this is the investment case for AMD: models fit entirely in a single chassis, with the side benefit of needing less tariffed network equipment to interconnect compute. Map/reduce instead of clustered compute.
Edit: when downvoting, please offer some insight why you disagree
How is that a unique advantage for AMD?
AMD is consistently stacking more HBM.
This means that you can fit larger and larger models into a single node, without having to go out over the network. The memory bandwidth on AMD is also quite good.
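To make the "fits in a single node" point concrete, here is a rough back-of-the-envelope sketch. The 405B parameter count, the 20% overhead for KV cache and activations, the 8-GPU node size and the 80 GB comparison card are all assumptions picked for illustration; 192 GB is the published per-GPU HBM capacity of the MI300X.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in decimal GB (1e9 params * bytes / 1e9)."""
    return params_billion * bytes_per_param


def fits_in_node(
    params_billion: float,
    bytes_per_param: float,
    hbm_gb_per_gpu: float,
    gpus_per_node: int = 8,          # assumed node size
    overhead_fraction: float = 0.2,  # assumed headroom for KV cache, activations
) -> bool:
    """Return True if weights plus assumed overhead fit in the node's total HBM."""
    needed = weights_gb(params_billion, bytes_per_param) * (1 + overhead_fraction)
    available = hbm_gb_per_gpu * gpus_per_node
    return needed <= available


if __name__ == "__main__":
    # Hypothetical 405B-parameter model stored at 2 bytes per parameter.
    print(weights_gb(405, 2))
    print(fits_in_node(405, 2, hbm_gb_per_gpu=192))  # 8 x 192 GB node
    print(fits_in_node(405, 2, hbm_gb_per_gpu=80))   # 8 x 80 GB node
```

This only counts capacity. It says nothing about whether the software stack can actually use that memory reliably, which is the objection raised in the replies.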
It really does not matter how much memory AMD has if the drivers and firmware are unstable. To give one example from last year:
https://www.tomshardware.com/pc-components/gpus/amds-lisa-su...
They are currently developing their own drivers for AMD hardware because of the headaches that they had with ROCm.
So the MI300X has 8 different memory domains, and although you can treat it as one flat memory space, if you want to reach its advertised peak memory bandwidth you have to work with it like an 8-socket board.
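As a rough illustration of what "work with it like an 8-socket board" implies for the programmer: instead of letting every worker touch the whole flat buffer, you split the data into one contiguous shard per memory domain and keep each worker on its own shard. The sketch below only computes the shard boundaries. `shard_ranges` is a hypothetical helper, not a ROCm or HIP API, and how a shard actually gets placed in a given domain depends on the runtime and is not shown.

```python
def shard_ranges(total_elems: int, num_domains: int = 8) -> list[tuple[int, int]]:
    """Split [0, total_elems) into num_domains contiguous half-open ranges.

    Any remainder is spread over the first shards, so sizes differ by at most 1.
    """
    base, extra = divmod(total_elems, num_domains)
    ranges = []
    start = 0
    for domain in range(num_domains):
        size = base + (1 if domain < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Each (start, end) pair would be handled by the worker tied to that domain.
    for domain, (start, end) in enumerate(shard_ranges(1_000_003)):
        print(domain, start, end)
```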
MI355X isn't out yet, and the upcoming B300 also has 288 GB of HBM3e.
> when downvoting, please offer some insight why you disagree
And a reminder that (down)voting is not for (dis)agreement.