Comment by dragonwriter

2 months ago

How is that a unique advantage for AMD?

AMD is consistently stacking more HBM.

  H100 80GB HBM3
  H200 141GB HBM3e
  B200 192GB HBM3e

  MI300x 192GB HBM3
  MI325x 256GB HBM3e
  MI355x 288GB HBM3e

This means that you can fit larger and larger models into a single node, without having to go out over the network. The memory bandwidth on AMD is also quite good.
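
As a rough back-of-the-envelope sketch of that point, using only the capacities listed above: the 8 GPUs per node, the 2 bytes per parameter (FP16/BF16), and the 700B-parameter model size are all illustrative assumptions, and the estimate counts weights only (no KV cache, activations, or framework overhead).

```python
# HBM capacity per GPU in GB, copied from the list above.
HBM_GB = {
    "H100": 80, "H200": 141, "B200": 192,
    "MI300x": 192, "MI325x": 256, "MI355x": 288,
}

def weights_gb(params_billions, bytes_per_param=2):
    """Approximate weight footprint in GB (billions of params * bytes each)."""
    return params_billions * bytes_per_param

def fits_in_node(gpu, params_billions, bytes_per_param=2, gpus_per_node=8):
    """True if the weights alone fit in the node's combined HBM.

    gpus_per_node=8 is an assumption, not something stated in the thread.
    """
    node_gb = HBM_GB[gpu] * gpus_per_node
    return weights_gb(params_billions, bytes_per_param) <= node_gb

# Hypothetical 700B-parameter model at 2 bytes per parameter.
for gpu, gb in HBM_GB.items():
    print(f"{gpu}: {gb * 8} GB per node, fits: {fits_in_node(gpu, 700)}")
```

Real deployments need headroom beyond the weights, so treat this as a lower bound on the memory required rather than a sizing guide.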

  • It really does not matter how much memory AMD has if the drivers and firmware are unstable. To give one example from last year:

    https://www.tomshardware.com/pc-components/gpus/amds-lisa-su...

    They are currently developing their own drivers for AMD hardware because of the headaches that they had with ROCm.

    • "Driver" is such a generic word. tinygrad works on MI300x. If you want to use it, you can, which negates your point.

      Additionally, ROCm is a giant collection of libraries. Certainly there are issues, as with any large collection of software, but the critical thing is whether or not AMD is responsive about getting things fixed.

      In the past, it was a huge issue: AMD would routinely ignore developers, and bugs would never get fixed. But after that SA article, Lisa lit a fire under Anush's butt and he's taking ownership. It is a major shift in the entire culture at the company. They are extremely responsive and are getting things fixed. You can literally tweet your GH issue to him and someone will respond.

      What was true a year ago isn't true today. If you're paying attention like I am, and experiencing it first hand, things are changing, fast.
