Comment by mrob

25 days ago

  During inference, most of the memory is read only.

Sounds fair. That's not the kind of machine I'd want as a development system, though. And development systems are usually beefier than production systems, so I'm curious how they'd solve that.
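
As a rough sketch of why inference memory skews read-only (every figure below is an illustrative assumption, not from the thread): a dense model's weights are only ever read at inference time, while the KV cache is the main memory that gets written.

    # Rough illustration; all figures are hypothetical assumptions.
    params_b = 70          # e.g. a 70B-parameter dense model
    bytes_per_param = 1    # 8-bit quantized weights
    weights_gb = params_b * bytes_per_param   # read-only during inference

    # KV cache, the main memory written per generated token:
    layers, kv_heads, head_dim = 80, 8, 128   # hypothetical model shape
    context, batch = 8192, 4
    kv_gb = 2 * layers * kv_heads * head_dim * context * batch * 2 / 1e9  # K+V, fp16

    print(f"read-only weights: ~{weights_gb} GB")   # ~70 GB
    print(f"writable KV cache: ~{kv_gb:.1f} GB")    # ~10.7 GB

Under those assumptions the read-only weights outweigh the writable cache by roughly 6-7x.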

  • Yeah, it is quite specialized for inference. It's unlikely that you'd see this stuff outside of hardware specifically for that.

    Development systems for AI inference tend to be smaller by necessity. A DGX Spark, a DGX Station, a single B300 node... you'd work on something like that before deploying to a larger cluster. There's just nothing bigger than what you'd actually deploy to.

  • HBF, like expensive HBM, is targeted at AI data centers.

      The KAIST professor discussed an HBF unit having a capacity of 512 GB and a 1.638 TBps bandwidth.
    

    PCIe x8 GPU bandwidth is about 32 GB/s (PCIe 5.0), so HBF could be roughly 50x PCIe bandwidth.
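
    As a quick sanity check on that ratio (the HBF figure is the quoted one; the PCIe number assumes a PCIe 5.0 x8 link at ~4 GB/s per lane):

      # Quoted HBF bandwidth vs. an assumed PCIe 5.0 x8 link.
      hbf_gb_s = 1638         # 1.638 TB/s, per the KAIST figure above
      pcie5_x8_gb_s = 8 * 4   # 8 lanes * ~4 GB/s per lane
      print(f"{hbf_gb_s / pcie5_x8_gb_s:.0f}x")  # -> 51x, close to the 50x above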