Comment by amelius
25 days ago
Thanks! This explains it.
Now I'm wondering how you deal with the limited number of write cycles of Flash memory. Or maybe that is not an issue in some applications?
During inference, most of the memory is read-only: the weights are written once and then only read, so flash write endurance isn't really stressed.
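A minimal sketch of that access pattern (the filename and shapes here are made up for illustration): the bulk of inference memory, the weights, can be mapped read-only, while only the comparatively small mutable state takes writes.

```python
import numpy as np

# Hypothetical weight file; path and dtype are illustrative only.
WEIGHTS_PATH = "model_weights.bin"

# Map the large weight tensor read-only: the OS pages it in from
# storage on demand and never writes it back. This is the access
# pattern that makes limited write endurance a non-issue for the
# bulk of inference memory.
weights = np.memmap(WEIGHTS_PATH, dtype=np.float16, mode="r")

# By contrast, the small, frequently written state (activations,
# KV cache) lives in ordinary read-write memory.
kv_cache = np.zeros((1024, 4096), dtype=np.float16)
```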
Sounds fair. That's not the kind of machine I'd want as a development system though. And usually development systems are beefier than production systems. So I'm curious how they'd solve that.
Yeah, it is quite specialized for inference. It's unlikely that you'd see this stuff outside of hardware specifically for that.
Development systems for AI inference tend to be smaller by necessity. A DGX Spark, a DGX Station, a single B300 node... you'd work on something like that before deploying to a larger cluster. There's just nothing bigger than what you'd actually deploy to.
HBF, like expensive HBM, is targeted at AI data centers.
PCIe x8 GPU bandwidth is about 32 GB/s, so HBF could be about 50x PCIe bandwidth.
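A quick sanity check on that multiplier, assuming HBF lands at roughly HBM-class bandwidth (the ~1.6 TB/s figure here is an assumption implied by the 50x claim, not a published spec):

\[ 50 \times 32\,\mathrm{GB/s} = 1600\,\mathrm{GB/s} = 1.6\,\mathrm{TB/s} \]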