Comment by amelius
25 days ago
Thanks! This explains it.
Now I'm wondering how you deal with the limited number of write cycles of Flash memory. Or maybe that is not an issue in some applications?
During inference, most of the memory is read-only: the weights are written once and then only read, so flash write endurance isn't really stressed.
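A minimal sketch of that access pattern (the filename and shapes here are made up for illustration): the bulk of inference memory, the weights, can be mapped read-only, while only the comparatively small mutable state takes writes.

```python
import numpy as np

# Hypothetical weight file; path and dtype are illustrative only.
WEIGHTS_PATH = "model_weights.bin"

# Map the large weight tensor read-only: the OS pages it in from
# storage on demand and never writes it back. This is the access
# pattern that makes limited write endurance a non-issue for the
# bulk of inference memory.
weights = np.memmap(WEIGHTS_PATH, dtype=np.float16, mode="r")

# By contrast, the small, frequently written state (activations,
# KV cache) lives in ordinary read-write memory.
kv_cache = np.zeros((1024, 4096), dtype=np.float16)
```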
Sounds fair. That's not the kind of machine I'd want as a development system though. And usually development systems are beefier than production systems. So I'm curious how they'd solve that.
Yeah, it is quite specialized for inference. It's unlikely that you'd see this stuff outside of hardware specifically for that.
Development systems for AI inference tend to be smaller by necessity. A DGX Spark, a DGX Station, a single B300 node... you'd work on something like that before deploying to a larger cluster. There's just nothing bigger than what you'd actually deploy to.
HBF, like expensive HBM, is targeted at AI data centers.
PCIe x8 GPU bandwidth is about 32 GB/s, so HBF could be about 50x PCIe bandwidth.
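A quick sanity check on that multiplier, assuming HBF lands at roughly HBM-class bandwidth (the ~1.6 TB/s figure here is an assumption implied by the 50x claim, not a published spec):

\[ 50 \times 32\,\mathrm{GB/s} = 1600\,\mathrm{GB/s} = 1.6\,\mathrm{TB/s} \]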