Comment by slickytail
7 days ago
The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.
That bandwidth is for the whole GPU, which has 6 memory chips. But anyway, what I'm proposing isn't for the high end and training; it's for making inference cheap.
And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND die can do more than 5GB/s read speed.
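The arithmetic behind this can be sketched quickly. A back-of-the-envelope comparison, assuming the 3 TB/s aggregate splits evenly across the 6 memory chips (an assumption; the thread only gives the totals):

```python
# Rough bandwidth comparison using the numbers from this thread.
# Assumption: the H100's aggregate bandwidth is split evenly across
# its 6 memory chips.
h100_total_gbps = 3000                 # H100 aggregate memory bandwidth, GB/s
memory_chips = 6
per_chip_gbps = h100_total_gbps / memory_chips   # per-chip share, GB/s

nand_die_gbps = 5                      # budget SSD, single NAND die, GB/s
dies_to_match = per_chip_gbps / nand_die_gbps    # NAND dies to match one chip

print(per_chip_gbps)   # 500.0
print(dies_to_match)   # 100.0
```

So on these figures each memory chip carries roughly 500 GB/s, about 100x a single budget NAND die, which is why the proposal targets cheap inference rather than the high end.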