Comment by torginus

7 days ago

No, it's not slow. A single NAND chip in an SSD offers >1GB/s of bandwidth. Inside the chip there are 100+ wafers actually holding the data, but in SSDs only one of them is active when reading/writing.

You could probably make special NAND chips where all of them can be active at the same time, which means you could get 100GB/s+ of bandwidth out of a single chip.

This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.

The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.

  • That bandwidth is for the whole GPU, which has 6 memory chips. But anyway, what I'm proposing isn't for the high end and training, but for making inference cheap.

    And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND chip can do more than 5GB/s read speed.

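To put rough numbers on the disagreement above, here is a back-of-the-envelope sketch. It assumes decoding is purely bandwidth-bound, meaning every generated token requires reading all model weights once, so tokens per second is at most bandwidth divided by model size. The 70GB model size and the function name are illustrative assumptions, not figures from the thread; the bandwidth values are the ones quoted in the comments.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token needs one full read of the weights."""
    return bandwidth_gb_s / model_size_gb


# Assumption for illustration only: a model whose weights occupy 70 GB.
MODEL_SIZE_GB = 70.0

# Bandwidth figures as quoted in the thread (GB/s).
sources = [
    ("budget SSD, single NAND chip", 5.0),
    ("proposed all-active NAND chip", 100.0),
    ("H100 memory", 3000.0),
]

for name, bandwidth in sources:
    rate = decode_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {rate:.2f} tokens/s")
```

Under these assumptions, a single 100GB/s chip tops out at roughly 1.4 tokens/s on a 70GB model, against roughly 43 tokens/s at 3TB/s. This is an upper bound only and ignores batching, caching, and compute limits.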