Comment by numpad0 4 days ago
Maybe latency. IIRC flash is a lot laggier than DRAM and SRAM.

DoctorOetker 3 days ago
The random-access memory model is not really representative of ML workloads (both training and inference), where multiplying large tensors results in predictable memory access patterns.
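The point about predictable access can be made concrete: in a blocked matrix multiply, the full sequence of tile fetches is determined by the loop indices alone, before any data is read. A minimal sketch (hypothetical `tile_schedule` helper, not from any particular framework):

```python
# Sketch: the block schedule of a tiled n x n matmul is fully known ahead
# of execution, so a prefetcher (or a high-latency flash controller) can
# stream tiles in this exact order -- unlike a truly random-access workload.

def tile_schedule(n, tile):
    """Yield the (A-tile, B-tile) index pairs a blocked matmul touches."""
    blocks = n // tile
    for i in range(blocks):
        for j in range(blocks):
            for k in range(blocks):
                # A[i,k] * B[k,j] contributes to C[i,j]
                yield (i, k), (k, j)

# Every fetch is a pure function of loop indices -- no data dependence.
schedule = list(tile_schedule(n=8, tile=2))
print(len(schedule))   # 4**3 = 64 tile-pair fetches, fixed in advance
print(schedule[0])     # ((0, 0), (0, 0))
```

Because the schedule is data-independent, long per-access latency can be hidden by issuing fetches far ahead, which is why streaming bandwidth matters more than random-access latency for this kind of workload.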