Comment by adrian_b

2 days ago

In an optimized implementation of model inference, the latency of SSD access has no importance, because no random accesses are done.

The purpose of optimizing model inference for weights stored on SSDs is to achieve a continuous reading from SSDs at the maximum throughput provided by hardware, taking care that any computations and any accesses to the main memory are overlapped over the SSDs reading.