Comment by wtallis

5 years ago

NVMe is natively a multi-queue storage protocol, so there's no reason for the application or OS to funnel IO requests into a single thread before issuing them to the lower layers. The normal configuration is for each CPU core to be allocated its own IO queue for the drive. But multithreaded synchronous (blocking) IO often isn't enough to keep a high-end NVMe SSD properly busy: you run out of cores and get bogged down in context-switching overhead at a few hundred thousand IOPS, even with a storage benchmark that doesn't need to leave any CPU time over for productive work.
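
To make that concrete, here is a minimal sketch (not from the original comment) of the thread-per-core blocking approach: each thread sits in its own pread() loop against the drive. The device path, thread count, block size, and read offsets are placeholders, and O_DIRECT/alignment details are omitted for brevity.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_THREADS      8        /* e.g. one IO thread per core */
#define BLOCK_SIZE       4096
#define READS_PER_THREAD 100000

static int fd;                    /* shared read-only descriptor */

static void *read_worker(void *arg)
{
    long id = (long)(intptr_t)arg;
    char *buf = malloc(BLOCK_SIZE);

    for (long i = 0; i < READS_PER_THREAD; i++) {
        /* Each pread() blocks this thread until the drive answers, so
         * per-thread IOPS is limited by completion latency plus the
         * kernel's wakeup/context-switch cost. */
        off_t off = (id * READS_PER_THREAD + i) * (off_t)BLOCK_SIZE;
        if (pread(fd, buf, BLOCK_SIZE, off) < 0) {
            perror("pread");
            break;
        }
    }
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    fd = open("/dev/nvme0n1", O_RDONLY);   /* placeholder device path */
    if (fd < 0) { perror("open"); return 1; }

    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, read_worker, (void *)(intptr_t)t);
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    close(fd);
    return 0;
}
```

Scaling this up means adding threads, and past a point the kernel spends more time parking and waking them than the drive spends servicing requests.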

With a sufficiently low-overhead async API (i.e. io_uring) you can saturate a very fast SSD with a single thread, but I'm not sure it would actually make sense for a game engine to do this when it could just as easily have multiple threads performing IO independently, with most of that work requiring no synchronization between cores/threads.
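
For contrast, here is a minimal single-threaded io_uring sketch using liburing (again, not from the original comment; link with -luring). The queue depth, device path, and read pattern are illustrative. One thread keeps QUEUE_DEPTH reads in flight by reaping each completion and immediately submitting a replacement.

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define QUEUE_DEPTH 64
#define BLOCK_SIZE  4096
#define TOTAL_READS 1000000

int main(void)
{
    struct io_uring ring;
    struct io_uring_cqe *cqe;
    char *bufs[QUEUE_DEPTH];
    long submitted = 0, completed = 0;

    int fd = open("/dev/nvme0n1", O_RDONLY);   /* placeholder device path */
    if (fd < 0) { perror("open"); return 1; }

    if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) {
        fprintf(stderr, "io_uring_queue_init failed\n");
        return 1;
    }
    for (int i = 0; i < QUEUE_DEPTH; i++)
        bufs[i] = malloc(BLOCK_SIZE);

    /* Prime the ring with a full window of reads. */
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i], BLOCK_SIZE,
                           (submitted++ % 1024) * (off_t)BLOCK_SIZE);
        io_uring_sqe_set_data(sqe, bufs[i]);
    }
    io_uring_submit(&ring);

    /* One thread reaps each completion and immediately submits a
     * replacement, so the drive always has ~QUEUE_DEPTH requests queued. */
    while (completed < TOTAL_READS) {
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        char *buf = io_uring_cqe_get_data(cqe);
        if (cqe->res < 0)
            fprintf(stderr, "read failed: %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
        completed++;

        if (submitted < TOTAL_READS) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buf, BLOCK_SIZE,
                               (submitted++ % 1024) * (off_t)BLOCK_SIZE);
            io_uring_sqe_set_data(sqe, buf);
            io_uring_submit(&ring);
        }
    }

    for (int i = 0; i < QUEUE_DEPTH; i++)
        free(bufs[i]);
    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```

In this model the parallelism comes from the queue depth rather than the thread count, which is why a single core can be enough to keep the drive saturated.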