Comment by wtallis

4 years ago

The current standard block storage abstraction presented by SSDs is a logical address space of either 512-byte or 4kB blocks (but pretty much always 4kB behind the scenes). Allocation is implicit upon writing to a block, and deallocation is explicit but optional. This model is indeed a good match for how virtual memory is handled, especially on systems with 4kB pages; there are already NVMe commands analogous to eg. madvise().

The problem is that it's not a good match for how flash memory actually works, especially with regards to the extreme disparity between a NAND page write and a NAND erase block. Giving the OS an interface to query which blocks the SSD considers as live/allocated rather than deallocated and implicitly zero doesn't seem all that useful. Giving the OS an interface to manipulate the SSD's logical to physical mappings (while retaining the rest of the abstraction's features) would be rather impractical, as both the SSD and the host system would have to care about implementation details like wear leveling.

Going beyond the current HDD-like abstraction augmented with optional hints to an abstraction that is actually more efficient and a better match for the fundamental characteristics of NAND flash memory requires moving away from a RAM/VM-like model and toward something that imposes extra constraints that the host software must obey (eg. append-only zones). Those constraints are what breaks compatibility with existing software.

0 comments

wtallis

No comments yet

Contribute on Hacker News ↗