Comment by crest

3 years ago

While local NVMe SSD raids can max out a PCIe 16x slot given large enough blocks and enough queue depth, they still can't keep up with small to medium sync writes unless you can keep a deep queue filled. Lots of transaction processing workloads require low-latency commits, which is where flash-backed DRAM can shine. DRAM requires neither wear leveling nor UNMAP/TRIM. If the power fails you use stored energy to dump the DRAM to flash. On startup you restore the contents from flash while waiting for the stored energy to reach a safe operating level; once enough energy is stored you erase enough NAND flash to quickly write a full dump, and at that point the device is ready for use. If you overprovision the flash by at least a factor of two you can hide the erase latency and keep the previous snapshot. Additional optimisations, e.g. chunked or indexable compression, can reduce the wear on the NAND flash, effectively using the flash like a simplified flat compressed log-structured file system.

I would like two such cards in each of my servers as ZFS intent log, please. If their price and capacity are reasonable enough I would also like to use them either as L2ARC or for a special allocation class VDEV reserved for metadata, and maybe even small-block storage for PostgreSQL databases.
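To make the dump/restore sequencing concrete, here is a minimal Python sketch of the flow described above: two full-size NAND regions (the factor-of-two overprovisioning), a dump into the pre-erased region on power loss, and a restore-then-pre-erase step on startup so the previous snapshot stays intact until the next dump completes. All names here (SimNand, FlashBackedDram, the energy threshold) are hypothetical; a real card would do this in firmware, not host-side Python.

```python
from enum import Enum, auto


class Region(Enum):
    A = auto()
    B = auto()


class SimNand:
    """Toy NAND with two full-size dump regions (2x overprovisioning)."""

    def __init__(self, region_size: int):
        self.region_size = region_size
        self.data = {Region.A: None, Region.B: None}
        self.erased = {Region.A: False, Region.B: False}

    def erase(self, region: Region) -> None:
        # Slow in real hardware, so it is done ahead of time and a dump
        # never has to wait for it.
        self.data[region] = None
        self.erased[region] = True

    def program(self, region: Region, payload: bytes) -> None:
        assert self.erased[region], "must erase before programming"
        self.data[region] = bytes(payload)
        self.erased[region] = False

    def read(self, region: Region) -> bytes:
        return self.data[region]


class FlashBackedDram:
    SAFE_ENERGY = 0.9  # fraction of capacitor charge needed for one full dump

    def __init__(self, size: int):
        self.dram = bytearray(size)
        self.nand = SimNand(size)
        self.energy = 1.0
        self.snapshot = None      # region holding the last good dump
        self.dump_target = None   # pre-erased region for the next dump
        self.ready = False

    def startup(self, charge_per_step: float = 0.1) -> None:
        """Restore DRAM from the last snapshot while the capacitors charge,
        then pre-erase the other region before accepting host writes."""
        if self.snapshot is not None:
            self.dram[:] = self.nand.read(self.snapshot)
        while self.energy < self.SAFE_ENERGY:
            self.energy = min(1.0, self.energy + charge_per_step)
        other = Region.A if self.snapshot is Region.B else Region.B
        self.nand.erase(other)          # previous snapshot stays intact
        self.dump_target = other
        self.ready = True

    def write(self, offset: int, payload: bytes) -> None:
        assert self.ready
        self.dram[offset:offset + len(payload)] = payload

    def power_loss(self) -> None:
        """Use stored energy to dump DRAM into the pre-erased region."""
        assert self.ready and self.energy >= self.SAFE_ENERGY
        self.nand.program(self.dump_target, self.dram)
        self.snapshot, self.dump_target = self.dump_target, None
        self.energy = 0.0
        self.ready = False


if __name__ == "__main__":
    card = FlashBackedDram(size=16)
    card.startup()
    card.write(0, b"commit record 1!")
    card.power_loss()   # dump survives the outage
    card.startup()      # contents restored, next dump region pre-erased
    assert bytes(card.dram) == b"commit record 1!"
```

The point of the double buffering is that the device only ever programs into an already-erased region, so the worst-case dump time is bounded by the stored energy, and a failed dump still leaves the previous snapshot readable.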