Comment by Dwedit
19 hours ago
Using mmap means that you need to be able to handle memory access exceptions when a disk read or write fails. Examples of disk accesses that can fail include reading from a file on a Wifi network drive, a USB device whose connection drops when the cable is jiggled, or even a removable USB drive where all disk reads fail after it sees one bad sector. If you're not prepared to handle a memory access exception when you access the mapped file, don't use mmap.
> file on a Wifi network drive,
I would simply not mmap this.
> If you're not prepared to handle a memory access exception when you access the mapped file, don't use mmap.
fread can fail too. I don't know why you would be prepared for one and not the other.
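For comparison, here is a minimal sketch of what being prepared for an fread failure could look like. `read_exact` is a hypothetical helper name, not something from a library; the point is only that the error comes back as a return value at the call site.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read exactly n bytes from f into buf.
 * Returns 0 on success, 1 on a short read at end of file,
 * and -1 if the stream reports an I/O error. */
int read_exact(FILE *f, void *buf, size_t n) {
    assert(f != NULL && buf != NULL);
    size_t got = fread(buf, 1, n, f);
    if (got == n)
        return 0;
    /* fread does not distinguish EOF from error; ferror does. */
    return ferror(f) ? -1 : 1;
}
```

The caller sees the failure right where the read happens and can propagate it like any other error code.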
Because you're way deep down the call stack in some function that happened to take in a pointer, far far away from the code that opened the file.
If that's your program design then fread is not a substitute. Because you would need to pass in the FILE* pointer to all those calls.
And what are you hoping to do in those call stacks when you find an error? Can any of that logic hope to do anything useful if it can't access this data? Let the OS handle this. Crash your program and restart.
You can even mmap a socket on some systems (iOS and macOS via GCD). But doing that is super fragile. Socket errors are swallowed.
My interpretation always was that mmap should only be used for immutable and local files. You may still run into issues with those types of files, but it's very unlikely.
mmap is also good for passing shared memory around.
(You still need to be careful, of course.)
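A rough sketch of the shared-memory use, assuming a system that supports anonymous shared mappings (`MAP_ANONYMOUS` is widely available but not in POSIX). `share_int_with_child` is a made-up name for illustration. The only synchronization here is `waitpid`; real code would likely need a semaphore or atomics, which is where the "be careful" part comes in.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc needs this for MAP_ANONYMOUS */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD-style spelling */
#endif

/* Hypothetical demo: the child writes `value` into a shared mapping and
 * the parent reads it back. Returns the value seen by the parent, or -1
 * on failure (so `value` must be non-negative). */
int share_int_with_child(int value) {
    assert(value >= 0);
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = value;   /* visible to the parent: the mapping is shared */
        _exit(0);
    }

    int status = 0;
    int result = -1;
    /* waitpid is the only synchronization in this sketch. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```

With `MAP_PRIVATE` instead of `MAP_SHARED`, the child's write would land in its own copy and the parent would still read 0.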
It’s also great for when you have a lot of data on local storage, and a lot of different processes that need to access the same subset of that data concurrently.
Without mmap, every process ends up caching its own private copy of that data in memory (think fopen, fread, etc). With mmap, every process accesses the same cached copy of that data directly from the FS cache.
Granted this is a rather specific use case, but for this case it makes a huge difference.
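For what it's worth, the mapping side of this is small. A sketch, assuming a POSIX system; `map_readonly` is a hypothetical helper name. Every process that maps the same file this way reads from the same cached pages instead of filling its own buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * On success returns the mapping and stores its length in *len_out.
 * Returns NULL on failure or if the file is empty. */
const void *map_readonly(const char *path, size_t *len_out) {
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);   /* stat failed, or nothing to map (length 0 is invalid) */
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);       /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with `munmap` when done. Note that open and mmap errors are reported here, but a later read failure on the mapped pages is not; that arrives as a signal.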
C doesn't have exceptions, do you mean signals? If not, I don't see how that is any different from having to handle I/O errors from write() and/or open() calls.
It's very different, since your signal handler is called asynchronously at random points of your program, you can only do a very limited set of async-signal-safe things there, and the flow of control in your I/O, logic, etc. code has no idea it's happening.
tldr; it's very different.
Well, at least in this case the timing won't be arbitrary. Execution will have blocked waiting on the read, and you will (AFAIK) receive the signal promptly. Since the code in question was doing IO that you knew could fail, handling the situation can be as simple as setting a flag from within the signal handler.
I'm unclear what would happen in the event you had configured the mask to force SIGBUS to a different thread. Presumably undefined behavior.
> If multiple standard signals are pending for a process, the order in which the signals are delivered is unspecified.
That could create the mother of all edge cases if a different signal handler assumed the variable you just failed to read into was in a valid state. More fun footguns I guess.
Yes, it’s the SIGBUS signal.
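A sketch of one way to catch it, assuming a single-threaded program on a POSIX system. All the names here (`guarded_copy`, `install_sigbus_handler`, `demo_truncated_mapping`) are made up for illustration. It jumps out of the handler with `siglongjmp`, because returning normally from a SIGBUS handler typically just retries the faulting access. The demo provokes the signal by truncating a mapped temp file, which raises SIGBUS on Linux; other systems may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;                 /* global: not thread-safe */
static volatile sig_atomic_t bus_armed = 0;

static void on_sigbus(int sig) {
    (void)sig;
    if (bus_armed)
        siglongjmp(bus_jmp, 1);            /* unwind to guarded_copy */
    signal(SIGBUS, SIG_DFL);               /* unguarded fault: the retried
                                              access kills the process */
}

int install_sigbus_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copy n bytes out of a mapping. Returns 0 on success, -1 if SIGBUS hit. */
int guarded_copy(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const volatile unsigned char *s = src; /* volatile keeps the loads
                                              after bus_armed = 1 */
    if (sigsetjmp(bus_jmp, 1) != 0) {      /* 1: restore the signal mask */
        bus_armed = 0;
        return -1;
    }
    bus_armed = 1;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    bus_armed = 0;
    return 0;
}

/* Demo: returns -1 if the read after truncation was caught as SIGBUS,
 * 0 if it unexpectedly succeeded, 1 if the setup failed. */
int demo_truncated_mapping(void) {
    char path[] = "/tmp/sigbus_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return 1;
    unlink(path);
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char buf[16];
    int rc = 1;
    if (ftruncate(fd, (off_t)len) == 0 && install_sigbus_handler() == 0) {
        char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            int before = guarded_copy(buf, p, sizeof buf);
            if (before == 0 && ftruncate(fd, 0) == 0)
                rc = guarded_copy(buf, p, sizeof buf);
            munmap(p, len);
        }
    }
    close(fd);
    return rc;
}
```

This works for a narrow copy-out wrapper, but it does not help code deep in the call stack that dereferences the pointer directly, and the global jump buffer would need to become per-thread state in a threaded program.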
Ah, reminds me of 'Are You Sure You Want to Use MMAP in Your Database Management System? (2022)' https://db.cs.cmu.edu/mmap-cidr2022/