Comment by klodolph
4 years ago
fork() is not a small tool.
It is a MASSIVE, unwieldy tool that is difficult to use correctly. It happens to have a small interface.
4 years ago
fork() is not a small tool.
It is a MASSIVE, unwieldy tool that is difficult to use correctly. It happens to have a small interface.
...is it?
In it is original implementation, fork() was pretty trivial. All it did was create a new process entry in the kernel table, with all the pages and capabilities and such copied from the original process. Then mark all pages as copy-on-write, and return to the caller. Maybe not trivial, but much less complicated than loading an executable file from disk.
My understanding of Linux internals is maybe 20 years out of date, so I am legitimately curious what makes fork() so complicated these days.
fork() is not trivial now. Processes are huge now -- they have huge heaps among other things. Copying all that is expensive. In the 80s we tried COW, but that turns out to be very slow as well. What operating systems do now is immediately copy the resident set, then do COW for the rest of writable memory, but in large, multi-threaded processes, this is still too slow.
Use vfork() or posix_spawn().
Hrm. Googling "fork linux copy-on-write" seems to find a lot of stack overflow answers from 2014-2015 claiming Linux marks pages as copy-on-write when fork() is called. I didn't see anything more recent in the first page of results.
I could see it being worthwhile to immediately copy a few pages, like the top of the stack, but copying the whole resident set seems excessive. Especially since some of that data might not even be written to.
7 replies →
Hmmm, from <https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html>:
Also, there's no way to set resource limits in the child process, nor switch user or group ID, using posix_spawn().
1 reply →
Using fork() also means you end up with shared ownership of resources like file descriptors, which can have some pretty weird consequences.
6 replies →
> Then mark all pages as copy-on-write, and return to the caller.
Unix actually copied the memory over initially.
It was simple when processes were simple, but requirements got serious.