Comment by jcranmer

4 years ago

`fork` is a classic example, as others have mentioned, of something that was implemented because it was, at the time, easy rather than because it was a good design. In the decades since, we've found issues caused by the semantics of fork, especially since the most common subsequent system call is `exec` (so the whole process is duplicated only to be thrown away immediately).
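For reference, the classic fork-then-exec pattern looks roughly like this. It's a minimal sketch; `run_program` is a hypothetical helper name, and returning 127 when exec fails just borrows the shell convention.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Classic fork-then-exec. Returns the child's exit status, 127 if the exec
 * itself failed (shell convention), or -1 if fork/waitpid failed. */
static int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL && argv[0] != NULL);

    pid_t pid = fork();          /* duplicate the entire calling process... */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: per-process setup (chdir, dup2, setrlimit, ...) goes here. */
        execv(path, argv);       /* ...only to replace the copy right away */
        _exit(127);              /* reached only if execv failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses `_exit` rather than `exit` so it doesn't flush stdio buffers it inherited from the parent.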

If you're designing an OS from scratch, support for `fork` and `exec` as separate system calls is not what you want. Instead, you'd be likely to describe something in terms of a process creation system call, which will have eleventy billion parameters governing all of the attributes of the spawned process.

POSIX specifies a fork+exec combination called posix_spawn. It's actually used a fair amount, but the reason it isn't used more is that it doesn't support all of the eleventy billion parameters governing all of the attributes of the spawned process. Instead, these parameters are usually set by system calls made between fork and exec. These system calls might, for example, change the root directory of the process or attach a debugger. Neither of these is supported by posix_spawn, which only supports common operations such as rearranging file descriptors (its list of file actions) or resetting the signal mask (its attributes).

And this suggests why you might want vfork: vfork allows you to write something that looks like posix_spawn. You get to fork, make your new-process-attribute-setting calls, and then exec the new process image, all while being able to report errors back through the same memory space.
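A sketch of that vfork pattern, assuming the traditional BSD/Linux behavior where the child borrows the parent's memory until it calls exec or `_exit`. `spawn_in_dir` is a hypothetical name and `chdir` merely stands in for whatever per-process setup you'd do. Writing to the parent's variable from the child is the "report errors in the same memory space" trick, but POSIX leaves it undefined (and removed vfork entirely in POSIX.1-2008), so treat this as platform-specific.

```c
#define _DEFAULT_SOURCE   /* glibc: make sure vfork() is declared */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* posix_spawn-like helper on top of vfork(). Returns 0 and stores the pid on
 * success, or the errno from the failed setup step or exec. */
static int spawn_in_dir(pid_t *pid, const char *dir, const char *path, char *const argv[])
{
    volatile int child_err = 0;   /* written by the child, read by the parent */

    assert(pid != NULL && argv != NULL);

    pid_t p = vfork();
    if (p < 0)
        return errno;
    if (p == 0) {
        /* Child: arbitrary setup calls go here, between vfork and exec. */
        if (dir != NULL && chdir(dir) != 0) {
            child_err = errno;    /* lands in the parent's stack frame */
            _exit(127);
        }
        execv(path, argv);
        child_err = errno;        /* exec failed: record why */
        _exit(127);
    }

    /* The parent resumes only after the child has exec'd or exited. */
    if (child_err != 0) {
        int err = child_err;
        waitpid(p, NULL, 0);      /* reap the failed child */
        return err;
    }
    *pid = p;
    return 0;
}
```

Because the parent is suspended until the child execs or exits, it can check `child_err` without any pipe or other synchronization.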

> If you're designing an OS from scratch, support for `fork` and `exec` as separate system calls is not what you want. Instead, you'd be likely to describe something in terms of a process creation system call, which will have eleventy billion parameters governing all of the attributes of the spawned process.

Or if you happen to be sane you'll have a single, simple system call to create a blank, suspended child process, and all the regular system calls which operate on process state will take a handle or process "file descriptor" to indicate which process to modify rather than assuming the current process as the target.

This was the ultimate flaw of posix_spawn(). As you point out, it doesn't support everything you might want to tweak in the child process. That's a consequence of trying to capture every aspect of the initial process state in a single process-creation API, rather than distributing the work across the normal system calls so that each new interface or piece of state can be adjusted for child processes in the same way it's adjusted for the current process.
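To make the handle-based design concrete, here is a toy user-space model. These are not real system calls: `proc_create`, `proc_chdir`, `proc_sigsetmask`, `proc_resume`, and `PROC_SELF` are all hypothetical names. The point is that every state-changing call takes a handle, so the same call configures either the current process or a blank, suspended child.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: process state lives in an in-memory table. */
enum { MAX_PROCS = 8, PROC_SELF = 0 };

struct proc {
    int in_use;
    int suspended;        /* a new child starts blank and suspended */
    char cwd[64];
    unsigned sigmask;
};

static struct proc table[MAX_PROCS] = { [PROC_SELF] = { 1, 0, "/home/user", 0 } };

static int valid(int h) { return h >= 0 && h < MAX_PROCS && table[h].in_use; }

/* Create a blank, suspended child; returns its handle or -1. */
static int proc_create(void)
{
    for (int h = 1; h < MAX_PROCS; h++) {
        if (!table[h].in_use) {
            table[h] = (struct proc){ .in_use = 1, .suspended = 1, .cwd = "/" };
            return h;
        }
    }
    return -1;
}

/* Ordinary state-changing calls: identical for PROC_SELF and for children. */
static int proc_chdir(int h, const char *dir)
{
    assert(dir != NULL);
    if (!valid(h) || strlen(dir) >= sizeof table[h].cwd)
        return -1;
    snprintf(table[h].cwd, sizeof table[h].cwd, "%s", dir);
    return 0;
}

static int proc_sigsetmask(int h, unsigned mask)
{
    if (!valid(h))
        return -1;
    table[h].sigmask = mask;
    return 0;
}

/* Let a configured child start running (loading a program image is omitted). */
static int proc_resume(int h)
{
    if (!valid(h) || !table[h].suspended)
        return -1;
    table[h].suspended = 0;
    return 0;
}
```

Adding a new piece of process state here means adding one handle-taking call; children get it for free, with no separate spawn attribute to keep in sync.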

Whatever you do, though, make sure it's possible to emulate fork() reliably with your "better" replacement. Consider the case of Cygwin, where emulated fork() calls can (and frequently do) fail in bizarre ways because the "blank" child process was preloaded with some unexpected virtual memory mapping by AV software or other system tasks, with the result that a required DLL or private memory space can't be set up at the same address used in the parent.

  • To be fair, posix_spawn() is extensible: new attributes and the like can be added, and a number of extensions already exist. Illumos has some.

    • Most APIs can be extended. The problem is that when someone adds a new tunable parameter or resource that one might want to modify for a child process, it doesn't automatically get added to posix_spawn(); that takes extra effort. That's why I emphasized using the same APIs for the current process and child processes, rather than duplicating the work in two places.