Comment by adastra22

4 years ago

...is it?

In its original implementation, fork() was pretty trivial. All it did was create a new process entry in the kernel table, with all the pages and capabilities and such copied from the original process. Then mark all pages as copy-on-write, and return to the caller. Maybe not trivial, but much less complicated than loading an executable file from disk.

My understanding of Linux internals is maybe 20 years out of date, so I am legitimately curious what makes fork() so complicated these days.


cryptonector 4 years ago

fork() is not trivial now. Processes are huge now -- they have huge heaps among other things. Copying all that is expensive. In the 80s we tried COW, but that turns out to be very slow as well. What operating systems do now is immediately copy the resident set, then do COW for the rest of writable memory, but in large, multi-threaded processes, this is still too slow.

Use vfork() or posix_spawn().
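For anyone who has not used it, here is a minimal sketch of the posix_spawn() route in C. The helper name `spawn_and_wait` is made up for illustration, and it assumes the child should inherit the parent's environment and needs no file actions or spawn attributes (both are passed as NULL).

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up on PATH) without calling
 * fork() directly. Returns the child's exit status, or -1 if spawning or
 * waiting fails or the child did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* posix_spawnp() returns an error number directly instead of
     * setting errno; 0 means the child was created. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing `{"sh", "-c", "exit 7", NULL}` should give back 7. Anything that has to happen in the child before the exec (closing or redirecting descriptors, signal masks) would go through the file actions and attributes arguments rather than through code run between fork() and exec().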

adastra22 4 years ago
Hrm. Googling "fork linux copy-on-write" seems to find a lot of stack overflow answers from 2014-2015 claiming Linux marks pages as copy-on-write when fork() is called. I didn't see anything more recent in the first page of results.
I could see it being worthwhile to immediately copy a few pages, like the top of the stack, but copying the whole resident set seems excessive. Especially since some of that data might not even be written to.
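Whichever strategy the kernel uses underneath (eager copy or copy-on-write), the behavior a program can observe is the same: after fork() each side has its own private view of writable memory. A small sketch of that guarantee, with a made-up helper name `child_write_is_private` and an arbitrary 4096-byte buffer:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a write made by the child after
 * fork() is invisible to the parent, 0 if it is visible, -1 on error. */
int child_write_is_private(void)
{
    size_t len = 4096;              /* arbitrary size for illustration */
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write only touches the child's copy of the page. */
        buf[0] = 'C';
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }

    /* Parent: its copy must still hold the original byte. */
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 'P';
    free(buf);
    return ok ? 1 : 0;
}
```

This says nothing about when the copy happens, only that it has happened by the time either side could notice the difference.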
- CHY872 4 years ago
  
  So the problem is what happens to the old and new processes after the fork. To CoW, you need to mark all the pages read only in _both_ old and new which means that every memory write in the caller will now pagefault since the OS now has to lazily copy on both sides. So with true copy on write the fixed costs may be low but the marginal cost per memory write may be high in both parent and child. In this case you can see why the resident set is copied, yes? It’s the smallest amount of memory that guarantees predictable performance subsequent to the call returning.

spc476 4 years ago

Hmmm, from <https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html>:

    The posix_spawn() and posix_spawnp() functions provide the
    functionality of a combined fork(2) and exec(3), with some
    optional housekeeping steps in the child process before the
    exec(3).  These functions are not meant to replace the fork(2)
    and execve(2) system calls.  In fact, they provide only a subset
    of the functionality that can be achieved by using the system
    calls.

Also, there's no way to set resource limits in the child process, nor switch user or group ID, using posix_spawn().

GoblinSlayer 4 years ago

For that you may need posix_spawn and exec, but still can evade fork completely.

duskwuff 4 years ago
Using fork() also means you end up with shared ownership of resources like file descriptors, which can have some pretty weird consequences.
- cryptonector 4 years ago
  
  This is true with all process creation APIs.
  Windows defaults to CLOEXEC semantics and you have to opt-in to child process inheriting open file handles, and that has caused problems.
  Unix defaults to not-CLOEXEC semantics, and that too has caused problems.
- bregma 4 years ago
  
  Or more importantly, IPC mechanisms like mutexes. If they're in shared memory, you now have two problems. The runtime of a very, very popular scripting language does this.
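On the file descriptor inheritance point above: on Unix the opt-out is per descriptor, via the FD_CLOEXEC flag. A minimal sketch in C, where `set_cloexec` and `is_cloexec` are made-up helper names rather than standard functions:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd as close-on-exec so it is not inherited
 * across exec(). Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if fd is close-on-exec, 0 if it is
 * not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor fresh from pipe() reports 0 from `is_cloexec` until `set_cloexec` is called on it. Setting the flag in a separate step leaves a window in which another thread can fork and exec, which is why the flag can also be set at creation time with O_CLOEXEC in open().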

turminal 4 years ago

> Then mark all pages as copy-on-write, and return to the caller.

Unix actually copied the memory over initially.

moomin 4 years ago

It was simple when processes were simple, but requirements got serious.