Comment by rui314

5 years ago

Author here. Good point. I ended up not using fork() without exec(), so that should be fine now, but here is my original plan to use fork():

I wanted to keep a linker process running as a daemon so that it doesn't read the same files over and over again. After loading the input files, the linker becomes a daemon and calls fork() to create a worker process. The worker process then does the rest of the linking. In other words, the daemon is a "clean" copy of the linker process image, and each child is specialized for one actual linker invocation.
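A minimal sketch of that daemon-plus-worker shape, in Python for brevity rather than the linker's own language. `load_inputs`, `serve_one`, and the dictionary standing in for linker state are all hypothetical names for illustration, not anything from the actual linker:

```python
import os


def load_inputs(paths):
    # Hypothetical stand-in for the expensive "read and parse input files" step.
    return {path: len(path) for path in paths}


def serve_one(state, job):
    """Fork a worker from the preloaded parent and return the worker's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: do the invocation-specific work on its own copy of the state.
        # Writing to memory inherited from the parent is what triggers
        # copy-on-write page faults in a real process.
        status = 1
        try:
            state["output"] = job
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: its copy of `state` is untouched and can serve the next request.
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus) if os.WIFEXITED(wstatus) else -1


if __name__ == "__main__":
    state = load_inputs(["a.o", "b.o"])
    print(serve_one(state, "link-1"), "output" in state)
```

The parent never sees the child's writes, which is what keeps it "clean"; the price is that every page the child touches for writing has to be copied first.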

It turned out that the linker runs much slower with fork() because of the overhead of copy-on-write. You cannot keep a fresh copy of a process for free just by calling fork(); there's a tax associated with it. I tried to work around it, but in the end I had to give up on the fork()-based worker process design.

I wonder if the tax could be reduced with huge pages. Much of the cost of COW is the large number of page faults, but with huge pages you could reduce the number of faults to 1.

> in the end I had to give up with the fork()-based worker process design.

Honestly that's a good thing, or your program just fundamentally couldn't possibly ever work on native Windows!

  • Not possibly working on Windows is, for some of us, a feature. I have no wish to support that ecosystem due to the past ethical violations that founded it, and making it impossible to do so would avoid having to deal with it.

What about posix_spawn() with POSIX_SPAWN_USEVFORK? That saves some of the overhead. See e.g. https://github.com/rtomayko/posix-spawn#benchmarks

  • posix_spawn is just a wrapper that takes care of setting common parameters for newly forked instances (e.g. pgrp) and prevents you from doing things that might be overly unsafe or could stop vfork from being used in its optimized form. It’s implemented at the libc level, so it’s not a magic syscall that moves the burden of process spawning to the kernel.

    • This doesn’t change your core point but just to be thorough, it’s actually a system call on a few systems, notably macOS.
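For reference, a rough sketch of what spawning through posix_spawn looks like from the caller's side, using Python's `os.posix_spawn` wrapper. `spawn_and_wait` is a hypothetical helper name, and note that Python's wrapper does not expose the POSIX_SPAWN_USEVFORK flag mentioned above, so this only illustrates the spawn-then-wait call pattern, not that flag:

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn and return its exit code (-1 if it was killed)."""
    # The new program is started directly; the caller never runs its own code
    # in a forked copy of itself, unlike a bare fork() without exec().
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus) if os.WIFEXITED(wstatus) else -1


if __name__ == "__main__":
    # Run a child Python interpreter that exits with status 7.
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"]))
```

Because the child always starts a fresh program image, this pattern cannot reuse the parent's preloaded state, so it does not replace the fork()-without-exec() design discussed at the top of the thread.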