← Back to context

Comment by uecker

1 day ago

The elegance of the fork() + exec() model is that every kind of configuration can be done after the fork using all the usual APIs. Every attempt to replace it with a combined call that I have seen so far seemed fundamentally poorer because it needs to add all configuration options as parameters to the call and then do this in away that you can extend it later and does not become a mess.

I have the entirely opposite opinion. IMO a big mistake of the UNIXy model is that so much state is preserved across the creation of a process. For example, there are APIs to have a specific thing be fd number 4 so you can run a program and have it find that thing at fd 4. This is weird.

Windows, for all its many, many faults, did not use fork+exec and instead mostly has options for how one creates a process. It wasn’t done elegantly, but it was the right decision.

  • Well, a lot of the power of the UNIX shell comes form this and I see this as a major advantage over Windows. So no, I do not think Windows got it right.

    Any kind of replacement should aim for the same conceptual simplicity and power. Sadly, I fear that people driving development nowadays are more interested in building unbreakable walled gardens for advertisement or app stores, or trying to squeeze down the some small gain when used on the cloud. I am more interested in general computing on the user side.

  • Having fd 4 mean something specific is no weirder than having fds 0,1, and 2 mean something specific, which is probably never going to change. At some point you just gotta embrace the Unix.

  • Is it weirder, that you can pass an variable precisely into argument 4? You do need to pass information to a subprocess and there needs to be some agreement on what means what. Sure, maybe you could use names instead of fds, but that sounds needlessly complicated.

    • A way to pass a defined list of handles to a subprocess (or a friendly other process) makes sense. Having that mechanism be direct inheritance of those handles with the same numbering as the source is obnoxious.

    • That’s like saying you could use positions to specify function argument access (as in assembly) instead of variable names. File descriptors being numbers that are likely array indexes in a file handle seems like a leaky abstraction. Having a namespace that a parent process share with its children seems like a much cleaner design.

  • Well, Cygwin and Busybox have shown me that fork-heavy activities are about 100x slower on Windows than Linux.

    The Windows approach may be correct, but it suffers in performance from the POSIX perspective.

    I have heard that WSL1 iimproves this.

    • Linux has worked pretty hard to optimize fork(). This doesn’t mean that fork() is a good idea.

      Windows does not historically depend on fork(), so there was no native fork(), so Cygwin kludged it up.

      2 replies →

  • You're simply failing to grasp the value of the simplicity, compatibility, and portability of POSIX/*nix. Inventing yet another way to create a process would be complex and break things. It's a-la-carte to enable configuration after fork of the new CoW or non-CoW process but before exec (unless vfork or similar were used). This is the model.

    If you want to greenfield re-engineer the world with all new system calls and a totally different execution model, feel free to go right ahead.

    • "The reasonable man adapts himself to POSIX: the unreasonable one persists in trying to adapt the POSIX to himself. Therefore all progress depends on the unreasonable man."

      ― George Bernard Shaw, probably.

I kinda disagree, though I do see the usefulness here. While fork/exec can be useful in some cases, it'd be honestly pretty neat if the APIs took a pidfd argument (maybe with 0 meaning current process). Only program is setuid/setgid binaries I suppose but maybe this case is better handled by special casing `exec`.

For example

   pidfd_t ps = spawn(); // creates a process stopped (kernel does this anyway by default)
   setuid(ps, 33);
   capset(ps, ...);
   socket(ps, ...);
   mmap(ps, ...);
   process_vm_writev(ps, ...);
   exec(ps, ...);
   signal(ps, SIGCONT);
   // error handling elided

I guess this is a little bit me being a bit of critical of the usual syscall APIs for not thinking about "what if I want to do this to another process I have access to" but...

It also makes things like thread safety even reasonably doable with fork. I do agree though that stuff like CreateProcess which take in a gazillion parameters don't really make for the greatest of userspace APIs

  • Maybe, a few people proposed this. It is a lot better than a single spawn call.

    But how often would one actually need this? And what are the semantics? Refer arguments (e.g. file descriptors) to the current process or the other one? How are cross-permissions handled? It seems a lot of complexity...

    Someones proposed a ptrace_syscall which could achieve the same thing.

    • > But how often would one actually need this?

      Well, the idea is that it'd probably be close to the default API for spawning processes (and could even be the bedrock for posix_spawn and friends in libc (and potentially even "simple" fork cases[1])). fork/clone would be the special case

      In most cases, most programs don't need special setup. Something like `ptrace_syscall` would also work for this and would be probably the way to do it with the backwards compat limitations of nowadays

      ptrace-ability seems to be generally how permissions for this sort of thing are handled in general (see also procfs, process_vm_writev, ptrace, etc). The complication is a little bit around setuid programs but either you could special case execve to imply SIGCONT for setuid or have execve also imply a SIGCONT as well

      [1]: Probably would be rare for a compiler to optimize it though

Calling that elegant is a path dependence of the history of fork+exec.

In an alternative world where fork+exec never existed, a lot of those "usual APIs" would probably have had an explicit pid argument to them that let you modify process configuration from a different process. (This is how Fuschia works, e.g.). There's a lot of benefit to this world: the most obvious is that you don't have to magic up some IPC system just to report configuration errors, but there's actually a good amount of utility in being able to have a manager process that is tweaking attributes of its children (e.g., debuggers would love it).

  • Or you could call ptrace_syscall (that doesn't currently exist) on your child processes that you are tracing because you'd always be tracing them by default, or get an io_uring for the child process, or...

    • A ptrace_syscall would be interesting and would seem to be a full replacement for having the pid argument everywhere.

      But frankly, I am not really seeing the value.

      3 replies →

> The elegance of the fork() + exec() model is that every kind of configuration can be done after the fork using all the usual APIs.

Unfortunately, the opposite is true, when the parent process is multi-threaded. In the child process, only one thread exists (the thread returning from fork()), but the memory is an exact copy of the parent's. As a result, the child may inherit locks (resident in memory) that are in acquired state, but have no owner threads -- the threads that are responsible for eventually releasing those locks in the child's copy of the process memory do not exist in the child. If the single thread in the child process (returning from fork()) attempts to take such a lock (before exec), it deadlocks. This is why POSIX says that only async-signal-safe functions may be called in a child process, between fork and exec. And then, for example, "malloc" is not such a function (at least per POSIX), so the fork-to-exec environment in the child process is an extremely uncomfortable one. You've got to preallocate everything in the parent, can't report errors to stderr, etc.

https://pubs.opengroup.org/onlinepubs/9799919799/functions/f...

https://pubs.opengroup.org/onlinepubs/9799919799/functions/V...

The fork(2) Linux manual page spells out the sam restriction.

https://man7.org/linux/man-pages/man2/fork.2.html

https://man7.org/linux/man-pages/man7/signal-safety.7.html

"pthread_atfork" exists, but is effectively unusable.

https://pubs.opengroup.org/onlinepubs/9799919799/functions/p...

Yeah. The right way to eliminate fork() is to make the usual APIs that modify process state take an explicit process handle, so the same APIs can be used to set up an empty process. They can also be composed in other ways, eg for IPC or debugging.

I agree with it, although still the fork is expensive like they mention. There is clone with some flags, although that does not really solve it.

I think one problem is that it is already how it is; making an entirely new operating system (that is not Linux, not GNU, and not POSIX) would solve it, but that is not the case here, so it would need to be done as it is.

One possibility would be a new function that creates a new empty child process, but the parent process specifies what system calls the child process executes, and can stop if specifying that exec or exit is (successfully) called by the child process, or if the parent process gives it the program memory to execute directly instead of using a file (since that use is also useful). The new function can still have some of the clone flags available. (I don't actually know how much better it would work.)

There are other possibilities as well.

The existing methods can also remain available for when they are helpful, but functions such as popen might be changed to use the new method.

It should be spawn, configure, exec. Configure can be done if the process starts with a ptrace attachment and no threads, so you can force it to do syscalls. Linux doesn't even have a concept of "process with no threads", so it'd probably have to have a dummy thread.

I agree. I think the current way is very nice to use (in c). I think the best way would be to have something similar to vfork() but not bound by posix rules. Then make the normal posix apis (close, setuid, etc.) act like the Rust “builder” pattern. Possibly giving them a prefix for explicitness. That way the “fill out a giant structure” people could have their wish and the people that just want a faster posix experience don’t have to learn an entirely new concept and api surface. It would be future extensible that way, too (just add more prefixed calls to the builder).

The flip side of this is that you have to be aware of the entire state of the process, including everything done in libraries, in order to correctly start a new process.

Quick, what's the highest numbered open file descriptor in the your program?

This gets even worse if you have multiple threads running. Without looking it up, what is the state of all the various synchronization primitives in a forked process?

The new system calls described in the article have an extensible declarative command interface built into them to do things like close or duplicate file descriptors. Not opposed to it but it definitely stood out to me.

That's mostly papering over design mistake that most syscalls doesn't accept target pid. Otherwise you could just create suspended process, configure it with syscalls that explicitly take target pid, and start it.

  • Maybe, I am not saying fork() + exec() model couldn't be improved, but most people saying it is "terrible" and it needs to die seem to go on to propose something substantially worse.