Comment by thayne

1 day ago

Another possible design: instead of forking the current process, you create a new empty process, then the parent calls syscalls to set up the new process, and eventually calls exec on the child process. That does mean you need either new syscalls for that or existing syscalls adapted to take a pidfd as an argument. It also solves some other problems with fork/exec, where the default is to inherit a lot of things you probably don't want. With this, you can opt in to inheritance instead of having to opt out.
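For comparison, the closest thing available today is probably `posix_spawn`, where the parent records setup steps ("file actions") that are applied to the child before it execs. It is only a rough approximation of the design above: the child still inherits by default rather than by opt-in, and the set of actions is fixed. A minimal sketch, where `spawn_capture` is a made-up helper name and `/bin/sh` is assumed to exist:

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `sh -c "echo hello"` with its stdout wired to a
 * pipe and copy what it prints into buf. Returns bytes captured, or -1. */
ssize_t spawn_capture(char *buf, size_t cap)
{
    assert(buf != NULL);

    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    /* The parent describes the child's setup; the steps are applied in
     * the child before the new program starts. */
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, pipefd[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, pipefd[0]);
    posix_spawn_file_actions_addclose(&fa, pipefd[1]);

    char *argv[] = { "sh", "-c", "echo hello", NULL };
    pid_t pid;
    int rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(pipefd[1]);              /* parent keeps only the read end */
    if (rc != 0) {
        close(pipefd[0]);
        return -1;
    }

    /* Read until the buffer is full or the child closes its end. */
    ssize_t total = 0, n;
    while ((size_t)total < cap &&
           (n = read(pipefd[0], buf + total, cap - (size_t)total)) > 0)
        total += n;
    close(pipefd[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return total;
}
```

Everything not explicitly closed or redirected here is still inherited, which is exactly the opt-out default being criticized.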

Or you could create a hybrid between a thread and a process, where it still uses the parent's memory space (unlike fork), but has its own stack (unlike vfork), and is in its own process (unlike a thread). I think this is technically possible on linux, but there isn't a readily available interface for it. Although it seems like posix_spawn could be implemented that way...

> you create a new empty process, then the parent calls syscalls to set up the new process ...

That does seem like a much better design to me. But I wonder if that was considered way back at the dawn of computing and rejected for good reason?

> I think this is technically possible on linux, but there isn't a readily available interface for it.

Yes there is, see `man clone`. POSIX and glibc are quite different from the kernel in this regard. AFAIK under linux there are just threads of execution that might or might not share various namespaces and memory mappings. That said, the kernel does place a few artificial restrictions on what combinations are allowed in order to (as I understand it) guard against the unintended exercise of entirely untested combinations that serve no known practical purpose.

The practical problem is that if you start doing as you please with the various namespaces and mappings you quickly become incompatible with glibc and by extension most likely the majority of the dynamic libraries available on your system.
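As a rough illustration of the `clone` route, here is a minimal sketch of the "shared memory, own stack, own process" hybrid using glibc's `clone` wrapper with `CLONE_VM | SIGCHLD`. The names `child_fn`, `run_shared_memory_child`, and `CHILD_STACK_SIZE` are made up for the example, and it assumes Linux with glibc. The child deliberately does almost nothing, since it shares the parent's thread-local state and most library code is not written with that in mind.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/* Runs in the child: same address space as the parent, separate stack. */
static int child_fn(void *arg)
{
    int *shared = arg;
    *shared = 42;   /* visible to the parent because of CLONE_VM */
    return 0;       /* becomes the child's exit status */
}

/* Hypothetical helper: start a child that shares our memory, wait for it,
 * and return its exit status (or -1 on error). */
int run_shared_memory_child(int *out)
{
    assert(out != NULL);

    void *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stacks grow down on common architectures, so pass the top.
     * CLONE_VM shares memory; SIGCHLD makes the child waitable like a
     * normal process. Without CLONE_THREAD it gets its own PID. */
    pid_t pid = clone(child_fn, (char *)stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, out);
    if (pid < 0) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    munmap(stack, CHILD_STACK_SIZE);   /* safe: the child has exited */
    if (w < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

After a successful call, the parent sees the value the child wrote through `out`, which a plain `fork` child could not do without explicit shared memory.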

Syscalls aren’t all that cheap either.

  • io_uring taught us that if syscalls are expensive, you can queue them up in a buffer and use one syscall to hand the whole batch to the OS for processing. So, queue up the new process mutations in a buffer and use a single syscall to process all of them in a batch. This model should have replaced repetitive syscalls across the kernel years ago.

  • This is true, but these methods don't increase the number of syscalls you need to make.