
Comment by afiori

7 years ago

As a function, obviously. The point is that it does not compose easily with other abstractions; that is, every other library and every piece of OS functionality needs to be fork-aware.

spawn does not have this requirement.
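For comparison, here is a minimal sketch of the spawn style using POSIX `posix_spawn`, which creates the child and execs the new program in one call, so the caller never runs code in a half-copied child. The helper name `run_spawned` and the use of `/bin/sh -c` are illustrative assumptions, not anything from the thread.

```c
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `cmd` via /bin/sh using posix_spawn (no explicit
   fork in our code) and return the child's exit status, or -1 on error. */
int run_spawned(const char *cmd)
{
    pid_t pid;
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };

    /* No file actions or attributes: the child inherits our descriptors
       and signal mask as-is. */
    int err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn failed: error %d\n", err);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_spawned("exit 3")` should return 3.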

The concept of "fork aware" didn't exist until threads. You could argue it's a thread problem. Remember, every library and piece of OS functionality also had to become "thread aware" when threads were introduced. The pthread_atfork function can be thought of as "what do we do about threads and thread paraphernalia when we fork" rather than "what do we do about fork when we have threads".
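A minimal sketch of the usual `pthread_atfork` pattern, assuming a hypothetical library whose global state is guarded by one mutex: the prepare handler takes the lock so no other thread can be mid-update when the address space is copied, and the parent and child handlers release their respective copies. All names (`lib_lock`, `lib_counter`, `lib_init`, `lib_increment`, `fork_and_use_lib`) are made up for illustration.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library-global state guarded by a mutex. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static int lib_counter = 0;

/* Runs in the parent just before fork(): holding the lock guarantees no
   other thread is halfway through modifying lib_counter during the copy. */
static void lib_prepare(void) { pthread_mutex_lock(&lib_lock); }

/* Run after fork() in the parent and the child respectively; each process
   releases its own copy of the lock taken in lib_prepare. */
static void lib_parent(void) { pthread_mutex_unlock(&lib_lock); }
static void lib_child(void)  { pthread_mutex_unlock(&lib_lock); }

/* Call once; registering twice would make lib_prepare lock twice. */
void lib_init(void)
{
    pthread_atfork(lib_prepare, lib_parent, lib_child);
}

void lib_increment(void)
{
    pthread_mutex_lock(&lib_lock);
    lib_counter++;
    pthread_mutex_unlock(&lib_lock);
}

/* Fork, use the library in the child, and return the child's exit status
   (its view of lib_counter), or -1 on error. */
int fork_and_use_lib(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        lib_increment();    /* safe: the child's copy of lib_lock is unlocked */
        _exit(lib_counter);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the handlers, a fork that happened while another thread held `lib_lock` would leave the child with a locked mutex and no thread to unlock it.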

Even the close-on-exec flag race condition is a result of threads. Duplicating a file descriptor and setting its close-on-exec flag is a two-step process, during which a fork can happen, causing a child to inherit the descriptor before its close-on-exec flag has been set. But that can only happen if there are threads. (Or something crazy, like fork being called from an async signal handler.)

  • > You could argue it's a thread problem

    But I explicitly don't want to argue that :) Threads are obviously a good thing to have.

    > every library and OS functionality aso needs to be "thread aware"

    Which is good, because, unlike fork awareness, thread-aware libraries and OS facilities help performance. Fork-aware libraries and OS facilities (in the fork+exec case) do not.

    • "Fork aware" is "thread aware". Hint: see the "pthread" substring in the identifier "pthread_atfork".

      Note that this is necessary only because of the broken threading model that was retrofitted into Unix.

      How it should work is that fork should clone the threads also. If a process with 17 threads forks, then the child has 17 threads. The thread IDs should be internal, so that all the pthread_t values in the parent space make sense in the child space and refer to the corresponding threads.

      It's not fork's fault that the hacky thread design broke it. Fork is supposed to make a faithful replica of a process; of course if that principle is ignored in a major way (like, oops, where are the parent's threads?) then things are less than copacetic.

      Threads also break the concept of a current working directory. If one thread makes a relative path access and another calls chdir, the result is a race condition.

      Threads also break signals quite substantially; the integration of signal handling with threads is a mess.

      Threads are not inherently a good thing to have; they are idiotic, in fact. Fork provides a disciplined form of threading that eliminates problems from the mutation of shared state, and provides fault isolation. It's much better to use forked processes instead of threads. Shared memory can be used for direct data structure access. With fork, you can create a shared anonymous mmap. This is then cloned into child processes as shared memory at the same virtual address.