
Comment by kevincox

4 years ago

TL;DR: if another thread is holding a lock when you fork, that lock will be stuck locked in the child, but the thread that was holding it no longer exists there.

So if your multi-threaded program uses malloc, you may fork while a global allocation lock is held, and then you won't be able to use malloc or free in the child (thread-local caches aside).

There are other problems but this is the basic idea. To be fork-safe you need to allow any thread to just disappear (or halt forever) at any point in your program.
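To make the stuck-lock case concrete, here is a minimal sketch in Python rather than C, assuming a POSIX system where `os.fork` is available. The helper name `child_can_take_lock` and the timeout are made up for illustration. A helper thread takes a lock, the main thread forks, and the child tries to acquire the same lock with a timeout so the demo reports the problem instead of hanging.

```python
import os
import threading


def child_can_take_lock(timeout: float = 0.5) -> bool:
    """Fork while a helper thread holds a lock; report whether the child could acquire it."""
    lock = threading.Lock()
    held = threading.Event()
    done = threading.Event()

    def holder() -> None:
        with lock:
            held.set()   # tell the main thread the lock is now held
            done.wait()  # keep holding until the parent is finished

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # fork only once the other thread owns the lock

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing will ever
        # release the copied lock. Without the timeout this would block forever.
        got = lock.acquire(timeout=timeout)
        os._exit(0 if got else 1)

    # Parent: collect the child's verdict, then let the helper thread finish.
    _, status = os.waitpid(pid, 0)
    done.set()
    t.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Calling `child_can_take_lock()` should return `False`: the child inherits the lock in its locked state but not the thread that would unlock it. Recent Python versions may also emit a `DeprecationWarning` when forking a multi-threaded process, for this same reason.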

malloc has to guard its locks against fork, probably using pthread_atfork, or some lower level internal API related to that.

The problem with pthread_atfork is third party libs.

YOU will use it in YOUR code. The C library will correctly use it in its code. But you have no assurance that any other libraries are doing the right things with their locks.
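For reference, the pattern a well-behaved library is expected to follow looks roughly like the sketch below. It uses Python's `os.register_at_fork`, which plays a similar role to `pthread_atfork` for Python-level code. The lock `_state_lock` and the helper `child_can_take_lock` are hypothetical names, and the sketch assumes a POSIX system.

```python
import os
import threading
import time

_state_lock = threading.Lock()  # hypothetical library-internal lock

# Take the lock before fork so no other thread can own it at the moment of the
# fork, then release it again on both sides.
os.register_at_fork(
    before=_state_lock.acquire,
    after_in_parent=_state_lock.release,
    after_in_child=_state_lock.release,
)


def child_can_take_lock(timeout: float = 0.5) -> bool:
    """Fork while a worker is using the lock; report whether the child could acquire it."""
    held = threading.Event()

    def worker() -> None:
        with _state_lock:
            held.set()
            time.sleep(0.1)  # hold the lock briefly while the main thread tries to fork

    t = threading.Thread(target=worker)
    t.start()
    held.wait()

    pid = os.fork()  # the "before" handler blocks here until the worker releases the lock
    if pid == 0:
        got = _state_lock.acquire(timeout=timeout)
        os._exit(0 if got else 1)

    _, status = os.waitpid(pid, 0)
    t.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Here `child_can_take_lock()` should return `True`, but only because this one lock registered handlers. Every lock in every library in the process needs the same treatment, which is exactly the assurance you don't have.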

  • Your "third party libs" includes system libraries like libdl.

    We had a Python process using both threads (for stuff like background downloads, where the GIL doesn't hurt) and multiprocessing (for CPU-intensive work), and found that on Linux, the child process sometimes deadlocks in libdl (which Python uses to import extension modules).

    The fix was to use `multiprocessing.set_start_method('spawn')` so that Python doesn't use fork().

  • Also, if for any reason you end up making a `fork()` syscall directly rather than going through libc, you'll still have a problem, because the appropriate cleanup won't happen.

    Of course, the best answer to that is usually going to be "don't do that"!

  • > But you have no assurance that any other libraries are doing the right things with their locks.

    I mean, if they're broken, fix them or get upstream to fix them.

  • The more code that piles onto pthread_atfork, the more it contributes to fork() being unnecessarily slow for the specific combination of fork+exec.

    • Right, and so POSIX "fixed" that by standardizing posix_spawn. Thus fork is now mainly for those scenarios in which exec is not called, plus traditional coding that is portable to old systems.
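
As a rough illustration of the posix_spawn route mentioned above, here is a minimal Python sketch using `os.posix_spawn` (available on POSIX systems in Python 3.8+). The helper name `spawn_and_wait` is made up for this example, and the child is simply another Python interpreter that exits with a chosen code.

```python
import os
import sys


def spawn_and_wait(exit_code: int) -> int:
    """Start a child with posix_spawn, with no fork() call in our own code, and return its exit code."""
    # The child is a fresh interpreter running a one-line program.
    argv = [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    pid = os.posix_spawn(sys.executable, argv, os.environ)

    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(status)
```

Usage would be something like `spawn_and_wait(7)`, which returns the child's exit code. Since the new process starts straight from a fresh program image, none of the parent's Python-level locks carry over into it.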