
Comment by harry8

4 years ago

> Fork with CoW is inefficient.

Compared to what? In what dimension? Any numbers on that? Where is the trade-off? To what extent does anyone need to care, and in what circumstances?

> Fork with CoW is inefficient.

> Compared to what?

vfork()

> Any numbers on that?

I added links to the gist, some of which discuss performance in detail. E.g., https://blog.famzah.net/tag/fork-vfork-popen-clone-performan... and https://bugzilla.redhat.com/show_bug.cgi?id=682922

But you can just reason about this:

  - vfork() is O(1)

  - copying fork() is O(N) where N is the
    amount of writable memory in the parent's
    address space

  - copy-on-write fork() is O(N) where N is
    the resident set size (RSS) of the parent

O(1) beats O(N).

And O(N) is just the complexity of fork() for a single-threaded parent process. Now imagine a very busy, threaded, large-RSS process that forks a lot. You get threads and child processes stepping all over each other's CoW mappings, causing lots of page faults and copies. Ok, that is still O(N), but users will feel the added pain of all those page faults and TLB shootdowns.
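To see the "O(N) in RSS" claim for yourself, one rough approach is to time fork() while the parent holds progressively more resident memory. This is a minimal Python sketch, assuming Linux or another POSIX system with `os.fork()`. The helper names (`touch`, `time_fork`, `fork_cost_by_rss`) and the default sizes are hypothetical. The timings include interpreter overhead and the child's exit/teardown, so only the trend across sizes means anything.

```python
import os
import time

PAGE = os.sysconf("SC_PAGE_SIZE")

def touch(nbytes: int) -> bytearray:
    """Allocate nbytes and write one byte per page so every page is resident."""
    buf = bytearray(nbytes)
    for off in range(0, nbytes, PAGE):
        buf[off] = 1
    return buf

def time_fork(samples: int = 5) -> float:
    """Median seconds for fork() + child _exit() + waitpid() in the parent."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately, skipping Python cleanup
        os.waitpid(pid, 0)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

def fork_cost_by_rss(sizes_mib=(16, 64, 256)) -> dict:
    """Map each extra-RSS size (MiB) to the median fork() time with that ballast live."""
    results = {}
    for mib in sizes_mib:
        ballast = touch(mib * 2**20)  # raise the parent's RSS by roughly `mib` MiB
        results[mib] = time_fork()
        del ballast                   # release before trying the next size
    return results
```

Calling `print(fork_cost_by_rss())` should show fork() time growing with the ballast size, roughly linearly if the page-table copy dominates. The exact numbers will depend on the machine and kernel.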

  • OK, but you're just repeating "it's inefficient" without saying for what use its inefficiency is even noticeable. I want to reason about when I would care. You see?

    The first link didn't even have units on its numbers(!). I assume they're milliseconds. When does that scale become something one would care about at all? Not launching a GUI process. Not a shell pipeline. So when does this issue arise at all? What is being done that makes fork() inefficiency anything more than an academic interest? There must be something, right? A forking web server?

    • > When does that scale become something one would care about at all? Not launching a gui process. Not a shell pipeline.

      Indeed, in those cases one just does not care about performance.

      Yet there are cases where one does. Imagine an orchestration system written in Java, with lots of threads (perhaps because it's a thread-per-client affair, or maybe just NCPU threads), with a large heap (because Java), and launching lots of small tasks as external programs. Maybe those tasks are ssh commands (OK, sure, today you could use an SSH library in Java) or build jobs (maybe your app is a CI/CD orchestrator). Launching external jobs is then the core of what the system does, and that is where the cost of fork() bites.
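For a launcher like that, the usual alternative to fork() + exec() is posix_spawn(). As far as I know, glibc on Linux (2.24 and later) implements it with a vfork-style clone instead of a copying fork, so its cost should not scale with the parent's RSS. The following is a minimal Python sketch comparing the two patterns. It assumes Python 3.9+ on Linux. The function names are hypothetical, and `argv[0]` must be a full path, since `os.posix_spawn` does not search PATH.

```python
import os
import time

def spawn_wait(argv: list) -> int:
    """Start argv[0] with posix_spawn() and return its exit code."""
    # argv[0] must be a full path; os.posix_spawnp() would search PATH instead.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

def fork_exec_wait(argv: list) -> int:
    """Same job via the classic fork() + execv() pattern, for comparison."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(argv[0], argv)  # replaces the child image on success
        finally:
            os._exit(127)            # reached only if execv() failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

def launches_per_second(launch, argv: list, count: int = 50) -> float:
    """Run launch(argv) count times back to back and return the launch rate."""
    t0 = time.perf_counter()
    for _ in range(count):
        launch(argv)
    return count / (time.perf_counter() - t0)
```

One would compare `launches_per_second(spawn_wait, ["/bin/true"])` with `launches_per_second(fork_exec_wait, ["/bin/true"])` inside a process with a large RSS. The expectation, not a measured result here, is that the posix_spawn rate degrades much less as RSS grows.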


It's inherently inefficient because, while the child process does its pre-exec initialization, every parent thread that writes to memory takes a page fault due to CoW. This can effectively stall the parent and cause surprising issues.
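You can observe that parent-side cost by counting minor page faults (`ru_minflt` from `getrusage`) while the parent rewrites memory that a still-living child shares copy-on-write. This is a minimal Python sketch, assuming Linux. The child just blocks on a pipe as a stand-in for pre-exec work, and the helper names are hypothetical. It is single-threaded, so it shows the per-page fault cost but not cross-thread TLB shootdowns.

```python
import os
import resource

PAGE = os.sysconf("SC_PAGE_SIZE")

def minor_faults() -> int:
    """Minor page faults taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

def write_every_page(buf: bytearray) -> None:
    """Modify one byte in every page of buf."""
    for off in range(0, len(buf), PAGE):
        buf[off] ^= 1

def parent_faults(nbytes: int, with_child: bool) -> int:
    """Minor faults the parent takes while rewriting nbytes of already-resident memory."""
    buf = bytearray(nbytes)
    write_every_page(buf)            # every page is resident before measuring
    pid, r, w = None, None, None
    if with_child:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            os.close(w)
            os.read(r, 1)            # child blocks here: stand-in for pre-exec work
            os._exit(0)
        os.close(r)
    before = minor_faults()
    write_every_page(buf)            # with a live child, these pages are shared CoW
    delta = minor_faults() - before
    if pid is not None:
        os.write(w, b"x")            # let the child exit
        os.close(w)
        os.waitpid(pid, 0)
    return delta
```

`parent_faults(n, with_child=True)` should report far more faults than `parent_faults(n, with_child=False)`, roughly one per page written on recent kernels. Transparent huge pages and the kernel version can change the exact count.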