Comment by pizlonator

1 day ago

Fork is marvelous for the zygote pattern

Hard to come up with an optimization that is equally efficient and elegant

The zygote pattern[1] is a great optimization to deal with the cost of forking, but IMHO, being able to inexpensively spawn a carefully tailored process regardless of the size and scope of the current process would be better.

I would guess the measurable performance difference between a zygote and a direct clean spawn would be small, but it's one less trick an application needs to do, and it would be very helpful for libraries that spawn things. Spawning inside a library isn't always a great thing to do, but some things would really benefit from process-level isolation.

[1] In case one isn't aware, the zygote pattern involves forking a 'zygote' process during application startup, and having that process do any forks that need to happen during application runtime. This reduces the cost of forking in large applications, because the zygote will have few fds open and use little memory. This lets your large application spawn new processes without delaying the application or the startup of the new processes. Some applications will spawn many zygotes to allow parallelism for spawning at runtime.
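
For readers who want to see the shape of the spawner-style zygote described in footnote [1], here is a minimal Python sketch, assuming a POSIX system. The function names (`start_spawner_zygote`, `serve_spawn_requests`, `spawn_via_zygote`) and the JSON-lines protocol over a socket pair are made up for illustration; they are not taken from the paper or from any particular application.

```python
import json
import os
import socket
import subprocess
import sys


def _recv_line(sock):
    """Read one newline-terminated message; returns b"" on EOF."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            return b""
        buf += chunk
    return buf


def serve_spawn_requests(sock):
    """Zygote loop (hypothetical protocol): run each requested command, report the result."""
    while True:
        line = _recv_line(sock)
        if not line:
            break  # the parent closed its end, so shut down
        argv = json.loads(line)
        done = subprocess.run(argv, capture_output=True, text=True)
        reply = {"returncode": done.returncode, "stdout": done.stdout}
        sock.sendall(json.dumps(reply).encode() + b"\n")


def start_spawner_zygote():
    """Fork the helper early, while this process still has few fds and little memory."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: this is the zygote. It never returns into the caller's code.
        parent_sock.close()
        try:
            serve_spawn_requests(child_sock)
        finally:
            os._exit(0)
    child_sock.close()
    return pid, parent_sock


def spawn_via_zygote(sock, argv):
    """Ask the zygote to run argv instead of forking from the (large) caller."""
    sock.sendall(json.dumps(argv).encode() + b"\n")
    return json.loads(_recv_line(sock))


if __name__ == "__main__":
    zygote_pid, channel = start_spawner_zygote()  # call this at startup
    print(spawn_via_zygote(channel, [sys.executable, "-c", "print('hi')"]))
    channel.close()
    os.waitpid(zygote_pid, 0)
```

The design point is that `start_spawner_zygote()` would run at startup, so every later spawn forks from the small helper rather than from the large parent.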

  • You're referring to something else, and maybe I'm using the term "zygote" incorrectly.

    In all uses of zygotes that I have seen, here's what's really happening:

    - `fork` is being used to reduce the cost of starting a process that has a high start-up cost. So, you start one process, run it through the expensive initialization, and then fork it from there to start new processes.

    - To make this even faster, you have a pool of pre-forked processes sit around.

    - Having pre-forked processes sitting around ready to be used is not expensive because of the CoW property and the fact that a process that forks and then immediately pauses will not have triggered any significant CoW yet.

    So, the zygote optimization you speak of is in practice only meaningful on top of systems that are using an optimization uniquely enabled by `fork` (avoiding process initialization costs by cloning a process), and that zygote optimization is further optimized by another property of `fork` (memory sharing of forked processes that haven't done anything else yet).
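
The three steps in the reply above (initialize once, pre-fork, let the children sit idle) can be sketched roughly as follows in Python, assuming a POSIX system. `expensive_init`, `prefork_pool`, and `activate` are hypothetical names, and the "work" is a trivial dictionary lookup standing in for a real request. Note that CPython's reference counting writes to object headers, so the CoW sharing would be less complete here than in a C program; the sketch is about the structure only.

```python
import os


def expensive_init():
    # Stand-in for costly startup work (hypothetical).
    return {i: i * i for i in range(1000)}


def prefork_pool(state, size):
    """Fork `size` idle children after init; returns a list of (pid, wake_w, result_r)."""
    pool = []
    for _ in range(size):
        wake_r, wake_w = os.pipe()
        res_r, res_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                # Drop the parent-side ends, including those of earlier workers.
                os.close(wake_w)
                os.close(res_r)
                for _pid, other_w, other_r in pool:
                    os.close(other_w)
                    os.close(other_r)
                # Blocked here, the child has written to very little memory,
                # so most pages stay shared with the parent.
                request = os.read(wake_r, 64)
                if request:
                    os.write(res_w, str(state[int(request)]).encode())
            finally:
                os._exit(0)
        os.close(wake_r)
        os.close(res_w)
        pool.append((pid, wake_w, res_r))
    return pool


def activate(worker, key):
    """Wake one pre-forked child; it already holds the initialized state."""
    pid, wake_w, res_r = worker
    os.write(wake_w, str(key).encode())
    os.close(wake_w)
    result = os.read(res_r, 64)
    os.close(res_r)
    os.waitpid(pid, 0)
    return int(result)


if __name__ == "__main__":
    workers = prefork_pool(expensive_init(), 2)
    print([activate(w, k) for w, k in zip(workers, (12, 7))])
```

Each child inherits `state` through `fork` without repeating `expensive_init()`, which is the part a plain spawn-from-scratch interface could not give you.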

    • Oh I see. I guess your zygotes have developed more than mine. I think Google may have coined or at least popularized the term zygote for this in Chrome and Android. The Chrome documentation [1] says:

      > A zygote process is one that listens for spawn requests from a main process and forks itself in response. Generally they are used because forking a process after some expensive setup has been performed can save time and share extra memory pages.

      I think reading the first sentence and stopping covers my zygote, but adding the second sentence covers yours. So I think we're both right!

      I think both paths are useful. If your children need time to start up and become ready, spawn one that does the startup work and then (pre)forks at the ready state, so that processes are ready to handle requests (your zygote). This does require a traditional fork() to avoid duplication of work.

      But if forking is expensive at runtime because you have a million FDs open and a whole lot of memory allocations, spawn spawners before you start doing work (my zygote). This could be unnecessary with an inexpensive way to spawn a new process from a process that has lots of resources in use.

      Of course, you can also use my zygotes to spawn your zygotes. Zygoteception.

      [1] https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...

  • > being able to inexpensively spawn a carefully tailored process regardless of the size and scope of the current process would be better.

    It's called clone(2)

    • Adding on to the sibling: what argument to clone allows me to set the fds of the child? AFAIK, you either share the FD table with the parent, or get a copy of it. If the parent has 1 million FDs open and the child doesn't want most of those, dealing with that has real costs. Many applications that tend to have large numbers of FDs and also fork/exec will mitigate the cost by spawning a process during startup that they can then use to spawn processes during runtime without doing it from the main process; this is a nice mitigation, but it shows a missing interface.
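
To make the FD-table point in the reply above concrete, here is a hedged Python sketch of what a fork/exec from an fd-heavy parent has to do: the child receives a copy of the whole table and then walks it, closing what it does not want. `close_inherited_fds` and `spawn_with_clean_fds` are hypothetical helpers, and the sketch assumes a system that exposes `/dev/fd` (Linux and macOS do). Python already marks the descriptors it creates as close-on-exec, so the explicit walk is here only to show the per-descriptor work.

```python
import os
import sys


def close_inherited_fds(keep=(0, 1, 2)):
    """Close every open descriptor not in `keep`; returns how many were closed."""
    closed = 0
    for name in os.listdir("/dev/fd"):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
            closed += 1
        except OSError:
            pass  # the descriptor used for the listing itself is already gone
    return closed


def spawn_with_clean_fds(argv):
    """fork + exec, keeping only stdio in the child.

    Returns (descriptors_closed_in_child, exit_code).
    """
    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # The child got a copy of the parent's entire FD table.
            closed = close_inherited_fds(keep=(0, 1, 2, report_w))
            os.write(report_w, str(closed).encode())
            os.close(report_w)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if something above failed
    os.close(report_w)
    closed = int(os.read(report_r, 32) or b"0")
    os.close(report_r)
    _, status = os.waitpid(pid, 0)
    return closed, os.WEXITSTATUS(status)


if __name__ == "__main__":
    extra = [os.open(os.devnull, os.O_RDONLY) for _ in range(50)]
    print(spawn_with_clean_fds([sys.executable, "-c", "pass"]))
    for fd in extra:
        os.close(fd)
```

The work in the child grows with the number of descriptors the parent holds, which is the cost the spawner-at-startup mitigation avoids.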

The paper explicitly covers this: various memory CoW/snapshot mechanisms are probably faster and safer than the zygote pattern. As it stands, getting the zygote pattern correct and safe is something you have to plan for up front. You can’t retrofit it, which is why the paper mentions it has poor composability. Also, the advantages of the zygote pattern can be overstated: the memory sharing benefit is minimal because the fork has to happen so early, and modern OSes already transparently CoW duplicate pages in the background.

  • In what sense can you not retrofit the zygote pattern?

    • I recommend at least skimming the paper as it covers this. But essentially you can’t just inject a call at a random point in code to start being a zygote. You have to plan up front the exact point at which you’re going to fork, and it has to be at the start of the program, before any threads have started, any files are open, or any locks have been acquired. It’s basically all the challenges of invoking fork at arbitrary points in time.

      The reason to do a zygote in the first place could be solved with alternative special APIs that are safer and harder to misuse. But we have fork so there’s not as big of a demand despite the warts.

And so easy to make into a bottleneck.

Yes, the zygote pattern makes it easy to turn fork() into a bottleneck. Avoiding that requires a lot more discipline and low-level tricks (linker scripts, compiler-specific extensions, custom sections, low-level dependencies on page size that get "fun" on ARM servers).

If you don't, you might wake up with fork() causing latency issues.

Unless you want to create a thread in your zygote. Then it breaks down.

Raw fork() is terrible. Instead we need a proper primitive to stop and make a snapshot of a process.

  • You can create threads in the zygote. It doesn't "break down", but sure, there's a bit more work.

    My trick for that is that the set of threads that I create pre fork have to be suspendable and resumable, preferably lazily (they resume when they are actually needed). So, the zygotes are sitting with those threads suspended. When they become active, they can do work immediately. They might lazily resume those threads as needed.
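
The reply above does not say how the threads are suspended and resumed, so the following Python sketch is only one possible reading, assuming a POSIX system: "suspend" stops the thread before the fork so that no extra thread exists at fork time, and "resume" lazily starts a fresh thread on first use. `LazyWorker` and `fork_with_worker` are hypothetical names, not the commenter's code.

```python
import os
import queue
import threading


class LazyWorker:
    """A helper thread that exists only while needed (illustrative assumption)."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = None

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:
                return  # suspend request
            func, out = job
            out.put(func())

    def suspend(self):
        """Stop the thread so the process is single-threaded again."""
        if self._thread is not None:
            self._jobs.put(None)
            self._thread.join()
            self._thread = None

    def submit(self, func):
        """Run func on the helper thread, starting it lazily if it is suspended."""
        if self._thread is None:
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        out = queue.Queue(maxsize=1)
        self._jobs.put((func, out))
        return out.get()


def fork_with_worker(worker, func):
    """Fork with the worker suspended; the child resumes it on first use."""
    worker.suspend()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            # First submit() in the child starts a fresh thread there.
            os.write(w, str(worker.submit(func)).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    return data.decode()


if __name__ == "__main__":
    shared = LazyWorker()
    print(shared.submit(lambda: 1 + 1))
    print(fork_with_worker(shared, lambda: 6 * 7))
```

Because the thread is joined before `fork`, no lock inside the queue can be held by a thread that vanishes in the child, which is the usual hazard when forking a multi-threaded process.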

    There are other idioms for this too.

    > Raw fork() is terrible. Instead we need a proper primitive to stop and make a snapshot of a process.

    Folks have been saying that it's terrible for as long as I can remember. But it's still there, because it's better than the alternatives.