Comment by ajkjk

1 day ago

Fork has always seemed conceptually terrible to me, even when I first learned about it. If you want to do one thing (start a process), you should not have to use a mysterious incantation that does a different, unrelated thing (forks your process) in order to do it.

I am curious what the best way is to handle the article's example of one process spawning many git subprocesses. Surely it doesn't make sense to repeatedly start git from scratch over the course of a long-running parent operation. What's the low-cost abstraction for the same result, though?

Yeah, as someone who originally came from Windows, the fork+exec model never made sense to me. Now I know it's just a historical quirk, but for some reason there are still people who pretend that fork+exec is actually a good thing...

Fork is conceptually simple. Without bringing in any other layers, you start a process with the one thing known to exist: yourself.

Otherwise you need multiple steps to create a process, fill it with something to run, and arrange for it to execute. Or, like Win32, you permanently smush those steps together with other layers, like filesystems and object loaders and linkers.
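For readers who have not seen the two models side by side, here is a minimal sketch (POSIX only, Python). The function names `run_with_fork_exec` and `run_with_spawn` are made up for illustration and are not from the article. The first clones the current process and then replaces the clone; the second asks a library to start the target program in a single call.

```python
import os
import subprocess
import sys


def run_with_fork_exec(argv):
    """Start argv the classic Unix way: clone this process, then replace the clone."""
    pid = os.fork()  # the child begins life as a copy of the parent
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # swap the copy for the target program
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)  # parent waits for the child to finish
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


def run_with_spawn(argv):
    """Start argv in one call; create, fill, and execute are hidden by the library."""
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    child = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_with_fork_exec(child), run_with_spawn(child))
```

Both calls report the same exit code; the difference is only in how much of the parent exists in the child between creation and exec.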

  • Fill with what stuff exactly?

    The only things I want to inherit from the parent process are its cwd and environment variables, and even those are often overridden. The rest can easily be passed explicitly through other channels like pipes or command line arguments.

    Back to the example from the article. It makes no sense that a git subprocess forked from a web server needs to have any process state inherited from the web server.

    • > Fill with what stuff exactly?

      Yes, exactly. Cloning, as a process creation primitive, is the one thing that doesn't need to be concerned with other stuff.

      > … a git-subprocess forked from a web server …

      That's pulling in a whole load of assumptions that are distinct from process creation. You can have processes in an environment that has no concept of file system or persistent storage at all.

  • I guess that way of thinking makes sense if you have a certain model of what a process is, in terms of the data structures and runtime state etc. But, tbh, I think of processes as glorified function calls, which happen to have that stuff involved as an implementation detail. And if spawning a process is supposed to act like a function call, then of course it should not inherit state. You should call the function you want to call, not call yourself with an instruction to switch over to it instead.

  • It's not conceptually simple. No other object creation API works by copying an existing thing and then modifying it. You don't create a new file by copying an existing one and then modifying it. You don't create a new window by copying an existing one and modifying it.

    Attempting to justify clone/exec as a reasonable design is just Stockholm syndrome.

    • > No other object creation API works by copying an existing thing and then modifying it.

      Clone-and-modify is pretty common in CAD.

      > You don't create a new file by copying an existing one and then modifying it.

      Clone-and-modify is almost universal in version control systems.
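Going back to the reply above that only wants cwd and environment variables from the parent: that preference can be written down directly. This is a sketch under assumptions (POSIX, `/bin/sh` available); `spawn_isolated` is a hypothetical helper name, not an existing API. The child receives exactly the working directory and environment it is handed, and nothing else is passed on purpose.

```python
import subprocess


def spawn_isolated(argv, cwd, env):
    """Run argv with only the state the caller passes in explicitly."""
    return subprocess.run(
        argv,
        cwd=cwd,                   # working directory chosen by the caller
        env=env,                   # replaces the parent's environment entirely
        stdin=subprocess.DEVNULL,  # no inherited terminal or pipe on stdin
        capture_output=True,       # stdout/stderr come back over fresh pipes
        close_fds=True,            # POSIX default: fds above 2 are closed in the child
        text=True,
    )


if __name__ == "__main__":
    result = spawn_isolated(
        ["/bin/sh", "-c", 'echo "$GREETING"'],
        cwd="/",
        env={"GREETING": "hi"},
    )
    print(result.stdout, end="")
```

Anything else the child needs would travel over arguments or pipes, as that reply suggests.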

libgit2 exists. You could imagine communicating with some gitd over a pipe/socket but I don't know why that would be a good idea. Short of that you have to spawn processes.

  • On Windows maybe it would be a COM server, using IPC built into the OS. The client sees it like a local function call.
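To make the "talk to a long-running helper over a pipe" idea concrete, here is a toy sketch. It is not git and not a real gitd: `WORKER_SOURCE` and `PipeWorker` are invented for illustration, and the worker just upper-cases each request line. The point is only that the child is started once and then answers many requests, so the startup cost is paid a single time.

```python
import subprocess
import sys

# Hypothetical worker: reads one request per line, writes one reply per line.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(line.strip().upper() + "\n")
    sys.stdout.flush()
"""


class PipeWorker:
    """Start the helper once, then exchange newline-delimited messages with it."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipes
        )

    def request(self, message):
        self.proc.stdin.write(message + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()  # EOF on stdin ends the worker's loop
        return self.proc.wait()


if __name__ == "__main__":
    worker = PipeWorker()
    print(worker.request("status"), worker.request("log"))
    worker.close()
```

Whether this beats simply spawning a process per request depends on how expensive startup is relative to the work, which is the open question in the thread.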