Comment by chubot

4 years ago

FWIW this is the same reason you can't implement a portable Unix shell in portable Go. (There are similar issues with an init daemon.)

Go only exposes the combined call, syscall.ForkExec() (wrapped by os.StartProcess()) -- there is no standalone Fork(), because the things you can do between fork and exec could break Go's threaded runtime. (Goroutines are multiplexed onto OS threads.)

Some elaboration on that: https://lobste.rs/s/hj3np3/mvdan_sh_posix_shell_go#c_qszuer

That is, the space between fork and exec is where pipelines are implemented, but also entire subinterpreters/subshells. The shell actually uses copy-on-write usefully. (And yes I'm aware that there's a good argument that the shell is almost the ONLY program that needs fork() !)
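
To make the "space between fork and exec" concrete, here is a minimal Python sketch of a two-command pipeline. It is not Oil's code; `run_pipeline`, `fork_child`, `exit_code` and the `stdout_fd` parameter are names made up for illustration, and error handling is deliberately thin.

```python
import os
import sys


def exit_code(wait_status):
    """Turn a waitpid() status into a shell-style exit code."""
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return 128 + os.WTERMSIG(wait_status)  # child was killed by a signal


def fork_child(argv, stdin_fd, stdout_fd, close_fds):
    """Fork; in the child, rewire fds 0 and 1, then exec argv."""
    pid = os.fork()
    if pid == 0:
        # Everything in this branch runs between fork() and exec().
        try:
            if stdin_fd != 0:
                os.dup2(stdin_fd, 0)
            if stdout_fd != 1:
                os.dup2(stdout_fd, 1)
            for fd in close_fds:
                os.close(fd)  # drop the original pipe ends
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed
    return pid


def run_pipeline(left, right, stdout_fd=1):
    """Run `left | right`; return (left_exit_code, right_exit_code)."""
    sys.stdout.flush()  # keep earlier parent output ahead of the children's
    read_end, write_end = os.pipe()
    ends = (read_end, write_end)
    left_pid = fork_child(left, 0, write_end, ends)
    right_pid = fork_child(right, read_end, stdout_fd, ends)
    # The parent must close its copies, or `right` never sees EOF.
    os.close(read_end)
    os.close(write_end)
    _, left_status = os.waitpid(left_pid, 0)
    _, right_status = os.waitpid(right_pid, 0)
    return exit_code(left_status), exit_code(right_status)
```

Assuming `seq` and `cat` are on PATH, `run_pipeline(["seq", "3"], ["cat", "-n"])` behaves roughly like `seq 3 | cat -n`. A subshell is the same shape, except the child keeps running interpreter code instead of calling exec.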

----

A lot of people have asked me why not implement Oil in Go and various other languages, so I wrote this page:

https://github.com/oilshell/oil/wiki/FAQ:-Why-Not-Write-Oil-...

So the funny thing is that Python is a lower level language than Go for this particular problem. It doesn't do anything weird with regard to syscalls. I'm still looking for help on this (and donations to pay people other than me):

Oil Is Being Implemented "Middle Out" https://www.oilshell.org/blog/2022/03/middle-out.html

I think this turns out to be a tangent, but at least superficially it is possible for a C program to "do" shell pipelines without using fork or vfork (directly), via posix_spawn instead. I suppose "portable Go" does not directly wrap posix_spawn, so this option may not be on the table for you.

Basics: https://gist.github.com/ec8469273c7808d46c7285cd056d0104

Typical use: `./a.out seq 3 2 9 -- cat -n` is similar to `seq 3 2 9 | cat -n` except that the return value is nonzero if either side's return value is nonzero.

That said, I wouldn't be surprised if there's something important I'm overlooking here.
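
For readers who don't want to open the gist, the same idea can be sketched in Python, which wraps posix_spawn in its `os` module. This is not the gist's code; `spawn_pipeline`, `exit_code` and the `stdout_fd` parameter are hypothetical names. The fd rewiring that a shell would do between fork and exec is expressed here as a list of file actions.

```python
import os
import sys


def exit_code(wait_status):
    """Turn a waitpid() status into a shell-style exit code."""
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return 128 + os.WTERMSIG(wait_status)  # child was killed by a signal


def spawn_pipeline(left, right, stdout_fd=1):
    """Run `left | right` with posix_spawnp; return both exit codes."""
    sys.stdout.flush()  # keep earlier parent output ahead of the children's
    # Python creates pipe fds close-on-exec, so the children only keep
    # the copies that the DUP2 actions place on fds 0 and 1.
    read_end, write_end = os.pipe()

    left_pid = os.posix_spawnp(
        left[0], left, os.environ,
        file_actions=[(os.POSIX_SPAWN_DUP2, write_end, 1)],
    )

    right_actions = [(os.POSIX_SPAWN_DUP2, read_end, 0)]
    if stdout_fd != 1:
        right_actions.append((os.POSIX_SPAWN_DUP2, stdout_fd, 1))
    right_pid = os.posix_spawnp(
        right[0], right, os.environ, file_actions=right_actions
    )

    # The parent must close its copies, or `right` never sees EOF.
    os.close(read_end)
    os.close(write_end)
    _, left_status = os.waitpid(left_pid, 0)
    _, right_status = os.waitpid(right_pid, 0)
    return exit_code(left_status), exit_code(right_status)
```

A caller that wants the "nonzero if either side is nonzero" behaviour can check `any(spawn_pipeline(left, right))`. What this cannot express is a subshell, since there is no point at which arbitrary interpreter code runs in the child.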

> The shell actually uses copy-on-write usefully. (And yes I'm aware that there's a good argument that the shell is almost the ONLY program that needs fork() !)

It's been a while since I looked at it, but I believe Android uses fork for its copy-on-write semantics to optimize app startup. On boot it initializes a single instance of the app runtime environment. Then when you launch apps, that initial process is forked. As a result you do not need to reinitialize the runtime for every app launch.
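
A toy version of that warm-up pattern, as a hedged Python sketch rather than anything Android actually runs: `warm_up` stands in for the expensive runtime initialization, and `launch` forks a child that starts with the initialized state already in memory. Both names are invented for this example.

```python
import os


def warm_up():
    """Stand-in for expensive one-time runtime initialization."""
    return {n: n * n for n in range(10_000)}


def launch(runtime, app):
    """Fork from the warmed-up parent and run app(runtime) in the child.

    Returns (output, ok), where output is what the child reported.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `runtime` is already here, shared copy-on-write.
        status = 1
        try:
            os.close(read_end)
            os.write(write_end, str(app(runtime)).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        output = pipe.read().decode()
    _, wait_status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
    return output, ok
```

Each `launch(runtime, app)` call skips `warm_up()` entirely, and anything the child writes into `runtime` stays private to the child, so the parent's copy is still pristine for the next launch.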

  • This is moderately common for environments where you are pushing a lot of startup work into the dynamic linker (loading shared libraries, for example) and will be launching processes frequently.

    You have a parent process which uses dlopen() to load all the libraries you want to avoid re-linking. When you want to spawn a child, rather than exec() you dlopen() an object with your child's main() and call it. For the case where you have enough libraries this is much faster than an exec(), saving tens of seconds on every application launch if you have a really bad case of C++.

    There are some small surprises which become obvious with a little thought. You are responsible for everything that normally happens in your process before main() is called. ASLR is only done once per session. People rarely think to fix up argv[] for ps and friends in the first version.

  • Yes I think the argument is that Android (and Chrome) could use something like vfork or posix_spawn().

    I'm not sure which, if any; I'd like to see an analysis of that... The issue is what kernel state is preserved/shared across the process creation call.

    Every process sort of has a "mirror" in kernel memory. The user memory is CoW, and I suppose you also have to choose whether to copy or reference every kernel data structure as well --- open files in FD tables which point to disk/pipes/sockets, locks which seem to be nonsensical, etc.

    But probably you can get the "warmup" property without the full semantics of fork(). That is the CoW of user memory is a somewhat separate choice from the kernel data structures.

    ----

    As far as the shell .... In the recent linked thread, Ninja uses posix_spawn because it has a simple use of subprocesses: https://news.ycombinator.com/item?id=31743230
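
On the point above about which kernel structures get copied versus referenced: open files are one case where fork() shares rather than copies. A small Python sketch (the function name `shared_offset_demo` is made up) shows that a read in the child moves the file offset the parent sees, because both fds point at the same open file description.

```python
import os
import tempfile


def shared_offset_demo():
    """Return what the parent reads after a forked child read 3 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdef")
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: user memory is copy-on-write, but this fd refers to
            # the same kernel open file description as the parent's.
            try:
                os.read(fd, 3)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # The offset was advanced by the child, so this yields the tail.
        return os.read(fd, 3)
    finally:
        os.close(fd)
        os.unlink(path)
```

Here the parent gets `b"def"`, not `b"abc"`. Any fork replacement that keeps the warm-up property still has to decide how to treat state like this.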

I am always thoroughly amazed when people develop programming languages that are intentionally difficult to use.

  • Please don’t post superficial and petty comments. Instead, think more deeply about the tradeoffs in designing for concurrency.