Comment by Solomoriah
7 years ago
Okay, this one has me laughing out loud. Of COURSE Microsoft doesn't like fork()... Windows pretty much can't do it. I'll admit, there have been a lot of times I wish there was a more streamlined way to spawn processes on Linux (particularly daemons) but when I don't have fork() I always end up missing it. I'd take this paper a lot more seriously if it came from someone with a less obvious bias.
The article pointed out legitimate drawbacks related to the intersection of fork() and other features like POSIX threads.
The paper mentions the benefit of posix_spawn for the fork+exec use case.
I might have seen posix_spawn while skimming a man page or browsing a changelog, but this is the first time I've actually learned what it's for.
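For anyone else who hadn't looked at posix_spawn before, here is a minimal sketch of the fork+exec use case written with it, assuming a POSIX system. The helper name run_command is made up for illustration; it expects an absolute path in argv[0] (posix_spawnp would search PATH instead) and returns the child's exit status, or -1 on error.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start argv[0] (an absolute path) with posix_spawn
 * and wait for it. Returns the child's exit status, or -1 on failure. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* No file actions and no attributes: the child inherits open fds (except
     * close-on-exec ones) and the signal mask, much as with fork()+exec(). */
    int err = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;              /* posix_spawn returns an errno value, not -1 */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

There is no window between "create" and "exec" where your own code runs in the child, which is the whole point: nothing to clean up because nothing was copied on your behalf.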
The article's conclusion isn't "and therefore Linux is bad" btw.
As a Linux developer and Windows hater, I agree with Microsoft: fork() is a hack.
Of course, all Windows APIs are terrible, but that doesn't make the complaints about fork() any less legitimate. The concept of establishing empty processes, instead of cloning yourself, is much saner.
After all, 99% of the time fork() is used just to call execve(), and anything done in between is just cleaning up the mess fork() left behind. A dedicated way to create processes in a controlled fashion would have been better there. And the other 1% is usually cases where pthreads should have been used instead.
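For reference, the fork()-then-execve() pattern being described looks roughly like this. It's a sketch: fork_exec_wait is a hypothetical helper name, and exit status 127 for a failed exec is a common shell convention, not something POSIX requires.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper showing the classic pattern: fork() copies the whole
 * process, and the child immediately replaces itself with execve().
 * Returns the child's exit status, or -1 on failure. */
int fork_exec_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: anything placed here runs in a copy of the parent,
         * inherited locks and fds included, so keep it minimal. */
        execve(argv[0], argv, environ);
        _exit(127);             /* exec failed; _exit skips atexit handlers and stdio flushes */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```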
Cleaning up your own process between fork and exec is hard. Several programs resort to terrible hacks like force-closing every file descriptor except 0, 1, and 2 in a loop. Or they look in their /proc directory to discover which file descriptors exist, which is only marginally better. But when your process is a house of cards built on third-party libraries with minds of their own, there aren't many other options.
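A sketch of the brute-force loop mentioned above. The names close_fd_range and fd_limit are hypothetical. The usual upper bound comes from sysconf(_SC_OPEN_MAX), which is exactly why it's a hack: the limit can be huge, and a descriptor opened before the limit was lowered can sit above it and survive. On Linux 5.9 and later, close_range(2) does the same job in one call, and several BSDs have closefrom().

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* Close every descriptor in [lowfd, highfd). close() on an unused slot
 * just fails with EBADF, which is deliberately ignored. */
void close_fd_range(int lowfd, int highfd)
{
    assert(lowfd >= 0);
    for (int fd = lowfd; fd < highfd; fd++)
        close(fd);
}

/* The usual upper bound: the RLIMIT_NOFILE soft limit as reported by
 * sysconf(). Descriptors above it (opened before the limit was lowered)
 * are missed by the loop. */
int fd_limit(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        return 1024;            /* assumption: fall back to a common default */
    return max > INT_MAX ? INT_MAX : (int)max;
}
```

In the child, after fork() and before exec(), the call would be close_fd_range(3, fd_limit()).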
Use O_CLOEXEC everywhere (even in third-party libs). It's really annoying, but necessary. It means you need to use accept4(), dup3(), and popen() with an additional "e" in the mode string (and of course all of that needs to be feature-tested, at compile time or at runtime).
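Here is what that looks like for open() and pipe(), as a Linux/glibc-flavored sketch (glibc declares pipe2() only under _GNU_SOURCE). The helper names are hypothetical, and the compile-time and runtime feature tests mentioned above are left out. Passing the flag at creation time matters because setting FD_CLOEXEC afterwards with fcntl() leaves a window where another thread's fork()+exec() can leak the descriptor.

```c
#define _GNU_SOURCE             /* for pipe2() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* True if fd has the close-on-exec flag set. */
bool has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    assert(flags != -1);
    return (flags & FD_CLOEXEC) != 0;
}

/* Open a file that is atomically marked close-on-exec. */
int open_cloexec(const char *path, int flags)
{
    return open(path, flags | O_CLOEXEC);
}

/* Same idea for pipes: pipe2() sets the flag on both ends at creation. */
int pipe_cloexec(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}
```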
> Of course, all Windows APIs are terrible, but that doesn't make complaints about fork() any less legitimate. The concept of Establishing empty processes, instead of cloning yourself, is much more sane.
I like how easily you can pass resources and data from the parent to the forked child, though. Otherwise I'd have to do a lot of serialization and deserialization, or use shared memory, or Unix sockets to pass fds, all of which have their own gotchas and are way more complicated and error-prone.
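A minimal sketch of that convenience, assuming a POSIX system: the child reads the parent's array directly after fork(), and only the result travels back, over a pipe. sum_in_child is a hypothetical name. After fork() the child works on a copy-on-write copy of the parent's memory, so writes on either side stay private to that process.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical example: fork a worker that sums the parent's table with no
 * serialization on the way in. Returns the child's sum, or -1 on error. */
long sum_in_child(const long *table, size_t n)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += table[i];            /* the parent's data, already here */
        ssize_t w = write(p[1], &sum, sizeof sum);
        _exit(w == (ssize_t)sizeof sum ? 0 : 1);
    }

    close(p[1]);
    long result = -1;
    if (read(p[0], &result, sizeof result) != (ssize_t)sizeof result)
        result = -1;                    /* child died before writing */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```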
But if you pass resources and data from the forked child to the parent, you are already using shared memory.
And, in this case, it sounds like a thread would do exactly what you want, but without the oddities of fork().
> And, the other 1% is usually cases where pthread should have been used instead.
Ummmm. No. Threads are a much harder API to get right. They can work in this area, but that's not the same as saying they're right for all/most cases in this area.
I think a sizable part of that remaining 1% (if it is that low) are programs that leverage fork as the very powerful right tool for the job. Many of those also happen to be widely-used programs crucial for the operation of web services and large-data-set processing.
> Ummmm. No. Threads are a much harder API to get right.
Ummmm. No. Threading is not a hard API to get right. It's very simple: You get a new executing thread in the same memory space. You can create them whenever you like without any side-effects. Now, don't trample on your memory. Read all you want from anywhere. If you want to write to shared memory, ensure both reads and writes are behind a mutex, or learn about atomics.
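A minimal sketch of the "writes behind a mutex" rule described above, with a made-up counter struct, an arbitrary thread cap, and an arbitrary iteration count. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>

enum { MAX_THREADS = 16, ITERATIONS = 100000 };

/* Shared state: the value is only touched while holding the lock. */
struct counter {
    pthread_mutex_t lock;
    long value;
};

static void *add_many(void *arg)
{
    struct counter *c = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&c->lock);   /* every write goes through the mutex */
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Run nthreads workers against one counter; returns the final value,
 * or -1 if the mutex or any thread could not be created. */
long run_counter(int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    struct counter c = { .value = 0 };
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    pthread_t tids[MAX_THREADS];
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, add_many, &c) == 0)
        started++;

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);    /* join also makes c.value visible here */
    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```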
Fork(), on the other hand, is much trickier. Sure, you get a cloned memory space so you can trample all you want, but now you have to establish some form of IPC (which might itself end up requiring threading), and if you didn't fork() as the very first thing in your process, you end up inheriting all sorts of state you do not want. Threads and locks, for example, are now in limbo (depending on your Unix flavor of choice), and you likely have a bunch of fds you did not want.
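One standard, if clumsy, mitigation for the inherited-lock problem is pthread_atfork(), which lets a library take its own locks before fork() and release them in both processes afterwards. A sketch, assuming a hypothetical library with a single global lock; lib_init and child_can_lock are made-up names. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A library-level lock that another thread may hold when fork() is called.
 * Only the forking thread exists in the child, so without the handlers
 * below such a lock could stay locked forever there. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

static void atfork_prepare(void) { pthread_mutex_lock(&lib_lock); }
static void atfork_parent(void)  { pthread_mutex_unlock(&lib_lock); }
static void atfork_child(void)   { pthread_mutex_unlock(&lib_lock); }

/* Hypothetical init hook; call exactly once, before any fork().
 * Registering twice would make the prepare step lock the mutex twice. */
int lib_init(void)
{
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Fork and check in the child that lib_lock can be taken without blocking.
 * Returns 1 if it could, 0 if not, -1 on error. */
int child_can_lock(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int ok = pthread_mutex_trylock(&lib_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

This only covers locks the library knows about and registers; locks buried inside other libraries you link against are still on their own.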
I cannot really think of any legitimate use-cases for fork() without exec(). There are legitimate use-cases for multi-process designs, but such designs are severely inconvenienced by fork(), as all they wanted to do was to start processes without inheriting state.
I also certainly cannot see any sensible argument for threading being harder than fork(), especially if you're just using it as a drop-in replacement where there will be no shared state after invocation outside of explicitly created communication channels.
ONE of the authors is from Microsoft. The other THREE are at Boston University and ETH.
As the article points out, the NT kernel actually natively supports fork. It's just not exposed.
Well, couldn't. Whatever they're doing with LXSS and picoprocesses seems to be good enough.
I don't run Windows, so I'm far from the most biased person, but frankly, on the surface the fork/exec thing really does seem unnecessary and weird in the modern world, where we've come up with better ways to do concurrency than raw threads and processes anyway.
> Windows pretty much can't do it.
Win32 API cannot do it. The underlying NT kernel can.
I thought Linux had clone(), which glibc calls to implement fork().
Yes, the underlying syscall for fork() is clone [1], and the underlying syscall for exec*() is execve [2].
[1]: http://man7.org/linux/man-pages/man2/fork.2.html#NOTES
[2]: http://man7.org/linux/man-pages/man2/execve.2.html
Section 6: REPLACING FORK
> Alternative: clone().
> This syscall underlies all process and thread creation on Linux. Like Plan 9’s rfork() which preceded it, it takes separate flags controlling the child’s kernel state: address space, file descriptor table, namespaces, etc. This avoids one problem of fork: that its behaviour is implicit or undefined for many abstractions. However, for each resource there are two options: either share the resource between parent and child, or else copy it. As a result, clone suffers most of the same problems as fork (§4–5).
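To make the "share or copy" point concrete, here is a Linux-only sketch using glibc's clone() wrapper: the same child function writes a global, and whether the parent sees the write depends only on CLONE_VM. The name value_seen_by_parent and the 64 KiB stack size are arbitrary choices for illustration.

```c
#define _GNU_SOURCE             /* for clone() and the CLONE_* flags */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;          /* write happens in the child */
    return 0;                   /* glibc's wrapper then exits the child */
}

/* Run child_fn via clone() with the given extra flags and return what the
 * parent sees afterwards: 42 if memory was shared (CLONE_VM), 0 if the
 * child got a copy as with fork(), -1 on error. */
int value_seen_by_parent(int extra_flags)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    /* Stacks grow downward on the common architectures, so pass the top.
     * SIGCHLD as the exit signal lets a plain waitpid() reap the child. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_value;
}
```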
Which part of the paper made you laugh out loud?
Their arguments of why fork() is not a good fit these days seemed pretty reasonable to me.
I've been working with Unix systems for a long time. I too dislike fork(), even though I used to think it was the greatest thing. Here's a write-up of mine as to fork() being "evil": https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c...