Comment by mrob

4 years ago

Fork() is the second worst idea in programming, behind null pointers. Fork() is the reason overcommit exists, which is the reason my web browser crashes if I open too many tabs, and the reason the "safe" Rust programming language leaves software vulnerable to DOS attacks if it uses the standard library. It's a clear example of "worse is worse", and we should have switched to the Microsoft Windows model decades ago.

Here's a paper from Microsoft Research supporting this point of view:

https://www.microsoft.com/en-us/research/uploads/prod/2019/0...

> the reason the "safe" Rust programming language leaves software vulnerable to DOS attacks if it uses the standard library

Linux overcommitment is often cited as an argument for the "panic on OOM" design of the allocating parts of the Rust standard library, and it's an important part of the story. But I think even if the Linux defaults were different, Rust would still have gone with the same design. For example, here's Herb Sutter (who works for Microsoft) arguing that C++ would benefit from aborting on allocation failure: https://youtu.be/ARYP83yNAWk?t=3510. The argument is that the vast majority of allocations in the vast majority of programs don't have any reasonable options for handling an alloc failure besides aborting. For languages like C++ and Rust, which want to support large, high-level applications in addition to low-level stuff, making programmers litter their code with explicit aborts next to every allocation would be really painful.

I think it's very interesting that Zig has gone the opposite direction. It could be that writing big applications with lots of allocs ends up feeling cumbersome in Zig, or it could be that they bend the curve. Fingers crossed.
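
For contrast with the abort-by-default behaviour discussed above, here is a minimal sketch of the opt-in fallible path that the Rust standard library also offers through `Vec::try_reserve`. The helper name `append_zeros` is hypothetical and only for illustration; it is not something the comments above propose.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: appends `extra` zero bytes to `buf`, reporting an
/// allocation failure to the caller instead of aborting the process.
fn append_zeros(buf: &mut Vec<u8>, extra: usize) -> Result<(), TryReserveError> {
    // Fallible step: returns Err if the capacity cannot be reserved.
    buf.try_reserve(extra)?;
    // The capacity is already reserved, so this does not need to reallocate.
    buf.resize(buf.len() + extra, 0);
    Ok(())
}
```

A caller can then decide what to do with the `Err` value, for example rejecting one oversized request instead of taking the whole process down. Note that on a system with overcommit enabled, a successful reservation still says little about whether the pages can actually be backed later.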

Why is overcommit a problem? A program is unlikely to use all the memory it allocates, or it may use it only later. Without overcommit you would waste a ton of RAM that never gets used, because a lot of programs allocate more RAM than they will probably ever need. And it would be inefficient, costly and error-prone to use dynamic memory allocation for everything.

The cause of your browser crash is not overcommit; it's simply that you don't have enough memory. If you disable overcommit (something you can do on Linux) you would get the same crash earlier, before you had allocated (not necessarily used) 100% of your RAM, because almost no software handles the allocation failure condition (i.e. malloc returning null), which you can't reasonably handle anyway.

Null pointers are not a mistake: how do you signal the absence of a value otherwise? How do you signal the failure of a function that returns a pointer without having to return a struct with a pointer and an error code (which is inefficient since the return value doesn't fit a single register)? null makes perfect sense as a value to signal "this pointer doesn't point to something valid".

Microsoft saying that fork() was a mistake... well, of course, because Windows doesn't have it. fork was a good idea, and that is the reason why it's still used these days. Of course there has been evolution since: in Linux there is the clone system call (fork is deprecated and still there for compatibility reasons; the glibc fork is implemented with the clone system call). But the concept of creating a process by cloning the resources of the parent has always seemed very elegant to me.

In reality fork is something that (if I remember correctly; I don't have much experience programming on Windows) doesn't exist on Windows, and the only way to create a new process of the same program is to launch the executable and pass the parameters on the command line. That is not great for efficiency, and it can also have its own problems (for example, the executable was deleted or renamed while the program was running). Windows has no concept of exec either, though I think it can be emulated in software (while fork can't).

To me it makes perfect sense to separate the concept of creating a new process (fork/clone) from loading an executable from disk (exec). It gives a lot of flexibility, at a cost that is not that high (and there are alternatives to avoid it, such as vfork or variations of the clone system call, or higher-level APIs such as posix_spawn).
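
To make the "spawn" side of this comparison concrete, here is a rough sketch using Rust's `std::process::Command`, which exposes a spawn-style interface: the child's environment and working directory are described up front instead of being set by code running between fork and exec. The function name `greet_from_child` and the `GREETING` variable are made up for the example, and it assumes a POSIX `sh` is on the PATH.

```rust
use std::io;
use std::process::Command;

/// Hypothetical example: runs a small shell command in a child process with one
/// extra environment variable and returns what the child wrote to stdout.
fn greet_from_child(greeting: &str) -> io::Result<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg("printf %s \"$GREETING\"")
        .env("GREETING", greeting) // child-only state, described declaratively
        .current_dir("/")          // likewise for the child's working directory
        .output()?;                // start the child, wait for it, collect its output
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Everything the child needs has to be expressible through the builder. With fork/exec, arbitrary code can run in the child before the new program is loaded; on Unix, Rust exposes that as the unsafe `CommandExt::pre_exec` hook.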

  • I think much of the confusion around nulls stems from the fact that in mainstream languages pointers are overloaded for two purposes: for passing values by reference, and for optionality.

    Nearly every pointer bug is caused by the programmer wanting one of these two properties, and not considering the consequences of the other.

    Non-nullable references and pass-by-value optionals can replace many usages of pointers.

    • Yes, and they are just two usages of pointers. The fact is that, whatever you call it (null pointer, nullable reference, optional), a language needs a concept of a "reference that may not point to a valid object".

  • >How do you signal the failure of a function that returns a pointer without having to return a struct with a pointer and an error code (which is inefficient since the return value doesn't fit a single register)?

    Rust does this with the Result and Option "enums", which are internally implemented as tagged unions. From my understanding the only overhead with this implementation is the size taken by the tag and then any padding required for alignment.

    It also helps that references in Rust are not nullable and working with pointers is fairly rare, so the type system can do a lot of heavy lifting for you rather than putting null checks all over the place. When you have &T you never have to worry about handling null in the first place!

  • >Null pointers are not a mistake

    The inventor, Tony Hoare, famously called them his "billion-dollar mistake". The better way to do it is with nullable types (which could internally represent null as 0 as a performance optimization). This is something Rust gets right.

    • Nullable types... they have the same problems as null pointers: if you don't bother handling the null case, the program will crash; if you do handle it, you could handle it just as well with null pointers. Well, they have a nicer syntax, and that's it. How much Rust code is full of `.unwrap()` because programmers are lazy and don't want to check each optional to see if it's valid? Or simply don't care about it, since having the program crash on an unexpected condition is not the end of the world.
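
Both sides of this sub-thread can be illustrated with a small sketch. The function names are hypothetical. The size checks rely on the documented guarantee that `Option<&T>` has the same size as `&T` (the "null as 0" representation mentioned above), and on `u32` having no spare bit pattern for a tag.

```rust
use std::mem::size_of;

/// Explicit handling: both the "present" and "absent" cases are spelled out.
fn len_or_zero(name: Option<&str>) -> usize {
    match name {
        Some(s) => s.len(),
        None => 0,
    }
}

/// The shortcut criticised above: panics if the value is absent.
fn len_or_panic(name: Option<&str>) -> usize {
    name.unwrap().len()
}

/// True when wrapping a reference in `Option` costs no extra space.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// True when `Option<u32>` is larger than `u32`, i.e. room for a tag is needed.
fn option_u32_needs_tag() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```

So the tag overhead only appears for payloads that have no invalid bit pattern to reuse, and a plain `&str` cannot be passed where the absent case is possible. Whether an explicit `.unwrap()` is meaningfully better than an unchecked null dereference is the point the two commenters disagree on.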

Interesting take. If you don't mind explaining, what is the MS Windows model in this context?

  • Windows doesn't have fork as you know it. It has a POSIX-ish fork-alike for compliance, but under the hood it's CreateThread[0] with some Magic.

    In Windows, you create the thread with CreateThread and are passed back a handle to that thread. You can then query the state of the thread using GetExitCodeThread[1], or if you need to wait for the thread to finish, you call WaitForSingleObject[2] with an INFINITE timeout.

    Aside: WaitForSingleObject is how you track a bunch of stuff: semaphores, mutexes, processes, events, timers, etc.

    The flipside of this is that Windows processes are buckets of handles: a Process object maintains a series of handles to (threads, files, sockets, WMI meters, etc), one of which happens to be the main thread. Once the main thread exits, the system goes back and cleans up (as it can) the rest of the threads. This is why sometimes you can get zombie'd processes holding onto a stuck thread.

    This is also why it's a very cheap operation to interrogate what's going on in a process à la Process Explorer.

    If I had to describe the difference between Windows and Linux at a process model level, I'd have to back up to the fundamental difference between the Linux and Windows programming models: Linux is a kernel that has to hide its inner workings for its safety and security, passing wrapped versions of structures back and forth through the kernel-userspace boundary; Windows is a kernel that considers each portion of its core separated, isolated through ACLs, and where a handle to something can be passed around without worry. The Windows ABI has been so fundamentally stable over 30 years now because so much of it is built around controlling object handles (which are allowed to change under the hood) rather than manipulation of kernel primitives through syscalls.

    Early WinNT was very restrictive and eased up a bit as development continued so that win9x software would run on it under the VDM. Since then, most Windows software insecurities are the result of people making assumptions about what will or won't happen with a particular object's ACL.

    There's a great overview of Windows programming over at [3]. It covers primarily Win32, but gets into the NT kernel primitives and how they work.

    A lot of work has gone into making Windows an object-oriented kernel; where Linux has been looking at C11 as a "next step" and considering if Rust makes sense as a kernel component, Windows likely has leftovers of Midori and Singularity [4] lingering in it that have gone on to be used for core functionality where it makes sense.

    [0] https://docs.microsoft.com/en-us/windows/win32/api/processth...
    [1] https://docs.microsoft.com/en-us/windows/win32/api/processth...
    [2] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/...
    [3] https://www.tenouk.com/cnwin32tutorials.html
    [4] https://www.microsoft.com/en-us/research/project/singularity...

Overcommits exist any time you can have a debugger anyways.

fork() was a brilliant way to make Unix development easy in the 70s: it made it trivial to move a lot of development activity out of the kernel and into user-land.

But with it came problems that only became apparent much later.

Unpopular opinion: null pointers (in at least Java and C) are the single greatest metaphor in software development, and are the CS analog to the invention of zero.

There was an article about exceptions the other day that lamented that exceptions are high latency because the exceptional path will be paged out. I would assume overcommit is to blame for that too.

  • That's probably a caching issue, and caching issues are a fact of life for the foreseeable future. (Could also be a disk swap issue, but probably not.)

  • Why would you assume that..?

    • Well it's Linux's whole memory philosophy really. That you ask for data storage that may or may not be memory. This ties in with overcommit, because if you promise more memory than you have then you need a contingency plan. And that means flushing caches, it means swapping data to disk and it means erasing executable code (it is file backed, so it can just be read back in).

      This fuzziness about what is and isn't in memory is why rarely needed stuff has to be fetched from disk, which means a latency spike.