Comment by dexen

7 years ago

> all the kernel page tables still need to be copied and for a multi-GB process that's nontrivial

Only in the pathological case where the large process is backed solely by 4 KB pages. Hardware has long supported large pages (on x86 since the Pentium Pro, if memory serves) as well as huge pages. The popular OSes (Linux 2.6+ and Windows 2003+) also support large and huge pages. A 2 GB process can easily be three pages: r/x code, r/w stack, r/w data (2 GB). Granted, it gets a bit more complex if mmapped I/O or JIT are used, but since both are mature technologies by now, it's fine to point fingers at any inefficiency and demand better. Another caveat would probably be shared libraries loading at separate address ranges, which, IMO, is another reason to ditch shared libraries for good.
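To put rough numbers on that, here is a back-of-the-envelope sketch of how many leaf page-table entries a 2 GB mapping needs at different page sizes. It assumes the usual x86-64 page sizes (4 KiB, 2 MiB, 1 GiB) and 8-byte entries, and it ignores the upper levels of the page-table tree, so treat it as an illustration rather than a measurement of what fork actually copies. The names `leaf_entries` and `leaf_entry_counts` are made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumed x86-64 page sizes; other architectures differ.
PAGE_SIZES = {"4 KiB": 4 * KIB, "2 MiB": 2 * MIB, "1 GiB": GIB}

# Assumed size of one page-table entry on x86-64.
ENTRY_BYTES = 8


def leaf_entries(mapping_bytes, page_bytes):
    """Number of leaf entries needed to map mapping_bytes (rounded up)."""
    return -(-mapping_bytes // page_bytes)  # ceiling division


def leaf_entry_counts(mapping_bytes):
    """Leaf entry count for each assumed page size."""
    return {
        name: leaf_entries(mapping_bytes, size)
        for name, size in PAGE_SIZES.items()
    }


if __name__ == "__main__":
    for name, count in leaf_entry_counts(2 * GIB).items():
        table_bytes = count * ENTRY_BYTES
        print(f"{name:>5} pages: {count:>6} leaf entries, {table_bytes} bytes of entries")
```

With 4 KiB pages that is 524288 leaf entries (4 MiB of entries) for the data region alone, versus 1024 entries with 2 MiB pages and 2 with 1 GiB pages.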

Contrary to popular wisdom, OS research is still relevant.

You want to ditch shared libraries and mmap, and map your big processes using GB pages, to make fork fast again (despite that being neither its main nor its only drawback)???

OS research might be relevant, and it's good that some people have wild ideas, but honestly I doubt this one will go anywhere :P

  • Ah sorry, I only want to ditch the shared libraries; I added mmap in a later edit and didn't realize that was unclear. Of course mmap is necessary.

    • About shared libraries, I know that there is this line of thought considering them "evil" (well at least sufficiently to want to get rid of them); but I'm quite unsure about what a modern system would look like without them (although this is less a problem at the application level on e.g. Android, the system level is still extremely important)

      With Spectre, proper process bounds (well, address spaces) are more important than ever -- and even without that I'd still have cited them as incredibly important, in the sense that I'd rather have more than fewer. Given that, code reuse involves shared libraries, for several good reasons. The obvious one is not wasting RAM, but then there is the update problem (how to patch programs when security holes are discovered, especially if multiple parties are involved), and on top of that there is the cache pollution problem, which is related to the code duplication problem and is quite insidious because it is probably both hard to benchmark and very real (an ambient loss of perf, just not in very hot paths, that still affects the general perf of a system, much like Spectre mitigations are having a big impact).

      Now we could like address space boundaries so much that we would want to just use even MORE processes in place of shared libraries, but this obviously does not work for all services (and Spectre is biting us again because context switches are not cheap), plus if you take it to the extreme this makes systems extremely hard to design, and even bigger. This is part of the reason we are using Linux instead of Hurd... (well, Linux is too far in the opposite direction, but there are hopes that in the long term it will evolve toward a middle ground)

      And anyway that does not fit the narrative of using more huge pages at all.

      Now there are the usual radical ideas about how everything should be running on some kind of VM (sometimes even including the kernel), drastically reducing the amount of "native" code; but in the reality of our current systems, that "everything" relies on multiple VMs, and I doubt it will converge on only one, nor should it (because of the monoculture this would induce). Plus the ambient perf is still lower than with native code, and TBH I don't expect that to change ever.

      So, why and how would you like to get rid of shared libraries?

> Contrary to popular wisdom, OS research is still relevant.

Is it really popular wisdom though, or is it the opinion of one person that got hyped up, much like the hype around a subpar programming language that same person worked on?