Comment by ignoramous
3 years ago
On a related note, I found ART's implementation of concurrent copying and compaction GC to be pretty novel. Here's a nice write-up detailing how handling page faults in userspace comes in handy to accomplish that: https://www.tdcommons.org/cgi/viewcontent.cgi?article=4751&c... (pdf) / https://web.archive.org/web/20230216233459/https://www.tdcom...
For context, here's a brief overview of the evolution of the Android Runtime Garbage Collector: https://archive.is/Ue6Pj
Certainly interesting, but there are no performance numbers in the white paper comparing userfaultfd to read barriers. So the actual benefit of switching to userfaultfd is unknown (at least in publicly accessible documents -- I'm sure Google has done internal performance evaluations).
Using page faults (and/or page protection) to perform compaction instead of barriers is a pretty old technique (see the 1988 paper by Appel, Ellis, and Li [1]; see also the Compressor by Kermany and Petrank [2]). But handling page faults was very expensive (at least on Linux) until the relatively recent addition of userfaultfd.
[1]: https://dl.acm.org/doi/10.1145/960116.53992

[2]: https://dl.acm.org/doi/10.1145/1133255.1134023
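For concreteness, here's a minimal sketch of that classic page-protection approach: to-space pages are protected, and the first access traps into a handler that "fixes up" the page before the access retries. This is purely illustrative (the memset stands in for the object copying a real collector would do), not any particular collector's code:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    /* Round the faulting address down to its page. */
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));
    /* "Fix up" the page: a real collector would copy live objects into it here.
     * (mprotect isn't formally async-signal-safe, but this is how these
     * schemes were implemented in practice.) */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    memset(page, 0x42, page_size);
    /* Returning from the handler retries the faulting access. */
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);

    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_fault;
    sigaction(SIGSEGV, &sa, NULL);

    /* Pretend this is to-space: mapped but protected, so the first
     * mutator access to each page traps. */
    char *to_space = mmap(NULL, page_size, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    printf("first byte after fixup: 0x%x\n", to_space[0]); /* faults, then reads 0x42 */
    return 0;
}
```

The expensive parts are exactly what the paper complains about: the mprotect calls and the kernel's full signal-delivery path on every first touch of a page.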
What is the actual perf benefit of userfaultfd (e.g. for write protection)? It sounds interesting, but it's unclear to me why it would be any faster than a signal handler. Is it just a simpler code path within the kernel? Or is it that the hardware is configured to directly jump to a userspace handler without any kernel intervention?
Well, page protection is expensive, which is why the predominant way to implement concurrent copying/compaction until recently was using read barriers (and still is -- ART seems to be an exception). I will mention that concurrent compacting GCs wouldn't use userfaultfd for write protection, but for read protection (it effectively takes over the job of a read barrier).
I personally don't know too much about userfaultfd and how it works internally, but my best guess is that it bypasses heavyweight kernel data structures and lets the user application handle the page fault. It is obviously better than simply using mprotect, but it is not immediately clear why it would be better than a read barrier (other than code size considerations, which honestly don't sound like much of a big deal, as the code handling userfaultfd also needs to be brought into the instruction cache).
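For comparison, this is roughly the shape of a software read barrier -- a Brooks-style indirection pointer, chosen just for illustration; ART's actual barrier is different. The point is that every reference load pays a small cost, whereas the fault-based scheme pays nothing on the fast path and a large one-time cost per page:

```c
#include <stdio.h>

/* Every object carries a forwarding pointer that points to itself until
 * the object is moved; the barrier unconditionally follows it. */
typedef struct Object Object;
struct Object {
    Object *forward;   /* self, or the copy in to-space */
    int     value;
};

/* The read barrier the compiler would insert before every use of a ref:
 * one extra dependent load per access. */
static inline Object *read_barrier(Object *ref) {
    return ref->forward;
}

int main(void) {
    Object from = { .value = 1 };
    from.forward = &from;            /* not yet moved: forwards to itself */
    Object to = { .value = 2 };
    to.forward = &to;

    Object *ref = &from;
    printf("%d\n", read_barrier(ref)->value);  /* 1 */

    from.forward = &to;              /* "collector" moves the object */
    printf("%d\n", read_barrier(ref)->value);  /* 2: stale ref still works */
    return 0;
}
```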
I did find this kernel doc about userfaultfd [1], which might be worth a read if you're interested (it does mention that userfaultfd avoids taking some internal kernel locks, which gives it better performance than simply using mprotect, but the implementation details are a bit sparse).
[1]: https://docs.kernel.org/admin-guide/mm/userfaultfd.html
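To make the mechanism concrete, here's a minimal userfaultfd sketch along the lines of that doc and userfaultfd(2): the mutator touches an unpopulated page, the kernel queues an event instead of handling the fault itself, and a userspace handler thread resolves it with UFFDIO_COPY -- which is where a concurrent compacting GC would install the compacted page contents. Names and structure are mine, error handling is omitted, and on newer kernels unprivileged use may need the vm.unprivileged_userfaultfd sysctl:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static long page_size;

static void *handler(void *arg) {
    int uffd = (int)(long)arg;
    /* Page of "compacted" contents to install on demand; a GC would build
     * this by copying live objects to their new locations. */
    char *src = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(src, 0x42, page_size);

    for (;;) {
        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof msg) <= 0)   /* blocks until a fault */
            break;
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;
        /* Resolve the fault by copying a page in; the faulting thread is
         * woken and retries its access. */
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~(unsigned long)(page_size - 1),
            .src = (unsigned long)src,
            .len = page_size,
            .mode = 0,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);
    }
    return NULL;
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);

    /* No glibc wrapper on older systems, so invoke the syscall directly. */
    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    ioctl(uffd, UFFDIO_API, &api);

    /* "To-space": mapped but unpopulated; register it so missing-page
     * faults are routed to the handler thread instead of the kernel's
     * normal fault path. */
    char *to_space = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)to_space, .len = page_size },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    pthread_t tid;
    pthread_create(&tid, NULL, handler, (void *)(long)uffd);

    /* First access faults, is resolved in userspace, then completes. */
    printf("first byte: 0x%x\n", to_space[0]);
    return 0;
}
```

Compared to the mprotect/SIGSEGV sketch above, there's no per-page protection flipping and no signal frame setup/teardown on the faulting thread, which is presumably where the savings come from -- but, as noted, neither document gives numbers against a plain read barrier.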