Comment by adrian_b
7 hours ago
The C library maintains its own set of stream objects (the stdio FILE streams), which are mapped to the OS file descriptors (because stdio streams and OS file descriptors have different types and different behaviors).
I do not know whether this is true, but perhaps the previous poster means that using clone3 with certain arguments may break this file descriptor mapping, so that invoking stdio functions afterwards may have unexpected results.
Also the state kept by the libc malloc may get confused after certain invocations of clone3, because it has memory pages that have been obtained through mmap or sbrk and which may sometimes be returned to the OS.
So libc certainly cares about the OS file descriptors and virtual memory mappings, because it maintains its own internal state, which has references to the corresponding OS state. I have not looked into when an incorrect state can result after a clone3, but it is plausible that such cases exist, which would explain why glibc allows calling clone3 only with a restricted combination of arguments and does not provide a wrapper that would allow other combinations.
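To make the point about libc-side state concrete, here is a minimal C sketch (an illustration only, not code from glibc; the helper name stdio_state_demo is hypothetical). It writes through a stdio stream and then asks the kernel, via the underlying file descriptor, how much data it has actually received before and after a flush.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: reports the file size seen by the OS before and
 * after flushing a stdio stream that has 5 bytes written to it.
 * Returns 0 on success, -1 on any failure. */
static int stdio_state_demo(long *before, long *after)
{
    struct stat st;
    FILE *fp = tmpfile();           /* stdio stream backed by an OS fd */
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);            /* the OS descriptor behind the stream */

    /* The bytes land in libc's user-space buffer first. */
    if (fputs("hello", fp) == EOF || fstat(fd, &st) != 0) {
        fclose(fp);
        return -1;
    }
    *before = (long)st.st_size;     /* what the kernel has seen so far */

    /* fflush hands the buffered bytes to the kernel with write(). */
    if (fflush(fp) != 0 || fstat(fd, &st) != 0) {
        fclose(fp);
        return -1;
    }
    *after = (long)st.st_size;

    fclose(fp);
    return 0;
}
```

The gap between the two sizes is state that exists only inside libc. A child created with an unusual flag combination could end up with a copy of that buffer, or share it, without libc having had any chance to prepare for it.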
Yes; this is why QEMU's user-space-emulation clone syscall handling restricts the caller to only those combinations of clone flags which match either "looks like fork()" or "looks like creating a new pthread", because QEMU itself is linked with the host libc and weird clone flag combinations will put the new process/thread into a state the libc isn't expecting.
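A rough C sketch of that kind of check follows. It is not QEMU's actual code: the function names and the exact sets of required and optional flags are assumptions chosen for illustration, and it assumes clone3-style flags where the exit signal is passed separately.

```c
#include <linux/sched.h>   /* CLONE_* flag constants (kernel UAPI header) */
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag set that a pthread-style clone must contain in full. */
#define THREAD_REQUIRED (CLONE_VM | CLONE_FS | CLONE_FILES | \
                         CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM)
/* Assumed extras that are harmless alongside a pthread-style clone. */
#define THREAD_OPTIONAL (CLONE_SETTLS | CLONE_PARENT_SETTID | \
                         CLONE_CHILD_CLEARTID)
/* Assumed extras that are harmless alongside a fork-style clone. */
#define FORK_OPTIONAL   (CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | \
                         CLONE_PARENT_SETTID)

/* "Looks like creating a new pthread": shares everything a thread shares,
 * and nothing outside the known sets is requested. */
static bool looks_like_thread(uint64_t flags)
{
    return (flags & THREAD_REQUIRED) == THREAD_REQUIRED &&
           (flags & ~(uint64_t)(THREAD_REQUIRED | THREAD_OPTIONAL)) == 0;
}

/* "Looks like fork()": no sharing flags at all, only TID bookkeeping. */
static bool looks_like_fork(uint64_t flags)
{
    return (flags & ~(uint64_t)FORK_OPTIONAL) == 0;
}

/* Hypothetical policy: accept only the two shapes libc is prepared for. */
static bool clone_flags_supported(uint64_t flags)
{
    return looks_like_fork(flags) || looks_like_thread(flags);
}
```

Under this policy, a request such as CLONE_FILES on its own (shared descriptors, separate address space) matches neither shape and would be refused.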
All fair points. What do other languages' standard libraries do to work around clone3 then? If two threads share file descriptors but not virtual memory, do they perform some kind of IPC to lock them for synchronizing reads and writes?
> What do other languages' standard libraries do to work around clone3 then?
They don't offer generic clone3 wrappers either AFAIK. All the code I've seen that uses it - and a lot of it is not in standard libraries but in e.g. container runtime implementations - has its own special-purpose code around a specific way to call it.
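For a sense of what such special-purpose code can look like, here is a minimal C sketch that issues clone3 directly through syscall(2) in its simplest, fork-like form. The helper names are hypothetical, it is not taken from any particular runtime, and it assumes Linux 5.3 or later with matching kernel headers; in sandboxes that filter clone3 the call simply fails.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/sched.h>    /* struct clone_args */
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>    /* SYS_clone3 */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a fork-like clone3 with no sharing flags.
 * Returns the child's pid in the parent, 0 in the child, -1 on error. */
static pid_t fork_via_clone3(void)
{
    struct clone_args args;
    memset(&args, 0, sizeof args);   /* unused fields must be zero */
    args.exit_signal = SIGCHLD;      /* so the parent can waitpid() */
    return (pid_t)syscall(SYS_clone3, &args, sizeof args);
}

/* Hypothetical helper: run a child that exits with `code` and return the
 * exit status observed by the parent, or -1 if anything fails. */
static int run_child_and_wait(int code)
{
    pid_t pid = fork_via_clone3();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);  /* libc's fork() bookkeeping was bypassed, so the
                         child sticks to async-signal-safe calls */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even in this trivial case the caller takes on responsibilities that fork() normally handles, which is why real users tend to wrap one specific flag combination rather than expose the whole interface.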
My point is not that other standard libraries do it better, but that clone3 as a syscall interface is highly versatile, more so than it could be as a function in either C or most other languages. That is, the syscall API is the right layer for this feature to live at.