Comment by mark_undoio

4 years ago

The code I currently work on actually has a use of `clone` with the `CLONE_VM` flag to create something that isn't a thread. Since `CLONE_VM` will share the entire address space with the child (you know, like a thread does!) a very reasonable response would be "WAT?!"

What led us here was a need to create an additional thread within an existing process's address space but in a way that was non-disruptive - to the rest of the process it shouldn't really appear to exist.

We achieved this by using `CLONE_VM` (and a handful of other flags) to give the new "thread-like" entity access to the whole address space. But, we omitted `CLONE_THREAD`, as if we were making a new process. The new "thread-like" entity would not technically be part of the same thread group but would live in the same address space.
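To make the flag choice concrete, here is a minimal sketch of `clone` with `CLONE_VM` but without `CLONE_THREAD`. It is not the production code described above: the names `shared_state`, `entity_main` and `run_vm_sharing_entity` are invented for illustration, the "handful of other flags" are not shown, and `SIGCHLD` is only passed so that the demo can `waitpid` the child. The child writes straight into the parent's memory, yet `getpid()` reports a different ID because it is in its own thread group.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)

/* Hypothetical demo state; lives in the parent's memory. */
struct shared_state {
    volatile int value;       /* written by the child */
    volatile pid_t child_pid; /* the child's own view of getpid() */
};

/* Runs in the cloned entity. Kept tiny: it shares the parent's address
 * space (and TLS), so it should not lean on much of libc. */
static int entity_main(void *arg)
{
    struct shared_state *s = arg;
    s->value = 42;           /* visible to the parent thanks to CLONE_VM */
    s->child_pid = getpid(); /* differs from the parent: no CLONE_THREAD */
    return 0;
}

/* Returns 0 on success, -1 on failure. */
static int run_vm_sharing_entity(struct shared_state *s)
{
    assert(s != NULL);

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Stacks grow down on the usual Linux targets, so pass the top.
     * CLONE_VM shares the address space; omitting CLONE_THREAD makes the
     * child a separate thread group. SIGCHLD is demo-only (for waitpid). */
    pid_t pid = clone(entity_main, stack + STACK_SIZE, CLONE_VM | SIGCHLD, s);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack); /* safe: the child has exited (or clone state is gone) */

    if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

Calling `run_vm_sharing_entity(&s)` on a zeroed `struct shared_state s` should leave `s.value` set by the child and `s.child_pid` different from the caller's `getpid()`.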

We also used two chained `clone()` calls (with the intermediate process exiting, like when you daemonise) so that the new "thread-like" entity wouldn't be a child of the original process.
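The chaining can be sketched roughly as below. This is an illustrative guess at the shape, not the real implementation: `chain_state`, `intermediate_main`, `entity_main` and `spawn_detached_entity` are hypothetical names, and the exact flags are assumptions. The intermediate clones the entity and exits straight away; the entity then polls `getppid()` until it has been reparented (typically to init or a subreaper) and records its new parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)

/* Hypothetical demo state shared by all three parties via CLONE_VM. */
struct chain_state {
    char *entity_stack;          /* allocated by the original process */
    atomic_int intermediate_pid; /* published by the intermediate */
    atomic_int entity_ppid;      /* parent seen by the entity afterwards */
    atomic_int done;             /* set once entity_ppid is valid */
};

/* The long-lived "thread-like" entity (here it exits almost at once). */
static int entity_main(void *arg)
{
    struct chain_state *s = arg;
    /* Wait until the intermediate has exited and we have a new parent. */
    while (getppid() == atomic_load(&s->intermediate_pid))
        sched_yield();
    atomic_store(&s->entity_ppid, getppid());
    atomic_store(&s->done, 1);
    return 0;
}

/* The intermediate: clone the entity, then exit immediately. */
static int intermediate_main(void *arg)
{
    struct chain_state *s = arg;
    atomic_store(&s->intermediate_pid, getpid());
    /* CLONE_VM only: shared address space, no CLONE_THREAD. */
    pid_t pid = clone(entity_main, s->entity_stack + STACK_SIZE, CLONE_VM, s);
    return pid == -1 ? 1 : 0;
}

/* Returns 0 on success, -1 on failure. */
static int spawn_detached_entity(struct chain_state *s)
{
    assert(s != NULL);
    atomic_init(&s->intermediate_pid, 0);
    atomic_init(&s->entity_ppid, 0);
    atomic_init(&s->done, 0);

    s->entity_stack = malloc(STACK_SIZE);
    char *mid_stack = malloc(STACK_SIZE);
    if (s->entity_stack == NULL || mid_stack == NULL) {
        free(s->entity_stack);
        free(mid_stack);
        return -1;
    }

    /* SIGCHLD here is demo-only, so the intermediate can be reaped. */
    pid_t mid = clone(intermediate_main, mid_stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, s);
    if (mid == -1) {
        free(s->entity_stack);
        free(mid_stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(mid, &status, 0);
    free(mid_stack); /* the intermediate is gone, its stack is unused */
    if (waited != mid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The entity is not our child, so waitpid() can't see it: poll. */
    while (!atomic_load(&s->done))
        sched_yield();

    /* entity_stack is deliberately not freed: the entity may still be
     * executing on it, and we have no way to wait for its exit here. */
    return 0;
}
```

The original process only ever reaps the short-lived intermediate. A real version would need a proper way to manage the entity's lifetime and stack, which this sketch sidesteps.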

All this existed before I joined; it's just really cool that it works. I've never encountered such a non-standard use of `clone` before, but it was the right tool for this particular job!

> What led us here was a need to create an additional thread within an existing process's address space but in a way that was non-disruptive - to the rest of the process it shouldn't really appear to exist.

I'm curious to hear more. What's its purpose?

  • > I'm curious to hear more. What's its purpose?

    Sure! I'll try to illustrate the general idea, though I'm taking liberties with a few of the details to keep things simple(r).

    Our software (see https://undo.io) does record and replay (including the full set of Time Travel Debug stuff - executing backwards, etc) of Linux processes. Conceptually that's similar to `rr` (see https://rr-project.org/) - the differences probably aren't relevant here.

    We're using `ptrace` as part of monitoring process behaviour (we also have in-process instrumentation). This reflects our origins in building a debugger - but it's also because `ptrace` is just very powerful for monitoring a process / thread. It is a very challenging API to work with, though.

    One feature / quirk of `ptrace` is that you can't really do anything useful with a traced thread that's currently running - including peeking its memory. So if a program we're recording is just getting along with its day we can't just examine it whenever we want.
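    The "can't peek a running tracee" behaviour can be seen with a small sketch like the one below. It is an illustration only, with made-up names (`marker`, `peek_result`, `peek_running_then_stopped`), and it assumes `ptrace` of a child is permitted in your environment (Yama or seccomp settings can forbid it). The parent attaches with `PTRACE_SEIZE`, which leaves the child running, tries a peek, then interrupts the child and peeks again.

    ```c
    #define _GNU_SOURCE
    #include <assert.h>
    #include <errno.h>
    #include <signal.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Known word to read back; the forked child has it at the same address. */
    static long marker = 0x1234;

    struct peek_result {
        int running_errno;  /* errno from peeking while the tracee runs */
        long stopped_value; /* word read once the tracee is stopped */
    };

    /* Returns 0 on success, -1 on failure (e.g. ptrace not permitted). */
    static int peek_running_then_stopped(struct peek_result *res)
    {
        int rc = -1;
        int status = 0;
        long value = 0;

        assert(res != NULL);

        pid_t child = fork();
        if (child == -1)
            return -1;
        if (child == 0) {
            for (;;)
                pause(); /* tracee: just sit there until killed */
        }

        /* PTRACE_SEIZE attaches without stopping the tracee. */
        if (ptrace(PTRACE_SEIZE, child, NULL, NULL) == -1)
            goto cleanup;

        /* Peek while the tracee is not in a ptrace-stop: expected to fail. */
        errno = 0;
        (void)ptrace(PTRACE_PEEKDATA, child, &marker, NULL);
        res->running_errno = errno;

        /* Stop the tracee, wait for the stop to be reported, retry. */
        if (ptrace(PTRACE_INTERRUPT, child, NULL, NULL) == -1)
            goto cleanup;
        if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
            goto cleanup;

        /* PEEKDATA returns the word itself, so errors show up via errno. */
        errno = 0;
        value = ptrace(PTRACE_PEEKDATA, child, &marker, NULL);
        if (errno != 0)
            goto cleanup;
        res->stopped_value = value;
        rc = 0;

    cleanup:
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return rc;
    }
    ```

    On a typical Linux setup the first peek should fail with `ESRCH` and the second should return the marker value. Note that the second peek only works because the tracee was interrupted first, which is exactly the kind of disturbance a recorder wants to avoid.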

    First choice is just to avoid messing with the process but sometimes we really do need to interact with it. We could just interrupt a thread, use `ptrace` to examine it, then start it up again. But there's a problem - in the corners of Linux kernel behaviour there's a risk that this will have a program-visible side effect. Specifically, you might cause a syscall restart not to happen.

    So when we're recording a real process we need something that:

    * acts like a thread in the process - so we can peek / poke its memory, etc via ptrace
    * is always in a known, quiescent state - so that we can use ptrace on it whenever we want
    * doesn't impact the behaviour of the process it's "in" - so we don't affect the process we're trying to record
    * doesn't cause SIGCHLD to be sent to the process we're recording when it does stuff - so we don't affect the process we're trying to record

    Our solution is double clone + magic flags. There are other points in the solution space (manage without, handle the syscall restarting problem, ...) but this seems to be a pretty good tradeoff.

    [edit: fixed a typo]

    • I looked into something similar for implementing a concurrent GC. I ended up just using mmap() and ptrace(), since I did have to manipulate the process for certain barrier operations. I probably could have done it with non-ptrace system calls, but there are tradeoffs to be made: either way you need to interrupt any pending system calls, and there are multiple ways of doing that.

    • The problem with record and replay is the expansion of languages and APIs too. That is a good thing for some things, but it needs to be reworded sometimes too, and implementations of things aren't always newer versions of things either.
