Comment by cryptonector

4 years ago

> vfork(2) is an abomination. Even when the child returns, the parent now has a heavily modified stack if the child didn't immediately exec().

What stack modifications? Sure, the child can scribble over the stack frame, or worse, the child could do things like return -- but you are the author of the code calling vfork() and you know not to do that, so why would that happen?

A: It just wouldn't happen.

And as to exec() failing, this is why a call to exec() in the child must be followed by a call to _exit(), and this is true even if you use fork() instead of vfork(). I.e.:

    /* do a bunch of pre-vfork() setup */
    ...
    
    pid_t pid = vfork();
    
    if (pid == -1) err(1, "Couldn't vfork()");
    
    if (pid == 0) {
      /* do a bunch of child-side setup */
      execve(...);
      /* oops, ENOENT or something */
      _exit(1);
    }
    
    /* the child either exec'ed or exited */
    if (waitpid(pid, &status, 0) != pid) err(1, "...");
    
    ...
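
One way the snippet above might be filled in as a self-contained function is sketched below. It assumes Linux with glibc. `run()` is a hypothetical helper name, the program is passed as a NULL-terminated `argv` whose first element is a full path (standing in for the elided `execve(...)` arguments), and exit status 127 for a failed exec is only a shell convention. Everything the child needs is built before vfork(), so the child does nothing but exec or _exit.

```c
#define _DEFAULT_SOURCE /* exposes vfork() in glibc's <unistd.h> */
#include <err.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv via vfork()+execve() and return the
 * child's wait status. argv is fully prepared before vfork(). */
static int run(char *const argv[])
{
    pid_t pid = vfork();
    if (pid == -1)
        err(1, "Couldn't vfork()");

    if (pid == 0) {
        /* child: exec or _exit, nothing else */
        execve(argv[0], argv, environ);
        _exit(127); /* exec failed, e.g. ENOENT */
    }

    /* parent: by now the child has either exec'ed or exited */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        err(1, "waitpid()");
    return status;
}
```

A caller would decode the result with `WIFEXITED(status)` and `WEXITSTATUS(status)`. Note that an exit status of 127 here can't be told apart from a program that really exits with 127, which is the gap the pipe trick below closes.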

How do you detect whether the child exec'ed or exited? Well, you make a pipe before you vfork() and set both of its ends to O_CLOEXEC; then, on the child side of vfork(), you write one byte into it if the exec call fails. On the parent side you close the write end and read from the pipe before you reap the child: if you get EOF then you know the child exec'ed (the exec closed the child's copy of the write end), and if you get one byte then you know the exec failed and the child exited. The one byte could be an errno value.
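A minimal sketch of that pipe trick, again assuming Linux/glibc. `pipe2()` sets O_CLOEXEC atomically; without it you would call `fcntl(fd, F_SETFD, FD_CLOEXEC)` on each end. `spawn_checked()` and its return convention are hypothetical. The byte carries errno, which works because errno values on Linux fit in a byte. The byte buffer is declared before vfork() so the child creates no new stack objects, though strictly speaking POSIX still leaves the child's write to it undefined, which is the kind of compromise argued over below.

```c
#define _GNU_SOURCE /* pipe2(), vfork() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper. Returns 0 and stores the pid in *pidp if the exec
 * succeeded (caller must still reap it). If the exec failed, reaps the
 * child and returns the errno it reported. Returns -1 on other errors. */
static int spawn_checked(char *const argv[], pid_t *pidp)
{
    int fds[2];
    unsigned char child_errno = 0; /* declared before vfork() on purpose */

    /* O_CLOEXEC on both ends: a successful exec closes the child's copies */
    if (pipe2(fds, O_CLOEXEC) == -1)
        return -1;

    pid_t pid = vfork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        execve(argv[0], argv, environ);
        /* exec failed: report errno as one byte, then _exit() */
        child_errno = (unsigned char)errno;
        (void)write(fds[1], &child_errno, 1);
        _exit(127);
    }

    /* parent: close our write end, or read() would never see EOF */
    close(fds[1]);
    ssize_t n;
    do {
        n = read(fds[0], &child_errno, 1);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    if (n == 0) { /* EOF: the exec succeeded */
        *pidp = pid;
        return 0;
    }
    /* exec failed (or read() failed): reap the child */
    int status;
    (void)waitpid(pid, &status, 0);
    return n == 1 ? child_errno : -1;
}
```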

No, really, what you say about vfork() is lore, and very very wrong.

That said, vfork() blocks a thread in the parent. The point of my gist was to explain why fork() sucks, why vfork() is much better, and what would be better still.

> And I don't see what afork gets you that clone doesn't, except afork isn't as general.

afork()/avfork() is not meant to be as general as clone() but to be more performant than vfork() by not blocking a thread on the parent side.

clone() needs some improvements. It should be possible to create a container additively. See elsewhere in the comments on this post.

> What stack modifications? Sure, the child can scribble over the stack frame, or worse, the child could do things like return -- but you're the author of the code calling vfork() and you know not to do that

Within a sentence you described the stack modification. 'It's not a footgun, just don't make mistakes' doesn't hold a lot of water with me.

> No, really, what you say about vfork() is lore, and very very wrong.

Like I've said elsewhere in the comments, I've literally had to fix awful bugs, some security related, from how much vfork() is a preloaded foot gun with the safety off. Not everyone who has a bad impression of it is just following the "lore".

> afork()/avfork() is not meant to be as general as clone() but to be more performant than vfork() by not blocking a thread on the parent side.

Ok, but I'm not going to hold it against clone for being a more general solution.

> clone() needs some improvements. It should be possible to create a container additively. See elsewhere in the comments on this post.

I agree with this, but there are practical reasons why this isn't the case. Mainly, making a separate call from user space for every little thing is expensive, and large sparse structs copied into kernel space, covering basically everything in struct task_struct, sound like a special kind of security hell I would not want to be a part of.

A flag to clone to create an empty process, plus something like a bunch of io_uring calls or a box program to hydrate the new task state, would be really neat, and has been kicked around a bunch. There's just a ton of corner cases that haven't been ironed out.

  • > 'It's not a footgun, just don't make mistakes.'

    fork() -> fork bombs -> fork() is a footgun!

    You have to know how to use it. Yes. So what?

    > Like I've said elsewhere in the comments, I've literally had to fix awful bugs, some security related, from how much vfork() is a preloaded foot gun with the safety off. Not everyone who has a bad impression of it is just following the "lore".

    Links or it didn't happen :)

    • > fork() -> fork bombs -> fork() is a footgun!

      > You have to know how to use it. Yes. So what?

      No, you have to own everything that you could call. For one example of many, are you in and out of a library that uses longjmp()? That's really fun.

      Basically, vfork()'s sharing of the entire mutable stack between parent and child is full-on bananers.

      > Links or it didn't happen :)

      You know that some people write proprietary code, even for unixen, right?

Your code snippet assumes that your C compiler is just a high-level assembler. But it's not - it executes against a theoretical C virtual machine that doesn't know about forking. It's allowed to generate some non-obvious code so long as it acts "as if" it has the same behaviour - but only from the point of view of that theoretical C VM.

For example, in theory _exit(1) could be implemented as longjmp(...) up to a point in some compiler-created top-level function that wraps main(). Then that wrapper function could perform some steps to communicate the return code to the OS that trash the stack before actually exiting. After all, if the process is about to exit anyway, what difference does it make if a bunch of memory is fiddled with? We know the answer to this but, from the point of view of the C virtual machine, it's irrelevant.

That particular scenario is unlikely, but the point is that compiler implementations and optimisations are allowed to do very non-obvious things. You're only safe if you stick to the rules of the C standard, which this 100% does not.

  • > Your code snippet assumes that your C compiler is just a high-level assembler. But it's not - it executes against a theoretical C virtual machine that doesn't know about forking.

    Luckily a C compiler that doesn't know about concepts outside of the C virtual machine will not be able to compile a Linux executable or even dynamically load a library that exposes the vfork call (let alone try to execute the underlying system call directly).

    • That doesn't make sense. The C VM only affects how C code is understood by the compiler, in particular what optimisations are allowed. It doesn't stop the compiler from generating an executable or linking to libraries.

Stack manipulations are a real problem. Say some parameter to exec after vfork uses stack slots the compiler created for temporary variables. And sure, you compute those before the call to vfork, but then the compiler applies code motion...

  • This is bad:

        int exec_failed = 0;
        
        {
          some_type some_var;
        
          pid = vfork();
          if (pid == -1) err(1, "vfork() failed");
        
          if (pid == 0)        
            execve(...);
        
          /* oops, execve() failed */
          exec_failed = 1;
        }
        
        if (exec_failed)
          cleanup_code; /* bad! */
        
        /* parent */
    

    But it's hard to end up with code like that instead of:

        pid = vfork();
        if (pid == -1) err(1, "vfork() failed");
        
        if (pid == 0) {
          execve(...);
          
          /* oops, execve failed */
          some_cleanup;
          _exit(1);
        }
        
        /* parent */
    

    You have to really try.

    • Sure, but if you have code like the following:

          pid = vfork();
          if (pid==0) {
             int something;
             exec();
             // cleanup code that uses something
             _exit(1);
          }
      

      Then the compiler (which knows `_exit` is noreturn) can conclude that if you enter the `if`, none of the existing stack slots will be read again, so it can reuse one of those stack slots for the `something` variable. But whoops, that means the original process has its stack corrupted.

      This applies even when the variable is declared at the start of the function, as compilers can perform equivalent variable-lifetime analysis that lets them reuse the stack slot. This is exactly why the POSIX spec makes it undefined to write to any variable after vfork (except the pid return variable, obviously).

      But even that is not strictly safe enough, since the compiler is allowed to introduce writes to the stack. This may happen, for example, when calculating a temporary: if the compiler wants the register for something else and decides against keeping the value in some other register, it spills it to the stack.

      Obviously your `afork` completely avoids all those sorts of concerns by using a separate stack.

  • If "[s]tack manipulations are a real problem" (I say there are none if you're writing the code and know not to add any problematic stack manipulations) then avfork() should satisfy that concern.
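
afork()/avfork() aren't something you can call today, but the "separate stack" idea raised above can be sketched on Linux with clone(2). Passing `CLONE_VM | CLONE_VFORK` gives vfork()-like semantics (shared memory, the calling thread suspended until the child execs or exits), while the child runs on a stack the parent's frames never use, so nothing the compiler does in the child can clobber them. As far as I know, glibc's posix_spawn() takes roughly this approach on Linux. This is an illustrative sketch, not an implementation of afork(): it still blocks the parent thread, `spawn_on_own_stack`, `child_main`, and the 64 KiB stack size are hypothetical choices, and a production version would also block signals around clone() so no handler runs on the child's side.

```c
#define _GNU_SOURCE /* clone(), CLONE_* flags, MAP_STACK */
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

#define CHILD_STACK_SIZE (64 * 1024) /* assumption: enough for execve() */

/* Runs on the child's private stack; the parent's frames are untouched. */
static int child_main(void *arg)
{
    char *const *argv = arg;
    execve(argv[0], argv, environ);
    _exit(127); /* exec failed */
}

/* Hypothetical helper: like vfork()+execve(), but the child gets its own
 * stack. With CLONE_VFORK, clone() returns only after the child has
 * exec'ed or exited, so the stack can be freed right away.
 * Returns the child's pid, or -1 on failure. */
static pid_t spawn_on_own_stack(char *const argv[])
{
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stacks grow down on the common Linux architectures: pass the top. */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, (void *)argv);

    munmap(stack, CHILD_STACK_SIZE);
    return pid;
}
```

The caller reaps the child with waitpid() as usual; SIGCHLD as the termination signal is what makes it behave like an ordinary child for that purpose.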