OpenBSD – pinning all system calls

2 years ago (marc.info)

Classic thread on this stuff from Halvar Flake:

https://twitter.com/halvarflake/status/1156815950873804800

With that in mind, it'd be handy to know which exploit techniques these steps break, and whether those steps are in the current "meta" game for exploit developers.

(The specific mitigation here: the kernel formerly locked system call invocation down to the libc.so area of program text in memory. libc.so is big, so OpenBSD now locks specific system calls down to their designated libc stubs. Further, in static binaries, the same mechanism locks programs down to only those system calls used in the binary, which effectively disables all the system calls not explicitly invoked by the program text of a static binary.)
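
Very roughly, the idea seems to be something like the sketch below (hypothetical code and data structures, not OpenBSD's actual implementation): the kernel records, for each syscall number, the single libc text offset allowed to issue it, and refuses the syscall when it arrives from any other address.

    #include <stdint.h>

    /* Hypothetical pin table -- not OpenBSD's real structures. */
    struct pin_table {
        uintptr_t     libc_base;   /* where libc.so text is mapped */
        unsigned int *pins;        /* pins[sysno] = offset of that syscall's libc stub */
        int           npins;
    };

    /* Hypothetical check on syscall entry: only the pinned stub may
     * issue this syscall number. */
    int
    pin_check(const struct pin_table *pt, int sysno, uintptr_t caller_pc)
    {
        if (sysno < 0 || sysno >= pt->npins || pt->pins[sysno] == 0)
            return -1;      /* syscall was never pinned for this binary */
        if (caller_pc != pt->libc_base + pt->pins[sysno])
            return -1;      /* issued from somewhere other than its stub */
        return 0;           /* allowed */
    }

For a static binary, the table would only get entries for the syscalls the program text actually references, which is what effectively disables everything else.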

  • Indeed, in CCC's "systematic evaluation of OpenBSD's mitigations"[0] the presenter explicitly calls out OpenBSD's tendency to present mitigations without specific examples of CVEs it defeats or exploit techniques the mitigations are known to defend against:

    > Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.

    Some of OpenBSD's mitigations are excellent and robustly defensive; others are amorphous and not particularly useful.

    [0]: https://youtu.be/3E9ga-CylWQ?feature=shared&t=2770

    • > Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.

      The comment seems to imply that "proper design and threat modeling" must stem from real-world CVEs and proofs of concept. That seems to me like "if nobody heard it, the tree didn't fall" kind of thinking.

      I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves. And fortunately, they don't have a manager above them to whom they need to justify their billing hours.

      11 replies →

    • OpenBSD disabled hyperthreading before speculative execution attacks were in the wild. In the words of Greg K-H, “OpenBSD was right”.

      There probably is some amount of security theatre in OpenBSD but they have also mitigated attacks which weren’t even known to exist.

      6 replies →

    • There have been cases where OpenBSD's hypothetical mitigations have worked out well for the project. I recall a relatively recent DNS cache poisoning attack that OpenBSD alone had pre-emptively mitigated, because something (I think it was the port?) was "needlessly" random.

      If a mitigation has negligible performance impact, and doesn't introduce a new attack vector, I can't imagine why it would be seen as a bad thing.

      4 replies →

  • > Classic thread on this stuff from Halvar Flake:

    That's from four years ago and does not address these technical issues. Are you going to pull it out every time OpenBSD is mentioned? I think people understand that you don't like their approach, the flaws you see in it, and that OpenBSD isn't designed for your interests.

  • Is there a current meta for OpenBSD exploit developers?

    What's the right way to go about hardening the system if there's no meta to observe?

    My very naive take would be something like: A successful exploit depends on jumping through a number of different hoops. Each of those hoops has an estimated success probability associated with it. We can multiply all the individual probabilities together to get an estimated probability of successful exploit -- assuming that hoop probabilities are independent, which seems reasonable? The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.
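
    As a toy illustration of that model (made-up numbers, and with the same independence assumption): the sensitivity of the overall probability to hoop i is just the product of all the other hoops' probabilities, so you shrink the hoop where that product is largest per unit of developer time.

        #include <stdio.h>

        int
        main(void)
        {
            /* made-up per-hoop success probabilities, e.g. infoleak, ROP chain, pivot */
            double hoop[] = { 0.9, 0.5, 0.2 };
            int i, j, n = sizeof(hoop) / sizeof(hoop[0]);
            double total = 1.0;

            for (i = 0; i < n; i++)
                total *= hoop[i];
            printf("P(exploit succeeds) = %.3f\n", total);

            /* dP/dp_i = product of the other hoops: the hoop with the largest
             * value here is the one most worth shrinking, effort being equal. */
            for (i = 0; i < n; i++) {
                double others = 1.0;
                for (j = 0; j < n; j++)
                    if (j != i)
                        others *= hoop[j];
                printf("hoop %d: dP/dp = %.2f\n", i, others);
            }
            return 0;
        }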

    • The meta doesn’t exist because nobody targets OpenBSD, since it’s hardly used. People’s analysis of it is mostly just their educated guess as to how work on other platforms would carry over.

    • > The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.

      Depending on your definition of efficient, adding more hoops should work exponentially better: five independent hoops at 50% each already put the overall success probability at around 3%.

      5 replies →

Without a pre-formed opinion: does anybody have an intuition for the security benefits this provides? My first thought is that it’s primarily mitigating cases of attacker-introduced shellcode, which should already be pretty well covered by techniques like W^X. Code reuse techniques (ROP, JOP, etc.) aren’t impacted, right?

I would also think this would cause problems for JITed code, although maybe syscalls in JITed code aren’t common enough for this to be an issue (or the JIT gets around it by calling a syscall thunk, similar to how Go handled OpenBSD’s earlier syscall changes).

  • > Code reuse techniques (ROP, JOP, etc.) aren’t impacted, right?

    Unless I'm mistaken, this should restrict what you can do with ROP gadgets that contain syscalls. You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.

    > I would also think this would cause problems for JITed code

    They can probably just jump into precompiled code that performs the needed syscall. Also, making syscalls directly from something like JITed JavaScript is generally avoided anyway. AFAIK browsers don't even let the processes that run JavaScript touch much of the system at all; instead they have to use an IPC mechanism to ask a slightly more privileged process to perform specific tasks.
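
    For the JIT case, here's a rough sketch of the "precompiled code" idea (a hypothetical helper, not any particular engine's implementation): the JIT never emits a raw syscall instruction; instead it calls into a C helper, so the actual system call is always issued from libc's pinned stub.

        #include <stddef.h>
        #include <unistd.h>

        /* Hypothetical thunk the JIT calls instead of emitting "syscall":
         * the real system call happens inside libc's write() stub, which is
         * the only place the kernel will accept a write(2) from. */
        ssize_t
        jit_write_thunk(int fd, const void *buf, size_t len)
        {
            return write(fd, buf, len);
        }

    Generated code then just does a plain call to that helper instead of setting up a syscall instruction itself, so the pinning never has to know the JIT exists.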

    • > You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.

      That makes sense, although "intended" arguments here means still being able to invoke `execve(2)`, etc., right? The gadget will still be able to mangle whatever it likes into the arguments for that syscall; it just won't be able to mangle a `wait(2)` into an `execve(2)`, I think.

      Your points about JITs make sense, thanks.

      1 reply →

  • The other comment on this thread mentions that it also does something else:

    > disables all the system calls not explicitly invoked by the program text of a static binary

    This means that if the original binary didn't have an execve call in it, you wouldn't be able to use one even with ROP. In short, this blocks attackers from using syscalls that the program never used in the first place, and nothing else; within that scope it seems useful.

    • Sure, assuming your programs don't execute other programs. I don't know much about OpenBSD specifically, but spawning all over the place is the "norm" in terms of "Unix philosophy" program design.

      (I agree with the point in the adjacent thread: it's hard to know what to make of security mitigations that aren't accompanied by a threat model and attacker profile!)

      9 replies →

  • > Code reuse techniques (ROP, JOP, etc.) aren’t impacted, right?

    One thing to note is that system calls can no longer be made from the program's .text section; only from within libc. This is highly important because of ASLR: in order to ROP into a syscall, an attacker must now know where libc is located in the virtual address space. Before this mitigation, an attacker that only knew the address of the program binary could search for a sequence of bytes within the .text section that happened to decode to a syscall instruction, and use that for ROP (code reuse techniques can often access a lot of unexpected instructions by jumping into the middle of a multibyte instruction, due to x86's complex and variable-length instruction encoding).
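
    As a contrived illustration of that (my own made-up bytes, not taken from any real binary), even an innocuous-looking constant can hide a syscall instruction:

        /* On x86-64 this compiles to something like
         *
         *     b8 90 0f 05 90      mov    eax, 0x90050f90
         *     c3                  ret
         *
         * Jump two bytes into the mov and the CPU decodes instead:
         *
         *     0f 05               syscall
         *     90                  nop
         *     c3                  ret
         *
         * so any sizeable .text section tends to contain unintended
         * syscall gadgets like this -- which is what forcing syscalls
         * into (ASLR'd) libc takes off the table. */
        unsigned int
        unintended_gadget(void)
        {
            return 0x90050f90u;
        }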

> in ld.so text, and in that case the main program's text cannot do system calls

I don’t understand this case. Is there a way to do I/O in OpenBSD without a system call? Without I/O, how can you get the result of the computation?

Is this a singular special case?

  • ld.so can make syscalls itself while initially linking the application, before main runs. After that we're in case 4 from the post:

    > 4) in libc.so text, and ld.so tells the kernel where that is

    > The first 3 cases were configured entirely by the kernel, the 4th case used a new msyscall(2) system call to identify the correct libc.so region.

    ld.so passes its ability to make syscalls to libc.so. The application has to call into libc.so in order to perform any IO.
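
    Roughly like the sketch below (simplified, not the actual ld.so code, and I'm assuming msyscall(2)'s addr/len form and header here): after mapping libc.so, ld.so makes a single msyscall(2) call to mark libc's text as the region allowed to make system calls, and as I understand it the kernel only accepts that registration once per process.

        #include <stdlib.h>
        #include <sys/mman.h>   /* assumption: msyscall(2) is declared here */

        /* Hypothetical helper inside ld.so, run after libc.so's text is mapped. */
        static void
        register_libc_text(void *libc_text, size_t libc_text_len)
        {
            /* Tell the kernel: from now on, syscalls may only come from
             * this region (the older pre-pinning mechanism, case 4). */
            if (msyscall(libc_text, libc_text_len) == -1)
                _Exit(1);   /* refuse to continue without the protection */
        }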

    • Ah, makes sense, thanks. Libc can sanitize all the inputs, and as long as ld.so has a hardwired path to libc, all is well. This way you don’t even need a facility to tell the kernel “this binary is allowed to make system calls.”

This implementation has a trivial buffer overflow, ROFLMAO

  • Have you managed to trigger this? You never ended up explaining how the heap overflow occurs, and I cannot determine whether the other person who was guessing how it might happen is right, because I am not very familiar with OpenBSD's code.

  • Would you mind sharing how, and where in the code, specifically?

    Genuinely curious.

    • I don't know how they define `MAX`, but I'm guessing it's the typical "a > b ? a : b". In the function `elf_read_pintable`, `npins` is defined as a signed int and `sysno` as an unsigned int.

      So this comparison will be unsigned, which allows `npins` to end up with any value, even a negative one:

        npins = MAX(npins, syscalls[i].sysno)
      

      Then `SYS_kbind` seems to be a signed int. So this comparison will be signed and "fix" the negative `npins` to `SYS_kbind`:

        npins = MAX(npins, SYS_kbind)
      

      And finally the `sysno` index might be out of bounds here:

        pins[syscalls[i].sysno] = syscalls[i].offset
      

      But maybe I'm completely wrong; I'm not interested in researching it too much.
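
      That said, the two comparisons are easy to demo in isolation; standalone snippet below, with 86 used as a stand-in value for `SYS_kbind` since I haven't checked the real number:

        #include <stdio.h>

        #define MAX(a, b) ((a) > (b) ? (a) : (b))

        int
        main(void)
        {
            int npins = 0;
            unsigned int sysno = 0x80000001u;   /* attacker-controlled, > INT_MAX */

            npins = MAX(npins, sysno);          /* unsigned compare; npins ends up negative */
            printf("after the sysno loop:  npins = %d\n", npins);

            npins = MAX(npins, 86);             /* signed compare "fixes" it to the stand-in SYS_kbind */
            printf("after the kbind MAX:   npins = %d\n", npins);
            return 0;
        }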

      7 replies →

    • Out-of-bounds heap write happens in this function:

              int
              elf_read_pintable(struct proc *p, Elf_Phdr *pp, struct vnode *vp,
                  Elf_Ehdr *eh, uint **pinp)
              {
               struct pinsyscalls {
                u_int offset;
                u_int sysno;
               } *syscalls = NULL;
               int i, npins = 0, nsyscalls;
               uint *pins = NULL;
              
          [1]  nsyscalls = pp->p_filesz / sizeof(*syscalls);
               if (pp->p_filesz != nsyscalls * sizeof(*syscalls))
                goto bad;
          [2]  syscalls = malloc(pp->p_filesz, M_PINSYSCALL, M_WAITOK);
          [3]  if (elf_read_from(p, vp, pp->p_offset, syscalls,
                   pp->p_filesz) != 0) {
                goto bad;
               }
              
          [4]  for (i = 0; i < nsyscalls; i++)
          [5]   npins = MAX(npins, syscalls[i].sysno);
          [6]  npins = MAX(npins, SYS_kbind);  /* XXX see ld.so/loader.c */
          [7]  npins++;
              
          [8]  pins = mallocarray(npins, sizeof(int), M_PINSYSCALL, M_WAITOK|M_ZERO);
               for (i = 0; i < nsyscalls; i++) {
          [9]   if (pins[syscalls[i].sysno])
          [10]   pins[syscalls[i].sysno] = -1; /* duplicated */
                else
          [11]   pins[syscalls[i].sysno] = syscalls[i].offset;
               }
               pins[SYS_kbind] = -1;   /* XXX see ld.so/loader.c */
              
               *pinp = pins;
               pins = NULL;
              bad:
               free(syscalls, M_PINSYSCALL, nsyscalls * sizeof(*syscalls));
               free(pins, M_PINSYSCALL, npins * sizeof(uint));
               return npins;
              }
      

      So first of all we calculate the number of syscalls in the pin section [1], allocate some memory for it [2] and read it in [3].

      At [4], we want to figure out how big to make our pin array, so we loop over all of the syscall entries and record the largest we've seen so far [5]. (Note: the use of `MAX` here is fine since `sysno` is unsigned -- see near the top of the function).

      With the maximum `sysno` found, we then crucially go on to clamp the value to `SYS_kbind` at [6] and add one at [7]. (As the sibling comment points out, a `sysno` above INT_MAX wraps the signed `npins` negative at [5], so the signed comparison at [6] pulls it back down to `SYS_kbind`.)

      This clamped maximum value is used for the array allocation at [8].

      We then loop through the syscall list again, but this time take the unclamped `sysno` as the index into the array, reading at [9] and writing at [10] and [11]. This is essentially the vulnerability right here.

      Through heap grooming, there's a good chance you could arrange for a useful structure to be placed within range of the write at [11] -- and `offset` is essentially an arbitrary value you can write. So it looks like it would be relatively easy to exploit.

      8 replies →