← Back to context

Comment by amalcon

12 years ago

Nothing in the standard prevents a C compiler + tightly coupled malloc implementation from implementing bounds checks. Out-of-bounds operations result in undefined behavior, and crashing the program is a valid response to undefined behavior. If your malloc implementation cooperates, you can even bounds-check pointer arithmetic without violating calling conventions.

It's quite a shame that there isn't a compiler that does this, and it's a project I've considered spending some time on if I can find a big enough block of that to get a solid start.

Unrestricted pointer arithmetic is indeed incompatible with memory safety. You set a pointer to point to one structure, then you change it and it now points to another structure or array. The compiler doesn't know the semantics of your code, so how can it tell if you meant to do that? And malloc/memcpy is way too low to check this stuff. It only sees memory addresses; it has no idea what variables are in them. Tightly coupled would mean passing information like "variable secret_key occupies address such-and-such" into the libc, which does violate POSIX standards, and will result in lots of code breaking. I don't see why we wouldn't just write in C# or Java or Rust, instead of a memory-safe subset of C (and it would have to be a subset).

Edit: here's one project for making a memory-safe C: http://www.seclab.cs.sunysb.edu/mscc/ . Interesting, but (a) it is a subset of C, (b) it doesn't remove all vulnerabilities, and (c) I still don't grok the advantage of using this over a language actually designed for modern, secure application programming.

  • I'll assume the case you're concerned about is the one legitimately tricky case (where you have an array of structs that include arrays, and you perform arithmetic on a pointer into the inner array), because the other readings I'm coming up with necessarily invoke undefined behavior, either by running off the end of an allocation (what we're checking) or breaking strict aliasing (in which case false positives are OK). Depending on what you do with this pointer (e.g. passing it into a custom memcpy), the compiler may not be able to enforce runtime checks by itself.

    This is where we do need some extra help, in the form of a library that holds state for the compiler so that we don't need to instrument our pointers. Nothing in the C standard prevents the compiler from doing this. The library you pass the pointer into may ignore this information, if it doesn't have the necessary instrumentation, but we at least get the capability.

    Re: other languages, Rust I will grant. It's the only one of those that's compelling for C's use-cases (Java and C# are both entirely unusable for the best uses of C and C++).

  • >You set a pointer to point to one structure, then you change it and it now points to another structure or array. The compiler doesn't know the semantics of your code, so how can it tell if you meant to do that?

    If you changed it via arithmetic or anything other than direct assignment you have violated the standard. Assuming of course that they are part of separate allocations, pointers from one may not interact with pointers from another except through equality testing and assignment.

  • You can do it, although at a considerable performance hit. The usual approach is "fat" pointers that include bounds information. Memory safe pointer arithmetic is achieved by checking that any constructed pointer lies within those bounds, and dying noisily if it does not (alternatively, you can test on dereference).

C language environments that worked like this have been commercially available in the past: Saber-C in the '90s, and perhaps earlier, was one example.

One problem is that the obvious implementation technique is to change the representation of pointers (to include base and bounds information, or a pointer to that), which means that you need to redo a lot of the library as well. (Or convert representations when entering into a stock library routine, and accept that whatever it does with the pointer won't get bounds-checked.)

But it's certainly doable.

I implemented this once in my C interpreter picoc. Users hated it because it also prevented them from doing some crazy C memory access tricks, so I ended up taking it out.

If you have a char* buf; block you got from network stack and you have to copy buf[3] bytes from the position buf+15 then the compiler doesn't know what to check for if you don't cross the boundary of that buffer.

I think clang's AddressSanitizer gets pretty close to what you want. It misses some tricky cases on use-after-return, but other than that it offers pretty robust memory safety model for bounds checks, double free, and so on.