← Back to context

Comment by pbsd

12 years ago

This sort of argument is becoming something of a fashion statement amongst some security people. It's not a strictly wrong argument: writing code in languages that make screwing up easy will invariably result in screwups.

But it's a disingenuous one. It ignores the realities of systems. The reality is that there is currently no widely available memory-safe language that is usable for something like OpenSSL. .NET and Java (and all the languages running on top of them) are not an option, as they are not everywhere and/or are not callable from other languages. Go could be a good candidate, but without proper dynamic linking it cannot serve as a library callable from other languages either. Rust has a lot of promise, but even now it keeps changing every other week, so it will be years before it can even be considered for something like this.

Additionally, although the parsing portions of OpenSSL need not deal with the hardware directly, the crypto portions do. So your memory-safe language needs some first-class escape hatch to unsafe code. A few of them do have this, others not so much.

It's fun to say C is inadequate, but the space it occupies does not have many competitors. That needs to change first.

First, I do realize that rewriting the software stack from the ground up to have only managed code is a huge task. I do think that as an industry, we should set a goal of having at least one server implementation along these lines (where 'set a goal' may mean, say, grants or calls for proposals). Microsoft Research implemented an experimental OS like that, although it probably didn't have all the features a modern OS would need. I don't know if we need a new language, but we do need a huge rethink of the server architecture, and not just a piece-by-piece rewrite, which I think will founder on the interface issues that you mentioned.

Anyway, I am quite realistic about the prospect of my comment having that kind of effect on the industry - I don't suffer from delusions of grandeur. I was aiming the comment more at people who choose C/C++ for no good reason to write a user-level app; that app is nearly certain to have memory use errors, and if it has any network or remote interface, chances are they can be easily exploited. I'd like as many people as possible to understand that they can't expect to avoid such errors, any more than one of the most heavily audited pieces of software avoided them. We have had decades of exploits of this vulnerability, and yet most programmers are oblivious to it, or think only bad programmers are at risk. So just as tptacek goes around telling people not to write their own crypto, I go around telling people - with less authority and effectiveness, unfortunately - not to write C/C++ code unless they really need to.

As for the performance issues forcing OpenSSL to use C, well, we apparently exposed all our secrets in the pursuit of shaving off those cycles. I hope we are happy.

  • That was a very reasonable response, I can roll with that.

    Just one thing: when I brought up talking directly to the hardware, I did not mean just for performance's sake. Avoiding side-channel attacks often requires to have high control over the generated machine code, and that is the primary reason to not do it in higher-level languages (unless they also permit that level of control).

    • In Java 8 the JVM knows how to compile AES straight through to AES-NI invocations, so the CPU itself is doing the hardware accelerated crypto (in constant time). It's not necessarily the case that higher level languages have to be unsafer: especially not on something like a server where the overhead of JIT compilers get amortized out.

      1 reply →

  • Do not like the term "C/C++", and especially in this context. Modern C++ makes avoiding this sort of bug as easy as doing so in the "managed" languages already discussed.

    This is as much a cultural as a technical problem; C really is in the last chance saloon for this sort of problem, we have the solution to hand, but a strong cadre of developers will still only consider C for this sort of work.

    • Thank you, this really had to be said. You can do C in C++ (and if you do that, chances are that you're doing it wrong), but you can't do C++ in C. The ugliest and unsafe parts of C++ are invariably those coming almost untouched from C (raw pointers, casts from/to void ptrs, etc). C is close to the machine, but C++ is largely a vertical language where you can do things low-level or high-level (and yes, you can do a lot of interesting things with templates, no matter the bad rap they've got); and most of the current C++ community vastly prefers high-level-like code, for good reasons (for starters, it may be even more performant). A LOT of the sources of unsafe code go away by two simple techniques: use RAII-managed smart pointers instead of raw ones (some of them are as lightweight as a raw pointer), and prefer vector (or other container) to array.

      I love C and C++, and each one has its place, but really, they're very different. Almost as much as C++ is to Java, for example.

    • The problem, and I think this may have been touch on somewhere else in the thread, is that C++ can be really complex to wrap. So embedding a C++ library in another, higher-level language can be very tricky. It often requires wrapping the parts of the API you want to use in C.

      I'm fine with low-level libraries being written in C++, but would hope that developers expose a C API around everything.

    • I don't think this is true. When it comes to copying bytes from a buffer supplied from the network, there isn't a wrapper/manager class that can do this for you. Somewhere down the pipeline some piece of code has to copy the unstructured, variable length byte stream into a manageable data structure. C++ does not give any way to do this beyond the mechanisms available in C.

      1 reply →

    • Thank you for pointing out that this is not an issue in C++, only in C. There are times when we need C and one can write C code and compile it with a C++ compiler. It's an option to help ease people to modern, idiomatic C++, but please don't lump the two together. C and C++ are two entirely different languages and C++ is much safer.

How about Ada? It is time tested! GNU's Ada shares the same backend as GCC so it can be pretty fast. Good enough for DoD. =P

Edit: I say this having used VHDL quite a bit. I appreciate its type strictness and ranges.

  • My first thought too was Ada, it's easily callable from anything that can call C afaik, and has infinitely better support for catching these sorts of bugs than C or C++ do. It's basically made for this kind of project, and yet no one in the civilian population really cares; it's a real shame. Not only does Ada make it easy to do things safely, it makes it very hard to do things unsafely.

    I've been advocating Ada's use on HN for a few years now, but it always falls on deaf ears. People seem to think it's old and dead like COBOL or old FORTRAN, but it's really a quite modern language that's extremely well thought out. Its other drawback is that it's pretty ugly and uses strange names for things (access is the name given to pointer like things, but Ada specifies if you have say a record with a 1 bit Bool in it, you must be able to create an access to it, so a pointer is not sufficient).

    • Tony Hoare (Mr. Quicksort, CSP, etc...) has softened his stance since "The Emperor's Old Clothes", but his concern was that ADA is too complicated to be understandable and safe. I hated Pascal because the array length was part of its type... but maybe that kind of thinking is apparently what it takes to avoid bugs like Heartbleed.

      1 reply →

  • An interesting new project written in Ada is a DNS server called Ironsides, written specifically to address all the vulnerabilities found in Bind and other DNS servers [1].

    "IRONSIDES is an authoritative DNS server that is provably invulnerable to many of the problems that plague other servers. It achieves this property through the use of formal methods in its design, in particular the language Ada and the SPARK formal methods tool set. Code validated in this way is provably exception-free, contains no data flow errors, and terminates only in the ways that its programmers explicitly say that it can. These are very desirable properties from a computer security perspective."

    Personally, I like Ada a lot, except for two issues:

    1. IO can be very, very verbose and painful to write (in that way, it's a bit like Haskell, although not for the same reason). Otherwise, it's a thoroughly modern language that can be easily parallelized.

    2. Compilers for modern Ada 2012 are only available for an extravagant amount of money (around $20,000 last I heard) or under the GPL. Older versions of the compiler are available under GPL with a linking exception, but they lack the modern features of Ada 2012 and are not available for most embedded targets. And the Ada community doesn't seem to think this is much of a problem (the attitude is often, "well, if you're a commercial project, just pony up the money"). The same goes for the most capable Ada libraries (web framework, IDE, etc.) -- they're all commercial and pretty costly. Not an ideal situation for a small company.

    But yes, Ada is exceptionally fast and pretty safe. There's a lot of potential there, but to be honest my hopes are pinned on Rust at this point.

    1. http://ironsides.martincarlisle.com/

We might be stuck with C for quite a while but then maybe the more interesting question is 'how does this sort of thing get past review?'. It's not hard to imagine how semantic bugs (say, the debian random or even the apple goto bug) can be missed. This one, on the other hand, hits things like 'are the parameters on memcpy sane' or 'is untrusted input sanitized' which you'd think would be on the checklist of a potential reviewer.

"Rust has a lot of promise, but even now it keeps changing every other week..."

A larger problem, in my opinion, is that things like OpenSSL are used (And should be!) from N other languages. As a result, calling into the library requires almost by definition lowest-common denominator interfaces. Which is C.

C code calling into Rust can certainly be done, but I believe it currently prohibits using much of the standard library, which also removes a lot of the benefits.

C++ doesn't, I think, have as much of a problem there, but I'm somewhat skeptical of C++ as a silver bullet in this case.

I don't know about Ada and any other options.

[ATS, anyone?]

Why not write the code in C# (for example) and extract it to $SYSTEM_PROGRAMMING_LANGUAGE? It wouldn't be much different than what Xamarin are doing now for creating iOS and Android apps with C#.

  • Using C as an output language, backed by guarantees at the higher level, could certainly work. I believe ATS [1] works this way, and can even avoid garbage collection altogether if desired. I understand it is not an easy language, though.

    Nimrod [2] also generates C, but as I understand it garbage collection is unavoidable.

    [1] http://www.ats-lang.org/

    [2] http://nimrod-lang.org/

    • This is one reason I'd like to see the removed LLVM C backend brought back and modernized, with Rust as the source language. Rust is safe, has no mandatory garbage collector, and has a much lower impedance mismatch with C or C++ than most higher level languages, so it should work well for libraries that are expected to integrate with C code.

      4 replies →

  • Because if you write that in C#, you have to bundle 54MB common runtime and GC withit.

    • Did you actually read my entire comment, or did you just see "C#" and post a knee-jerk reaction? If you extract code from C# to C, you don't need a GC or any of the .NET class libraries -- you'd just have a standalone C file to use like any other.

      2 replies →

>Additionally, although the parsing portions of OpenSSL need not deal with the hardware directly, the crypto portions do. So your memory-safe language needs some first-class escape hatch to unsafe code. A few of them do have this, others not so much.

For the other points there is some debate, but don't most serious languages have a C FFI?

  • OpenSSL and similar libraries spend most of their time processing short packets. For example, encrypting a few hundred bytes using AES these days should take only a few hundred CPU cycles. This means that the overhead of calling the crypto code should be minimal, preferably 0. This is in part what I meant by "first-class". Perhaps I should have written "zero-overhead" instead.

    I googled around just now for some benchmarks on the overhead of FFIs. I found this project [1] which measures the FFI overhead of a few popular languages. Java and Go do not look competitive there; Lua came surprisingly on top, probably by inlining the call.

    Before you retort with an argument that a few cycles do not matter that much, remember that OpenSSL does not run only in laptops and servers; it runs everywhere. What might be a small speed bump on x86 can be a significant performance problem elsewhere, so this is something that cannot be simply ignored.

    [1] https://github.com/dyu/ffi-overhead

    • those linked tests are extremely disingenuous, it only shows the fixed cost of FFIs.

      Considering that in C the plusone call is 4 or so cycles, and the Java example is 5 times slower, that's only 20 or so cycles. If the function we're FFIing into is 400 cycles, that's only a 1% decrease in speed. I'm willing to pay that price if it means not having to wake up to everything being vulnerable every couple of months.

    • This project attempts to measure the overhead of calling out from $LANGUAGE and into C, which is the reverse of what's necessary to solve the problem stated here — to write a low-level library in a high-level language.

      There are other means of achieving a secure implementation, such as programming in a very high-level language, such as Cryptol, and compiling to a low-level language:

      http://corp.galois.com/cryptol/

      2 replies →

    • Lua being on top shouldn't be surprising. Its entire purpose is to call into (and to be called from) C, and this case has been highly optimized.

  • I think that was part of his point, but yeah I don't see why you couldn't do the pure parts in a safer language. FFIs to C are fairly easy to do I think, probably partly because of how simple the calling convention is.

I believe Haskell could be up to the job, but I heard that there were some difficulties in guarding against timing attacks. However those could have just been noise. I know that a functional (I believe and haha) operating system was made in Haskell.

Aren't Operating Systems lower level than OpenSSL?

  • I look forward to reading the hilarious threads that will be spawned when you take to linux-kernel, freebsd-hackers, openbsd-misc, etc. and inform them they should be developing their kernels in Haskell.

    Functional programming's unpopularity is not rooted in any real or imagined inability to write operating systems.

    • Well you also have to account for the fact that they've spent a lot of time working with their tools and are quite invested in them.

      Perhaps they would balk at the idea of writing a kernel in Haskell, but it has been done before.

Other than C there is also C++ and D if you don't want to stray to far from C. The problem with C++ is that even though it is possible to adapt to a memory safe programming style with C++ the concepts are not prevalent in the community.

  • Secure coding is very prevalent in C++ (outside the group of old C programmers who write C++ code as if it were C). C++ is far safer than C.

    • Sorry, let me be more clear: In the courses and books I've seen and read novices don't get taught secure coding techniques as C++ is often introduced as a superset of C and security in general is not of interest to the teacher. Then later on when they transition to the web as a primary source of information there is a lot of legacy C++ code lying around that does not use modern memory management concepts. Also, as nobody has ever told them the importance of strict coding styles for security they also don't start looking for them, even though it would be possible to find them with the right keywords.

      2 replies →

  • C++ does not have a reasonable memory safe subset. No, smart pointers do not provide real memory safety.

    • They are generally better than nothing, however. And they do generally offer better performance, better predictability, an a simpler implementation than other approaches to memory management and garbage collection.

      Maybe a language like Rust will offer a safer alternative at some point, but that point surely isn't today, and probably not tomorrow, either. Maybe there are other languages that offer better safety, but they often bring along their own set of very serious drawbacks.

      In terms of writing relatively safe code today, that performs relatively well, that can integrate easily with other libraries/frameworks/code, and can be readily maintained, the use of C++ with modern C++ techniques is often the only truly viable option.

      2 replies →

  • What would you advocate as memory safe programming styles? Strong guarantee? RAII?

    • RAII is a minimum. You also have to treat any direct and indirect (unchecked) pointer arithmetic as a potential security vulnerability though. For example if you use the [] operator of a vector you can still access memory outside of the allocated space. Instead you would have to use the at() method which actually checks the bounds. Even iterator are problematic as the iterator on a vector also ignores the actual bounds iirc (though with most idioms the comparison against the end iterator is pretty error proof). This lends itself to constructs where you do not work with any indexes at all in the way foreach loops abstract your position in a container away.

      1 reply →

What about ADA? GNAT looked pretty good a few years back when I was trying to get into that sort of thing.

>This sort of argument is becoming something of a fashion statement amongst some security people.

Just the ones who don't understand how good API designs can work well to solve these problems, don't worry not all of us are like that :)

What you say can easily be disproved, and you are simply asking for too much if you ask for something to be a drop-in replacement for OpenSSL. Some re-architecting is requred simply because of the insecurity of C.

For example, a shared library that implements SSL would have to be a shim for something living in a separate process space.

http://hackage.haskell.org/package/tls

That is a Haskell implementation of TLS. It is written in a language that has very strong guarantees about mutation, and a very powerful type system which can express complex invariants.

Yes, crypto primitives must be written in a low level language. C is not low level enough to write crypto, neither securely nor fast, so that's not an argument in its favor.