
Comment by yaakov34

12 years ago

There was a discussion here a few years ago (https://news.ycombinator.com/item?id=2686580) about memory vulnerabilities in C. Some people tried to argue back then that various protections offered by modern OSs and runtimes, such as address space randomization, and the availability of tools like Valgrind for finding memory access bugs, mitigate this. I really recommend re-reading that discussion.

My opinion, then and now, is that C and other languages without memory checks are unsuitable for writing secure code. Plainly unsuitable. They need to be restricted to writing a small core system, preferably small enough that it can be checked using formal (proof-based) methods, and all the rest, including all application logic, should be written using managed code (such as C#, Java, or whatever - I have no preference).

This vulnerability is the result of yet another missing bound check. It wasn't discovered by Valgrind or some such tool, since it is not normally triggered - it needs to be triggered maliciously or by a testing protocol which is smart enough to look for it (a very difficult thing to do, as I explained on the original thread).

The fact is that no programmer is good enough to write code which is free from such vulnerabilities. Programmers are, after all, trained and skilled in following the logic of their program. But in languages without bounds checks, that logic can fall away as the computer starts reading or executing raw memory, which is no longer connected to specific variables or lines of code in your program. All non-bounds-checked languages expose multiple levels of the computer to the program, and you are kidding yourself if you think you can handle this better than the OpenSSL team.

We can't end all bugs in software, but we can plug this seemingly endless source of bugs which has been affecting the Internet since the Morris worm. It has now cost us a two-year window in which 70% of our internet traffic was potentially exposed. It will cost us more before we manage to end it.

From a quick reading of the TLS heartbeat RFC and the patched code, here's my understanding of the cause of the bug.

TLS heartbeat consists of a request packet including a payload; the other side reads and sends a response containing the same payload (plus some other padding).

In the code that handles TLS heartbeat requests, the payload size is read from the packet controlled by the attacker:

  n2s(p, payload);
  pl = p;

Here, p is a pointer to the request packet, and payload is the expected length of the payload (read as a 16-bit short integer: this is the origin of the 64K limit per request).

pl is the pointer to the actual payload in the request packet.

Then the response packet is constructed:

  /* Enter response type, length and copy payload */
  *bp++ = TLS1_HB_RESPONSE;
  s2n(payload, bp);
  memcpy(bp, pl, payload);

The payload length is stored into the destination packet, and then the payload is copied from the source packet pl to the destination packet bp.

The bug is that the payload length is never actually checked against the size of the request packet. Therefore, the memcpy() can read arbitrary data beyond the storage location of the request by sending an arbitrary payload length (up to 64K) and an undersized payload.

I find it hard to believe that the OpenSSL code does not have any better abstraction for handling streams of bytes; if the packets were represented as a (pointer, length) pair with simple wrapper functions to copy from one stream to another, this bug could have been avoided. C makes this sort of bug easy to write, but careful API design would make it much harder to do by accident.

  • It is indeed astonishing how simple-minded this bug is. But these bugs come in all levels of complexity, from simple overstuffed buffers to logical ping-pong that hurts your brain when you try to follow it. We need to get rid of them once and for all. If the whole world can't use a certain tool effectively, then the whole world isn't broken; the tool is bad.

    • Machine level languages like C and C++ aren't necessarily bad tools, even in their current states. However, I agree that they might be bad tools for the purpose of writing security libraries.

      6 replies →

  • I've felt that C makes this sort of bug easy to write because it makes doing the right thing hard. What you are describing is just a lot of work in C, compared to a language with something akin to Java's generics, which are in turn an afterthought in the ML family of languages. What we're asking for is not that complicated from a PL standpoint. A generic streams library?

    Economics plays an invisible part here. Someone writing a library has a limited amount of time to implement some set of features, and to balance that against other needs, like making the code "clean"/pretty and secure. In this case, pretty code and secure code go hand in hand. Consumers would likewise have to balance feature needs against how likely the code is to explode. What it comes down to is that you aren't likely to have secure, stable code in a language that doesn't inherently encourage it.

    It starts to be clearer then, that the more modern, "prettier" languages offer material benefits in their efforts to be more elegant.

    • That's what I like about Ruby. ;)

      Even in C, Go or Python, I column align any text that is remotely similar, so differences are obvious.

      Clean code might be extra work up front, but the net work (maintenance) should amortize to less. The value of reducing cognitive load in a large, supportable production codebase cannot be overstated.

      22 replies →

  • Thanks for this. How is this reading arbitrary memory locations though? Isn't this always reading what is near the pl? As in, can you really scan the entire process's memory range this way or just a small subset where malloc (or the stack, whichever this is) places pl?

    • The latter, and AFAIK the buffer doesn't get reallocated on every connection, so it should be unlikely that any private keys actually get dumped. However, I could be missing a way to exploit it.

      9 replies →

  • This reminds me of what another programmer told me a long time ago when we were discussing C; "The problem with C is that people make terrible memory managers.". So true.

  • I agree that an abstraction for this seems to be missing, but I always have the feeling that what you're doing is patching holes in a leaking dam: you might get good at it, but you'll always have leaks.

  • I have always detested C (also C++) because it's so unreadable... the snippets of code you cite are just so dense, e.g. a function like n2s() gives pretty much no indication of what it does to a casual reader. Just reading the RFC (it is pretty much written in a C style) gives me the creeps.

    The RFC doesn't mention why there has to be a payload, why the payload has to be a random size, why they are doing an echo of this payload, or why there has to be padding after the payload. And it presents this data as if it were a regular C struct (I didn't know you could have a struct with a variable size, but apparently the fields are really pointers, or it's just a mental model and not a real struct).

    Apparently the purpose of the payload is path MTU discovery. Something that is supposed to happen at the IP layer, but I don't know enough about datagram packets. I guess an application may want to know about the MTU as well...

    I'm not here to point fingers, I'm just saying C is a nightmare to me and a reason for me to never be involved with system programming or something like drafting RFC's ;-).

    But if one can argue that C is a bad choice for writing this stuff, then that is not an isolated thing. "C" is also the language of the RFCs. "C" is also the mindset of the people doing that writing. After all, the language you speak determines how you think. It introduces concepts that become part of your mental models. I could give many examples, but that's not really the point.

    And it's about style and what you give attention to. To me, that RFC is a real bad document. It starts to explain requirements for exceptional scenarios (like when the payload is too big) before even having introduced and explained the main concepts and the hows and whys.

    So while you may argue that this is a C problem and not a protocol problem, it is really all related.

    And you may also say, in response to someone blaming these coders, that blame is inappropriate (and it is) because these are volunteers donating their free time to something they find valuable; the whole distribution and burden of responsibility is, naturally, also part of the culture and how people self-organize and so on.

    As someone else explained (https://news.ycombinator.com/item?id=7558394) the protocol is real bad but it is the result of more or less political limitations around submitting RFCs for approval. There is no reason for the payload in TLS (but apparently there is in DTLS) but my point is simply this:

    If you are doing inelegant design this will spill over into inelegant implementation. And you're bound to end up with flaws.

    Rather than trying to isolate the fault here or there, I would say this is a much larger cultural thing to become aware of.

This sort of argument is becoming something of a fashion statement amongst some security people. It's not a strictly wrong argument: writing code in languages that make screwing up easy will invariably result in screwups.

But it's a disingenuous one. It ignores the realities of systems. The reality is that there is currently no widely available memory-safe language that is usable for something like OpenSSL. .NET and Java (and all the languages running on top of them) are not an option, as they are not everywhere and/or are not callable from other languages. Go could be a good candidate, but without proper dynamic linking it cannot serve as a library callable from other languages either. Rust has a lot of promise, but even now it keeps changing every other week, so it will be years before it can even be considered for something like this.

Additionally, although the parsing portions of OpenSSL need not deal with the hardware directly, the crypto portions do. So your memory-safe language needs some first-class escape hatch to unsafe code. A few of them do have this, others not so much.

It's fun to say C is inadequate, but the space it occupies does not have many competitors. That needs to change first.

  • First, I do realize that rewriting the software stack from the ground up to have only managed code is a huge task. I do think that as an industry, we should set a goal of having at least one server implementation along these lines (where 'set a goal' may mean, say, grants or calls for proposals). Microsoft Research implemented an experimental OS like that, although it probably didn't have all the features a modern OS would need. I don't know if we need a new language, but we do need a huge rethink of the server architecture, and not just a piece-by-piece rewrite, which I think will founder on the interface issues that you mentioned.

    Anyway, I am quite realistic about the prospect of my comment having that kind of effect on the industry - I don't suffer from delusions of grandeur. I was aiming the comment more at people who choose C/C++ for no good reason to write a user-level app; that app is nearly certain to have memory use errors, and if it has any network or remote interface, chances are they can be easily exploited. I'd like as many people as possible to understand that they can't expect to avoid such errors, any more than one of the most heavily audited pieces of software avoided them. We have had decades of exploits of this vulnerability, and yet most programmers are oblivious to it, or think only bad programmers are at risk. So just as tptacek goes around telling people not to write their own crypto, I go around telling people - with less authority and effectiveness, unfortunately - not to write C/C++ code unless they really need to.

    As for the performance issues forcing OpenSSL to use C, well, we apparently exposed all our secrets in the pursuit of shaving off those cycles. I hope we are happy.

    • That was a very reasonable response, I can roll with that.

      Just one thing: when I brought up talking directly to the hardware, I did not mean just for performance's sake. Avoiding side-channel attacks often requires to have high control over the generated machine code, and that is the primary reason to not do it in higher-level languages (unless they also permit that level of control).

      2 replies →

    • I do not like the term "C/C++", especially in this context. Modern C++ makes avoiding this sort of bug as easy as doing so in the "managed" languages already discussed.

      This is as much a cultural as a technical problem; C really is in the last chance saloon for this sort of problem, we have the solution to hand, but a strong cadre of developers will still only consider C for this sort of work.

      5 replies →

  • How about Ada? It is time tested! GNU's Ada shares the same backend as GCC so it can be pretty fast. Good enough for DoD. =P

    Edit: I say this having used VHDL quite a bit. I appreciate its type strictness and ranges.

    • My first thought too was Ada, it's easily callable from anything that can call C afaik, and has infinitely better support for catching these sorts of bugs than C or C++ do. It's basically made for this kind of project, and yet no one in the civilian population really cares; it's a real shame. Not only does Ada make it easy to do things safely, it makes it very hard to do things unsafely.

      I've been advocating Ada's use on HN for a few years now, but it always falls on deaf ears. People seem to think it's old and dead like COBOL or old FORTRAN, but it's really a quite modern language that's extremely well thought out. Its other drawback is that it's pretty ugly and uses strange names for things (access is the name given to pointer-like things, but Ada specifies that if you have, say, a record with a 1-bit Bool in it, you must be able to create an access to it, so a plain pointer is not sufficient).

      2 replies →

    • An interesting new project written in Ada is a DNS server called Ironsides, written specifically to address all the vulnerabilities found in Bind and other DNS servers [1].

      "IRONSIDES is an authoritative DNS server that is provably invulnerable to many of the problems that plague other servers. It achieves this property through the use of formal methods in its design, in particular the language Ada and the SPARK formal methods tool set. Code validated in this way is provably exception-free, contains no data flow errors, and terminates only in the ways that its programmers explicitly say that it can. These are very desirable properties from a computer security perspective."

      Personally, I like Ada a lot, except for two issues:

      1. IO can be very, very verbose and painful to write (in that way, it's a bit like Haskell, although not for the same reason). Otherwise, it's a thoroughly modern language that can be easily parallelized.

      2. Compilers for modern Ada 2012 are only available for an extravagant amount of money (around $20,000 last I heard) or under the GPL. Older versions of the compiler are available under GPL with a linking exception, but they lack the modern features of Ada 2012 and are not available for most embedded targets. And the Ada community doesn't seem to think this is much of a problem (the attitude is often, "well, if you're a commercial project, just pony up the money"). The same goes for the most capable Ada libraries (web framework, IDE, etc.) -- they're all commercial and pretty costly. Not an ideal situation for a small company.

      But yes, Ada is exceptionally fast and pretty safe. There's a lot of potential there, but to be honest my hopes are pinned on Rust at this point.

      1. http://ironsides.martincarlisle.com/

      3 replies →

  • We might be stuck with C for quite a while but then maybe the more interesting question is 'how does this sort of thing get past review?'. It's not hard to imagine how semantic bugs (say, the debian random or even the apple goto bug) can be missed. This one, on the other hand, hits things like 'are the parameters on memcpy sane' or 'is untrusted input sanitized' which you'd think would be on the checklist of a potential reviewer.

  • "Rust has a lot of promise, but even now it keeps changing every other week..."

    A larger problem, in my opinion, is that things like OpenSSL are used (and should be!) from N other languages. As a result, calling into the library almost by definition requires a lowest-common-denominator interface. Which is C.

    C code calling into Rust can certainly be done, but I believe it currently prohibits using much of the standard library, which also removes a lot of the benefits.

    C++ doesn't, I think, have as much of a problem there, but I'm somewhat skeptical of C++ as a silver bullet in this case.

    I don't know about Ada and any other options.

    [ATS, anyone?]

  • Why not write the code in C# (for example) and extract it to $SYSTEM_PROGRAMMING_LANGUAGE? It wouldn't be much different than what Xamarin are doing now for creating iOS and Android apps with C#.

  • >Additionally, although the parsing portions of OpenSSL need not deal with the hardware directly, the crypto portions do. So your memory-safe language needs some first-class escape hatch to unsafe code. A few of them do have this, others not so much.

    For the other points there is some debate, but don't most serious languages have a C FFI?

    • OpenSSL and similar libraries spend most of their time processing short packets. For example, encrypting a few hundred bytes using AES these days should take only a few hundred CPU cycles. This means that the overhead of calling the crypto code should be minimal, preferably 0. This is in part what I meant by "first-class". Perhaps I should have written "zero-overhead" instead.

      I googled around just now for some benchmarks on the overhead of FFIs. I found this project [1] which measures the FFI overhead of a few popular languages. Java and Go do not look competitive there; Lua came out surprisingly on top, probably by inlining the call.

      Before you retort with an argument that a few cycles do not matter that much, remember that OpenSSL does not run only in laptops and servers; it runs everywhere. What might be a small speed bump on x86 can be a significant performance problem elsewhere, so this is something that cannot be simply ignored.

      [1] https://github.com/dyu/ffi-overhead

      5 replies →

    • I think that was part of his point, but yeah I don't see why you couldn't do the pure parts in a safer language. FFIs to C are fairly easy to do I think, probably partly because of how simple the calling convention is.

  • I believe Haskell could be up to the job, but I have heard that there were some difficulties in guarding against timing attacks; those reports may have just been noise. I know that a functional (in both senses, haha) operating system was made in Haskell.

    Aren't Operating Systems lower level than OpenSSL?

  • Other than C there are also C++ and D if you don't want to stray too far from C. The problem with C++ is that even though it is possible to adopt a memory-safe programming style in C++, the concepts are not prevalent in the community.

  • What about ADA? GNAT looked pretty good a few years back when I was trying to get into that sort of thing.

  • >This sort of argument is becoming something of a fashion statement amongst some security people.

    Just the ones who don't understand how good API designs can work well to solve these problems, don't worry not all of us are like that :)

  • What you say can easily be disproved, and you are simply asking for too much if you ask for something to be a drop-in replacement for OpenSSL. Some re-architecting is required simply because of the insecurity of C.

    For example, a shared library that implements SSL would have to be a shim for something living in a separate process space.

    http://hackage.haskell.org/package/tls

    That is a Haskell implementation of TLS. It is written in a language that has very strong guarantees about mutation, and a very powerful type system which can express complex invariants.

    Yes, crypto primitives must be written in a low level language. C is not low level enough to write crypto, neither securely nor fast, so that's not an argument in its favor.

> C and other languages without memory checks are unsuitable for writing secure code

I vehemently disagree. Well-written C is very easy to audit. Much, much more so than languages like C# and Java, where something I could do in 200 lines in a single C source file requires 5 different classes in 5 different files. The problem with C is that a lot of people don't write it well.

Have you looked at the OpenSSL source? It's an ungodly f-cking disaster: it's very very difficult to understand and audit. THAT, I think, is the problem. BIND, the DNS server, used to have huge security issues all the time. They did a ground-up rewrite for version 9, and that by and large solved the problem: you don't read about BIND vulnerabilities that often anymore.

OpenSSL is the new BIND; and we desperately need it to be fixed.

(If I'm wrong about BIND, please correct me, but AFAICS the only non-DOS vulnerability they've had since version 9 is CVE-2008-0122)

> but we can plug this seemingly endless source of bugs which has been affecting the Internet since the Morris worm.

If we're playing the blame game, blame the x86 architecture, not the C language. If x86 stacks grew up in memory (that is, from lower to higher addresses), almost all "stack smashing" attacks would be impossible, and a whole lot of big security bugs over the last 20 years could never have happened.

(The SSL bug is not a stack-smashing attack, but several of the exploits leveraged by the Morris worm were)

  • > The problem with C is that a lot of people don't write it well.

    Including people responsible for one of the most important security-related libraries in the world. No matter how good and careful a programmer is, they are still human and prone to errors. Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible? They won't fix all problems, and definitely not those associated with having a bad code base, but it'd still be many times better than hoping people don't screw up with pointer lifetimes.

    • > Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible?

      I don't think it makes sense to intentionally prevent the programmer from doing certain things the computer is capable of, on the theory that this makes errors impossible.

      As I've said several times in this thread, somebody has to deal with the pointers and raw memory because that's the way computers work. Using a language where the language runtime itself handles such things only serves to abstract away potential errors from the programmer, and prevents the programmer from doing things in more efficient ways when she wants to. It can also be less performant, since the runtime has to do things in more generic ways than the programmer would.

      > Including people responsible for one of the most important security-related library in the world.

      I think you've hit on a crucial part of the problem: practically every software company on Earth uses OpenSSL, but not many of them pay people to work on it.

        calvinow@Mozart ~/git/openssl $ git log --format='%aE' | grep -Po "@.*" | sort -u - 
        @baggins.org
        @benl-macbookpro3.local
        @chromium.org
        @cloudflare.com
        @comodo.com
        @cryptsoft.com
        @esperi.org.uk
        @fh-muenster.de
        @fifthhorseman.net
        @fsmat.at
        @gmail.com
        @google.com
        @igalia.com
        @infradead.org
        @innominate.com
        @leeds.pl
        @linaro.org
        @links.org
        @neurodiverse.org
        @openssl.org
        @opensslfoundation.com
        @pobox.com
        @roeckx.be
        @secondstryke.com
        @torproject.org
        @trevp.net
        @v3.sk
        @velox.ch
      

      I was very surprised how short that list is. There are a lot of big names that make heavy use of this software that are not on that list.

      5 replies →

    • > Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible?

      Bugs will still occur, just in a different way: Java is advocated as being a much "safer" language, but how many exploits have we seen in the JRE? Going to more restrictive, more complex languages in an attempt to fix these problems will only lead to a neverending cycle of increasing ignorance and negligence, combined with even more restrictive languages and complexity. I believe the solution is in better education and diligence, and not technological.

      1 reply →

  • >The problem with C is that a lot of people don't write it well.

    There are languages that make it very, very hard to write bad code. Haskell is a good example: if your program type-checks, there's a good chance it's correct.

    C is a language that doesn't offer many advantages but offers very many disadvantages for its weak assurances. Things like the Haskell compiler show that you can get strong typing for free, and there's no longer many excuses to run around with raw pointers except for legacy code.

    • > There are languages that make it very very hard to write bad code. Haskell [...]

      Sure, but how much slower is Haskell than an equivalent implementation in C? Some quick searching suggests numbers like 1000% slower... and no amount of security is worth a 1000% performance hit, let alone a vague "security mistakes are less likely this way" sort of security. Being secure is useless if your code is so slow that you have to run so many servers you don't make a profit.

      Could the Haskell compiler be improved to the point that this isn't a problem? Maybe. Ultimately I think the problem is that Haskell code is very unlike object code, and that makes writing a good compiler very difficult. C is essentially portable assembler; translating it to object code is much more straightforward.

      > C is a language that doesn't offer many advantages but offers very many disadvantages for its weak assurances.

      C offers simplicity. Sure, there are some quirks that are complex, but by and large it is one of the simplest languages in existence. Once you understand the syntax, you've essentially learned the language. Contrast that to Java and C#: you are essentially forced by the language to use this gigantic and complicated library all the time. You are also forced to write your code in pre-determined ways, using classes and other OOP abstractions. In C, I don't have to do that: I can write my code in whatever way I feel makes it maximally readable and performant.

      C also offers flexibility: in C, I can make the computer do absolutely anything I want it to in exactly the way I want it to. Maybe I don't like 8-byte pointers on my 64-bit CPU, and I want to implement a linked list allocating nodes from a sparse memory-mapped pool with a known base address using 16-bit indexes which I add to the base to get the actual addresses when manipulating the list? That could be a big win depending on what you're doing, and (correct me if I'm wrong) there is no way to do that in Java or Haskell.

      > there's no longer many excuses to run around with raw pointers except for legacy code.

      If by "raw pointer" you mean haphazardly casting different things to (void *) or (char *) and doing arithmetic on them to access members of structures or something, I agree, 99.9% of the time you shouldn't do that.

      Or are you talking about that "auto_ptr" and so-called "smart pointer" stuff in C++? In that case, your definition of "raw pointer" is every pointer I've ever defined in all the C source code I've ever written.

      Pointers exist because that's the way computer hardware works: they will never go away. I'd rather deal with them directly, since it allows me to be clever sometimes and make things faster.

      17 replies →

    • > Haskell is a good example of where if your program type-checks, there's a high chance it's probably correct.

      I really wish people would stop saying this. It's not in the slightest bit true. Making this assertion makes Haskellers seem dangerously naive.

      3 replies →

    • How do I embed your Haskell library in my Lisp program? This is where C shines... it can be used everywhere, including embedded in other programs, no matter what language it's in.

      2 replies →

    • > Haskell is a good example of where if your program type-checks, there's a high chance it's probably correct.

      For a value of 'correct' that includes "uses all available memory and falls over".

  • Agreed. Simple code is easy to understand and just as easy to find any bugs in. After looking at the heartbeat spec and the code, I can already see a simplification that, had it been written this way, would've likely avoided introducing this bug. Instead of allocating memory of a new length, how about just validating the existing message fields as per the spec:

    > The total length of a HeartbeatMessage MUST NOT exceed 2^14 or max_fragment_length when negotiated as defined in [RFC6066].

    > The padding_length MUST be at least 16.

    > The sender of a HeartbeatMessage MUST use a random padding of at least 16 bytes.

    > If the payload_length of a received HeartbeatMessage is too large, the received HeartbeatMessage MUST be discarded silently.

    Then if it's all good, modify the buffer to change its type to heartbeat_response, fill the padding with new random bytes, and send this response. No need to copy the payload (which is where the bug was), no need to allocate more memory.

    (Now I'm sure someone will try to find a flaw in this approach...)

  • My favorite is that the Morris worm dates back to late 1988 when MS was starting the development of OS/2 2.0 and NT. Yea, I am talking about the decision to use a flat address space instead of segmented.

That's why I have high hopes for Rust. We really need to move away from C for critical infrastructure. Perhaps C++ as well, though the latter does have more ways to mitigate certain memory issues.

Incidentally, someone on the mailing list brought up the issue of having a compiler flag to disable bounds checking. However, the Rust authors were strictly against it.

  • I'm excited about Rust for this reason as well, but in practice I find myself thinking a lot about data moving into and out of various C libraries. The great but inevitably imperfect theory is that those call sites are called out explicitly and should be as limited as possible. It works well but isn't a silver bullet. I'm hopeful that as the language ecosystem matures there will be increasingly mature C library wrappers and (even better!) native, memory-safe, Rust replacements for things.

    • >I'm hopeful that as the language ecosystem matures there will be increasingly mature C library wrappers and (even better!) native, memory-safe, Rust replacements for things.

      This is Rust's greatest promise. Not only is writing memory-safe code possible, but it's also possible for Rust to do anything C is currently doing -- from systems, to embedded, to hard real-time, and so on. The promise of Rust cannot be overstated. And having finally grasped the language's pointer semantics, I've started to really appreciate its elegance. It compares very favorably to OCaml and other mixed-paradigm languages with strong functional capabilities.

      3 replies →

  • I'd disagree about C++. In my experience, the only things it adds are (1) a false sense of security (since the compiler will flag so many things which are not really big problems, but will happily ignore most overrun issues), (2) lots of complicated ways to screw up, such as not properly allocating/deleting things deep in some templated structure, and (3) interference with checking tools - I got way more false positives from Valgrind in C++ code than in C.

    I wish godspeed to Rust and any other language which doesn't expose the raw underlying computer the way C/C++ does, which is IMO insane for application programming.

    • I'd disagree with your disagreement ;-) C++ has constructs that let you build safer and more secure systems while maintaining good performance. You can write high performance systems without ever touching a raw pointer or doing manual memory management which is something that you can't really do in any other language. Yes, you need to trust the underlying library code which is something you have to do for pretty much any other language.

      In my experience, well-written C++ has far fewer security issues than similar C code. We've had third-party security companies audit our code base, so my statement has some anecdotal support in real-world systems.

      4 replies →

    • To be clear, Rust _does_ expose the raw underlying computer the way C/C++ does, it's just off by default rather than on.

    • >(2) lots of complicated ways to screw up, such as not properly allocating/deleting things deep in some templated structure

      Wow, that sounds scary. Do you have any references or further reading about this?

      2 replies →

"The fact is that no programmer is good enough to write code which is free from such vulnerabilities."

"...you are kidding yourself if you think you can handle this better than the OpenSSL team."

Well, I can think of at least one example that counters this supposition. As someone points out elsewhere in this thread, BIND is like OpenSSL, and others wrote better alternatives, one of which offered a cash reward for any security holes and has, afaik, never had a major security flaw.

What baffles me is that no matter how bad OpenSSL is shown to be, it will not shake some programmers' faith in it.

I wonder if the commercial CA's will see a rise in the sale of certificates because of this.

Sloppy programmer blames language for his mistakes. News at 11.

Nothing in the standard prevents a C compiler + tightly coupled malloc implementation from implementing bounds checks. Out-of-bounds operations result in undefined behavior, and crashing the program is a valid response to undefined behavior. If your malloc implementation cooperates, you can even bounds-check pointer arithmetic without violating calling conventions.

It's quite a shame that there isn't a compiler that does this, and it's a project I've considered spending some time on if I can find a big enough block of time to get a solid start.

  • Unrestricted pointer arithmetic is indeed incompatible with memory safety. You set a pointer to point to one structure, then you change it and it now points to another structure or array. The compiler doesn't know the semantics of your code, so how can it tell if you meant to do that? And malloc/memcpy is way too low-level to check this stuff. It only sees memory addresses; it has no idea what variables are in them. Tightly coupled would mean passing information like "variable secret_key occupies address such-and-such" into the libc, which does violate POSIX standards, and will result in lots of code breaking. I don't see why we wouldn't just write in C# or Java or Rust, instead of a memory-safe subset of C (and it would have to be a subset).

    Edit: here's one project for making a memory-safe C: http://www.seclab.cs.sunysb.edu/mscc/ . Interesting, but (a) it is a subset of C, (b) it doesn't remove all vulnerabilities, and (c) I still don't grok the advantage of using this over a language actually designed for modern, secure application programming.

    • I'll assume the case you're concerned about is the one legitimately tricky case (where you have an array of structs that include arrays, and you perform arithmetic on a pointer into the inner array), because the other readings I'm coming up with necessarily invoke undefined behavior, either by running off the end of an allocation (what we're checking) or breaking strict aliasing (in which case false positives are OK). Depending on what you do with this pointer (e.g. passing it into a custom memcpy), the compiler may not be able to enforce runtime checks by itself.

      This is where we do need some extra help, in the form of a library that holds state for the compiler so that we don't need to instrument our pointers. Nothing in the C standard prevents the compiler from doing this. The library you pass the pointer into may ignore this information, if it doesn't have the necessary instrumentation, but we at least get the capability.

      Re: other languages, Rust I will grant. It's the only one of those that's compelling for C's use-cases (Java and C# are both entirely unusable for the best uses of C and C++).

    • >You set a pointer to point to one structure, then you change it and it now points to another structure or array. The compiler doesn't know the semantics of your code, so how can it tell if you meant to do that?

      If you changed it via arithmetic or anything other than direct assignment you have violated the standard. Assuming of course that they are part of separate allocations, pointers from one may not interact with pointers from another except through equality testing and assignment.

    • You can do it, although at a considerable performance hit. The usual approach is "fat" pointers that include bounds information. Memory safe pointer arithmetic is achieved by checking that any constructed pointer lies within those bounds, and dying noisily if it does not (alternatively, you can test on dereference).

  • C language environments that worked like this have been commercially available in the past: Saber-C in the '90s, and perhaps earlier, was one example.

    One problem is that the obvious implementation technique is to change the representation of pointers (to include base and bounds information, or a pointer to that), which means that you need to redo a lot of the library as well. (Or convert representations when entering into a stock library routine, and accept that whatever it does with the pointer won't get bounds-checked.)

    But it's certainly doable.

  • I implemented this once in my C interpreter picoc. Users hated it because it also prevented them from doing some crazy C memory access tricks, so I ended up taking it out.

  • If you have a char *buf you got from the network stack and you have to copy buf[3] bytes starting at position buf + 15, the compiler has nothing to check against as long as you don't cross the boundary of that allocation.

  • I think clang's AddressSanitizer gets pretty close to what you want. It misses some tricky cases on use-after-return, but other than that it offers pretty robust memory safety model for bounds checks, double free, and so on.

> This vulnerability is the result of yet another missing bound check. It wasn't discovered by Valgrind or some such tool, since it is not normally triggered - it needs to be triggered maliciously or by a testing protocol which is smart enough to look for it (a very difficult thing to do, as I explained on the original thread).

You could also look at this bug as an input sanitization failure. The author didn't consider what to do when the length field in the header is longer than what comes over the wire (even when writing the code in a secure language, this case should be handled somehow, maybe by logging or dropping the packet).

  • The defined behaviour would be to discard the packet. In a secure language, the buffer would have had a "length" property, and the code would have crashed when a read beyond the buffer's end was attempted. But in C, buffers are just pointers, so there is fundamentally nothing wrong with reading beyond the end of the buffer. So instead of a crash, we get silent memory exposure.

  • Isn't this basically the whole point of QuickCheck-like testing frameworks? They're essentially a specification that a fuzzer attempts to falsify. I don't see why most C projects couldn't be doing this.

    • I think they don't do this because it's not a widely known testing method, and it's kind of tricky to implement these tests correctly.

      But then again, with some dedication, QuickCheck-like testing can do a huge amount of work. At work I've implemented these tests for the entire low-level IO framework for our servers, and these few tests are a pure meat grinder: they triggered one severe bug that would have downed production in the middle of the night, and then some more.

Speaking of proofs, how about we write security-critical code in Haskell? You need a fairly simple runtime, but beyond that it would work pretty much anywhere.

Most memory-related bugs are automatically eliminated, and security proofs are easier.

Agree a bazillion times.

Go or Java on top. Coding in C is like juggling chainsaws to say you can juggle them. C is certainly better than old-school Fortran, which didn't get dynamic memory management until later, but platforms like Erlang, Go and JRuby are really hard to beat.

The only problem is convincing people to migrate to different tools and transition codebases to another language. It would take a large project like FreeBSD, LLVM or the Linux kernel to move the needle.

  • Fortran was not meant to be a systems programming language. The fact that it did not have memory management does actually make sense in scientific applications, where you typically know your problem size in advance or can just recompile before a day long computation.

Is anyone working on an OpenSSL port in rust, which lacks the memory vulnerabilities of C?

> we can plug this seemingly endless source of bugs which has been affecting the Internet since the Morris worm. It has now cost us a two-year window in which 70% of our internet traffic was potentially exposed. It will cost us more before we manage to end it.

Could one make a new kind of OS where C programs are compiled to some intermediate representation and then JIT-compiled at run time within a managed hypervisor sandbox? Could Chrome OS become something like this? Does this already exist? MS had a managed-code research OS called Singularity.

> My opinion, then and now, is that C and other languages without memory checks are unsuitable for writing secure code.

I think they can be used to write secure code, but it has to be done carefully, with really thorough checks and unit tests, and a constant awareness of the vulnerabilities.

Everything I've heard about OpenSSL so far, suggests it was done by a bunch of cowboys who don't care about code quality. Those people shouldn't be writing C, but a safer language.

You make good points.

However, qmail is written in C and has a very good record. So I would disagree with "The fact is that no programmer is good enough to write code which is free from such vulnerabilities."

There seem to be at least two programmers who are capable of that.

Java, yes, hmmm... Oh wait, but the Java VM is written in C and is host to some of the worst web browser zero-days we know of.

Fundamentally, I think we're going to have to give up on security and start handing out drivers licenses to anyone who wants to use the internet.

If that worked, virtual machines and runtimes wouldn't have vulnerabilities.

So uhm. Yeah, that doesn't work either.

Edit: Btw, since HN has this obsession with Tarsnap: it's written in C. So you should stop obsessing about it and downvote me some more.

  • This argument came up in the thread from a few years ago. It is quite wrong-headed. I would like to give a clear answer to it:

    Virtual machines and runtimes may be vulnerable to malicious CODE. That's bad. Programs written in unmanaged languages are vulnerable to malicious DATA. That's horrible and unmitigatable.

    Vulns to malicious code are bad, but they may be mitigated by not running untrusted code (hard, but doable in contexts of high security). They are also mitigated by the fact that the runtime or VM is a small piece of code which may even be amenable to formal verification.

    Vulns to malicious data, or malicious connection patterns, are impossible to avoid. You can't accept only trusted data in anything user-facing. Also, these vulnerabilities are spread through billions of lines of application and OS code, as opposed to core runtime/VM.

    • > Virtual machines and runtimes may be vulnerable to malicious CODE. That's bad. Programs written in unmanaged languages are vulnerable to malicious DATA.

      Not exactly true. You can still write code vulnerable to input (data) in a "secure" language by accident. C is just especially vulnerable to buffer stuff.

    • What? Sorry, that doesn't make the least amount of sense.

      If you have a VM your code is DATA! You can have say format string bugs in managed languages too, see for example: http://blog.stalkr.net/2013/06/golang-heap-corruption-during...

      I'm sorry, but your argument here is plain bullshit.

      Using your logic the whole kernel/browsers etc should be written in a managed language.

      A browser is basically a VM for webpages, though. I'd bet that Chrome/Firefox/IE have more severe vulnerabilities per unit time than OpenSSL.

      I just... You can't argue with bullshit sorry.

      You might get fewer buffer overruns, okay, but is it more secure in the end, beyond any doubt?

      I doubt it.

      2 replies →

  • So reducing the attack surface isn't a laudable goal in your book, because hey, the VM itself can have vulnerabilities, so there isn't a point? I think the point is that programmers will always make these mistakes, and we should confine as much as possible the unsafe code that is written to as small an attack surface as possible. You're never going to eliminate vulnerabilities, but we sure can try to reduce the likelihood of them occurring. If there is some objective measurement to be made that says this isn't the case, i.e. that the number of JVM vulnerabilities like this outstrips or is on par with client-side vulnerabilities that occur in purely C/C++ applications, I would love to see it.

    Ultimately, I think the better answer will be a language that inherently provides the primitives for safe memory management but that's low-level and highly performant, i.e. Rust or something like it.

    • There's a problem with your logic here.

      You're not necessarily reducing the attack surface; you're adding complexity. While you might reduce the attack surface for low-level bugs, you might open yourself to new bug classes.

      Downvote me however you guys want. It's just not that easy.

      If you could eliminate the common cold by killing the guy with the runny nose, don't you think someone would have done it?

  • In keeping with the tradition of bad car analogies, that's like saying "Driving cars with automatic traction control won't make accidents go away, so automatic traction control is pointless".

    Languages with bounds checks on array accesses don't solve everything, but that doesn't mean that they don't work. They do remove entire classes of silent failures that can potentially slip through the cracks in C-like languages. VMs aren't needed for this -- most of the strongly typed functional languages, D, Go, Rust, and others all compile down to native machine code.

    Careful API design, discipline, and good coding in C can also mitigate this sort of problem manually, although (like most things in C), it's extra work, and needs careful thought to ensure correctness.

    • Do you know of any controlled experiments to test the safety claims for automatic traction control? People used to say similar things about ABS. Then the experiments were done, it turned out to be pointless or possibly dangerous, and people started talking about traction control instead.

      Automatic bounds checking could well fail the same way that ABS did: programmers won't bother defining a packet data type, because the compiler will catch any mistakes they make fiddling with arrays. So, like drivers with ABS, programmers with ABC would go faster, but they wouldn't be any safer.

      2 replies →

  • VMs generally do not have this type of vulnerability (buffer overrun).

    Also, most vulnerabilities in (e.g.) the JVM can only be exploited by running malicious code inside the VM. Here, the attacker is supplying data used by OpenSSL, but is not able to supply arbitrary code.