← Back to context

Comment by drv

12 years ago

From a quick reading of the TLS heartbeat RFC and the patched code, here's my understanding of the cause of the bug.

TLS heartbeat consists of a request packet including a payload; the other side reads and sends a response containing the same payload (plus some other padding).

In the code that handles TLS heartbeat requests, the payload size is read from the packet controlled by the attacker:

  n2s(p, payload);
  pl = p;

Here, p is a pointer to the request packet, and payload is the expected length of the payload (read as a 16-bit short integer: this is the origin of the 64K limit per request).

pl is the pointer to the actual payload in the request packet.

Then the response packet is constructed:

  /* Enter response type, length and copy payload */
  *bp++ = TLS1_HB_RESPONSE;
  s2n(payload, bp);
  memcpy(bp, pl, payload);

The payload length is stored into the destination packet, and then the payload is copied from the source packet pl to the destination packet bp.

The bug is that the payload length is never actually checked against the size of the request packet. Therefore, the memcpy() can read arbitrary data beyond the storage location of the request by sending an arbitrary payload length (up to 64K) and an undersized payload.

I find it hard to believe that the OpenSSL code does not have any better abstraction for handling streams of bytes; if the packets were represented as a (pointer, length) pair with simple wrapper functions to copy from one stream to another, this bug could have been avoided. C makes this sort of bug easy to write, but careful API design would make it much harder to do by accident.

It is indeed astonishing how simple-minded this bug is. But these bugs come in all levels of complexity, from simple overstuffed buffers to logical ping-pong that hurts your brain when you try to follow it. We need to get rid of them once and for all. If the whole world can't use a certain tool effectively, then the whole world isn't broken; the tool is bad.

  • Machine level languages like C and C++ aren't necessarily bad tools, even in their current states. However, I agree that they might be bad tools for the purpose of writing security libraries.

    • There are not bad tools, but not the best either. If you spend mental stamina on trivial things, you have less for the important ones, the ones a compiler cannot check.

      This kind of tool (SSL) should be written in ada or haskell.

      5 replies →

I've felt that C makes this code easy to write because it makes doing the right thing hard. What you are describing is just a lot of work in C, compared to a language with something akin to Java's generics, which are in turn an afterthought in the ML family of languages. What we're asking for is not that complicated from a PL standpoint. A generic streams library?

Economics plays an invisible part here. Someone writing a library has a limited amount of time to implement some set of features, and to balance that against other needs, like making the code "clean"/pretty and secure. In this case, pretty code and secure code are akin. Consumers would likewise have to balance out feature needs with how likely the code is going to explode. What it comes down to is that you aren't likely to have secure, stable code in a language that doesn't inherently encourage it.

It starts to be clearer then, that the more modern, "prettier" languages offer material benefits in their efforts to be more elegant.

  • That's what I like about Ruby. ;)

    Even in C, Go or Python, I column align any text that is remotely similar, so differences are obvious.

    Clean code might be extra work but the net work (maintenance) should amortize less. Reducing cognitive load for large supportable production codebase cannot be underscored enough.

    • There's no "just use X" type of answer in security.

      Sep 2013

      "All versions of the open source Ruby on Rails Web application framework released in the past six years have a critical vulnerability that an attacker could exploit to execute arbitrary code, steal information from databases and crash servers."

      https://groups.google.com/forum/#!topic/rubyonrails-security...

      Nov 2013

      "A lingering security issue in Ruby on Rails..."

      http://threatpost.com/ruby-on-rails-cookiestore-vulnerabilit...

      Dec 2013

      "Ruby on Rails security updates patch XSS, DoS vulnerabilities"

      http://www.infoworld.com/d/security/ruby-rails-security-upda...

      19 replies →

    • you get it but the opinion of "C and C++ are fine for SSL, the OpenSSL guys just screwed up" is plain wrong.

      This is a question of priorities. We have speed and security. If you chose C/C++ (non-existent automated checking of memory access) you are chosing speed first, security second.

      If security is critical then you need to chose a language that makes array out of bounds access well nigh impossible. This is an easy problem -- we have languages that will give this to us.

      What percentage of exploits in the wild come from array (and pointer) access out of bounds? I'd venture to say it is above 50%.

      Rather than have programmers everywhere "try hard to be careful" writing this code, let them use a safer language and have a few really smart folk work on optimizing the compiler for said language to make the safety checks faster (e.g. removing provably unnecessary/redundant checks).

      People think that chosing C/C++ has a better business case (i.e. better performance / scaling) because "being really careful" works most of the time. The problem is when heartbleed (or the next array out of bounds access bug) hits the the business case's ROI no longer looks so much better than the safer path.

      A better language won't eliminate all security holes but it can eliminate a huge class of them and allow engineers to focus the energy they used to spend on "being really careful about array access and pointers" on other tasks (be they security, performance or feature related).

      EDIT: stating the obvious .. there are good uses for C style languages but writing large bodies of software that needs to be resistant to malicious user attacks is not one of them.

Thanks for this. How is this reading arbitrary memory locations though? Isn't this always reading what is near the pl? As in, can you really scan the entire process's memory range this way or just a small subset where malloc (or the stack, whichever this is) places pl?

  • The latter, and AFAIK the buffer doesn't get reallocated on every connection, so it should be unlikely that any private keys actually get dumped. However, I could be missing a way to exploit it.

    • Reading between the lines in the announcement it sounds like dropping and reconnecting may cause it to read memory freed up from a prior connection. It may "just" be a matter of keep trying or it may be a matter of opening lots of connections to consume resources dropping them all then connecting and seeing what was left on the beach after the tide went out.

      BTW Amazon AWS/ELM is vulnerable, confirmed publically by their support.

      6 replies →

    • I successfully obtained the private key for my local Apache install this way once, though I'm having trouble getting anything reliable.

      1 reply →

This reminds me of what another programmer told me a long time ago when we were discussing C; "The problem with C is that people make terrible memory managers.". So true.

I agree that this seems like an abstraction for this is missing, but I always have the feeling that what you're doing in covering holes in a leaking dam you might get good at it, but you'll always have leaks.

I have always detested C (also C++) because it's so unreadable... the snippets of code you cite are just so dense ie. a function like n2s() gives pretty much no indication of what it does to a casual reader. Just reading the RFC (it is pretty much written in a C style) gives me the creeps.

The RFC doesn't mention why there has to be a payload, why the payload has to be random size, why they are doing an echo of this payload, why there has to be a padding after the payload. If this data is just a regular C struct like the RFC makes it out to be (I didn't know you could have a struct with a variable size, but apparently the fields are really pointers or it's just a mental model and not a real struct).

Apparently the purpose of the payload is path MTU discovery. Something that is supposed to happen at the IP layer, but I don't know enough about datagram packets. I guess an application may want to know about the MTU as well...

I'm not here to point fingers, I'm just saying C is a nightmare to me and a reason for me to never be involved with system programming or something like drafting RFC's ;-).

But if one can argue that C is a bad choice for writing this stuff, then that is not an isolated thing. "C" is also the language of the RFCs. "C" is also the mindset of the people doing that writing. After all, the language you speak determines how you think. It introduces concepts that become part of your mental models. I could give many examples, but that's not really the point.

And it's about style and what you give attention to. To me, that RFC is a real bad document. It starts to explain requirements to exceptional scenario's (like when the payload is too big) before even having introduced and explained the main concepts and the how and why's.

So while you may argue that this is a C problem and not a protocol problem, it is really all related.

And you may also say, in response to someone blaming these coders, that blame is inappropriate (and it is) because these are volunteers and they are donating their free time to something to find valuable, the whole distribution and burden of responsibility is, naturally, also part of the culture and how people self-organize and so on.

As someone else explained (https://news.ycombinator.com/item?id=7558394) the protocol is real bad but it is the result of more or less political limitations around submitting RFCs for approval. There is no reason for the payload in TLS (but apparently there is in DTLS) but my point is simply this:

If you are doing inelegant design this will spill over into inelegant implementation. And you're bound to end up with flaws.

Rather than trying to isolate the fault here or there, I would say this is a much larger cultural thing to become aware of.