Comment by jcalvinowens
12 years ago
> C and other languages without memory checks are unsuitable for writing secure code
I vehemently disagree. Well-written C is very easy to audit, much more so than languages like C# and Java, where something I could do with 200 lines in a single C source file requires 5 different classes in 5 different files. The problem with C is that a lot of people don't write it well.
Have you looked at the OpenSSL source? It's an ungodly f-cking disaster: very, very difficult to understand and audit. THAT, I think, is the problem. BIND, the DNS server, used to have huge security issues all the time. They did a ground-up rewrite for version 9, and that by and large solved the problem: you don't read about BIND vulnerabilities very often anymore.
OpenSSL is the new BIND; and we desperately need it to be fixed.
(If I'm wrong about BIND, please correct me, but AFAICS the only non-DoS vulnerability they've had since version 9 is CVE-2008-0122.)
> but we can plug this seemingly endless source of bugs which has been affecting the Internet since the Morris worm.
If we're playing the blame game, blame the x86 architecture, not the C language. If x86 stacks grew upward in memory (that is, from lower to higher addresses), almost all "stack smashing" attacks would be impossible, and a whole lot of big security bugs over the last 20 years could never have happened.
(The SSL bug is not a stack-smashing attack, but several of the exploits leveraged by the Morris worm were)
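For the curious, here's a minimal C sketch of the pattern in question (hypothetical code, not taken from any real exploit):

    #include <string.h>

    /* Classic stack-smashing setup. On x86 the stack grows downward,
     * so buf sits at a lower address than the saved return address;
     * writing past the end of buf walks upward into it. On an
     * upward-growing stack, the overflow would run away from the
     * current frame's return address, which is the point above. */
    void vulnerable(const char *attacker_input)
    {
        char buf[16];
        strcpy(buf, attacker_input);   /* no bounds check */
    }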
> The problem with C is that a lot of people don't write it well.
Including people responsible for one of the most important security-related libraries in the world. No matter how good and careful a programmer is, they are still human and prone to errors. Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible? They won't fix all problems, and definitely not those associated with having a bad code base, but it'd still be many times better than hoping people don't screw up with pointer lifetimes.
> Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible?
I don't think it makes sense to intentionally prevent the programmer from doing certain things the computer is capable of doing, on the theory that this makes errors impossible.
As I've said several times in this thread, somebody has to deal with pointers and raw memory, because that's the way computers work. Using a language whose runtime handles such things only abstracts potential errors away from the programmer, and prevents her from doing things more efficiently when she wants to. It can also be less performant, since the runtime has to do things in more generic ways than the programmer would.
> Including people responsible for one of the most important security-related library in the world.
I think you've hit on a crucial part of the problem: practically every software company on Earth uses OpenSSL, but not many of them pay people to work on it.
I was very surprised how short that list is. There are a lot of big names that make heavy use of this software that are not on that list.
With arguments like this, we'd all be back in the days of non-structured programming languages (enjoy writing all your crypto in MUMPS). Every modern language, including C, restricts itself in some way in order to make programs more predictable and errors less likely. Some simply impose more restrictions than others, though these restrictions can actually make programs more efficient (see, for instance, alias analysis in Fortran vs. alias analysis in C).
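To make the alias-analysis point concrete, here is a small C sketch (illustrative only): the C99 restrict qualifier is exactly such a restriction, giving the compiler the Fortran-style guarantee that the arrays don't overlap, which lets it vectorize the loop instead of reloading memory on every iteration.

    /* Without restrict, the compiler must assume out, a, and b might
     * alias, forcing conservative code. With restrict (C99), it gets
     * the no-aliasing guarantee Fortran compilers have always had. */
    void scale_sum(float *restrict out, const float *restrict a,
                   const float *restrict b, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + 2.0f * b[i];
    }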
All three of the languages listed previously (Rust, Ada, ATS) are systems programming languages with the capability of manipulating pointers and raw memory (though I don't personally have any experience with the latter two). What they have in common is that they provide compile-time guarantees that certain aspects of your code are correct: for example, the guarantee that you never attempt to access freed memory. These are static checks that require no runtime to perform, and impose no overhead on running code.
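As an illustration of the class of error meant here (a hypothetical fragment, not from any real code base): the following compiles without complaint in C yet reads freed memory, while a borrow-checking compiler rejects the equivalent program at compile time, with no runtime cost.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);
        if (!p)
            return 1;
        *p = 42;
        free(p);
        printf("%d\n", *p);   /* use-after-free: C accepts this silently */
        return 0;
    }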
> Why not put every chance on our side and use languages (e.g. Rust, Ada, ATS, etc.) that make entire classes of errors impossible?
Bugs will still occur, just in a different way: Java is advocated as a much "safer" language, but how many exploits have we seen in the JRE? Going to more restrictive, more complex languages in an attempt to fix these problems will only lead to a never-ending cycle of increasing ignorance and negligence, combined with even more restrictive languages and complexity. I believe the solution lies in better education and diligence, not in technology.
> Java is advocated as being a much "safer" language, but how many exploits have we seen in the JRE?
Very few. I don't think I can remember ever seeing an advisory for Java's SSL implementation.
Yes, bugs are possible in all languages, but that doesn't mean there's no difference between languages. I'm reminded of Asimov: "When people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together."
(There are a large number of bugs in the browser plugin used for Java applets, but they have no relation to the JRE itself.)
>The problem with C is that a lot of people don't write it well.
There are languages that make it very very hard to write bad code. Haskell is a good example of where if your program type-checks, there's a high chance it's probably correct.
C is a language that offers few advantages but very many disadvantages in exchange for its weak assurances. Things like the Haskell compiler show that you can get strong typing for free, and there aren't many excuses left to run around with raw pointers except in legacy code.
> There are languages that make it very very hard to write bad code. Haskell [...]
Sure, but how much slower is Haskell than an equivalent implementation in C? Some quick searching suggests numbers like 1000% slower... and no amount of security is worth a 1000% performance hit, let alone a vague "security mistakes are less likely this way" sort of security. Being secure is useless if your code is so slow that you have to run so many servers you don't make a profit.
Could the Haskell compiler be improved to the point that this isn't a problem? Maybe. Ultimately I think the problem is that Haskell code is very unlike object code, and that makes writing a good compiler very difficult. C is essentially portable assembler; translating it to object code is far more straightforward.
> C is a language that doesn't offer many advantages but offers very many disadvantages for its weak assurances.
C offers simplicity. Sure, there are some quirks that are complex, but by and large it is one of the simplest languages in existence. Once you understand the syntax, you've essentially learned the language. Contrast that to Java and C#: you are essentially forced by the language to use this gigantic and complicated library all the time. You are also forced to write your code in pre-determined ways, using classes and other OOP abstractions. In C, I don't have to do that: I can write my code in whatever way I feel makes it maximally readable and performant.
C also offers flexibility: in C, I can make the computer do absolutely anything I want it to in exactly the way I want it to. Maybe I don't like 8-byte pointers on my 64-bit CPU, and I want to implement a linked list allocating nodes from a sparse memory-mapped pool with a known base address using 16-bit indexes which I add to the base to get the actual addresses when manipulating the list? That could be a big win depending on what you're doing, and (correct me if I'm wrong) there is no way to do that in Java or Haskell.
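Roughly, the idea looks like this (a sketch, assuming the pool never holds more than 65535 nodes, with index 0 reserved as the null link):

    #include <stdint.h>

    struct node {
        uint16_t next;   /* index into pool[], 0 == null link */
        uint16_t data;
    };

    static struct node *pool;   /* known base address of the mmap'd pool */

    static inline struct node *node_at(uint16_t idx)
    {
        return idx ? &pool[idx] : (struct node *)0;
    }

    static void push_front(uint16_t *head, uint16_t idx)
    {
        pool[idx].next = *head;   /* 2-byte link instead of an 8-byte pointer */
        *head = idx;
    }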
> there aren't many excuses left to run around with raw pointers except in legacy code.
If by "raw pointer" you mean haphazardly casting different things to (void *) or (char *) and doing arithmetic on them to access members of structures or something, then I agree: 99.9% of the time you shouldn't do that.
Or are you talking about that "auto_ptr" and so-called "smart pointer" stuff in C++? In that case, your definition of "raw pointer" is every pointer I've ever defined in all the C source code I've ever written.
Pointers exist because that's the way computer hardware works: they will never go away. I'd rather deal with them directly, since it allows me to be clever sometimes and make things faster.
>Sure, but how much slower is Haskell than an equivalent implementation in C? Some quick searching suggests numbers like 1000% slower...
With things like stream fusion (http://research.microsoft.com/en-us/um/people/simonpj/papers...), which I imagine would capture a lot of crypto calls, GHC can generate some very performant code (the paper contains examples of hand-written C code being beaten by Haskell code, and the C code is far from naive).
There are a lot of tricks at your disposal when you know more about the state of the code. And compilers are usually better than humans in this regard.
>no amount of security is worth a 1000% performance hit
Yes it is. Most of the applications I use take roughly 0% of my processor's capacity. I can spare a multiple like that outside of a few hot loops.
Saying 10x slowdown isn't worth it is like saying no one would ever compute on a phone.
Also, a random BS benchmark (http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...) says Haskell is at least half as fast as C.
> Sure, but how much slower is Haskell than an equivalent implementation in C?
Writing Haskell to generate proven-correct C is an approach that is known to work.
> C offers simplicity. Sure, there are some quirks that are complex, but by and large it is one of the simplest languages in existence.
People keep saying this, and it keeps not being true.
C has over 200 undefined and implementation-defined behaviors. That alone makes it a minefield of complexity in practice.
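One concrete example of that minefield (a standard illustration, not from any particular code base): this looks like a sound overflow check, but signed overflow is undefined behavior, so an optimizing compiler is allowed to delete the check entirely.

    /* Looks like a sane overflow check, but signed overflow is
     * undefined in C, so the compiler may assume x + 1 never wraps
     * and fold this entire function to "return 1;". GCC and Clang
     * commonly do exactly this at -O2. */
    int will_not_overflow(int x)
    {
        return x + 1 > x;
    }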
> Haskell is a good example of where if your program type-checks, there's a high chance it's probably correct.
I really wish people would stop saying this. It's not in the slightest bit true. Making this assertion makes Haskellers seem dangerously naive.
This is especially true in the field of crypto, where timing attacks are a major issue. Knowing that your program will produce the correct result isn't enough, you need to know that the amount of time taken to compute that result doesn't leak information, and I don't think Haskell provides any way to ensure this.
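To illustrate: a naive comparison such as memcmp() returns at the first mismatching byte, so response time leaks how much of a secret (say, a MAC) an attacker has guessed correctly. Crypto code uses a constant-time pattern like this C sketch instead (a generic illustration, not any specific library's implementation):

    #include <stddef.h>

    /* Constant-time comparison: always examines every byte, so the
     * running time does not depend on where (or whether) the inputs
     * differ. Returns 0 iff the buffers are equal. memcmp(), by
     * contrast, bails out at the first mismatch and leaks timing. */
    int ct_compare(const unsigned char *a, const unsigned char *b,
                   size_t n)
    {
        unsigned char diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff;
    }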
it's just probably correct ;)
How do I embed your Haskell library in my Lisp program? This is where C shines... it can be used everywhere, including embedded in other programs, no matter what language they're written in.
Like this: http://www.haskell.org/ghc/docs/latest/html/users_guide/ffi-.... Haskell's FFI works both ways.
(It's not free: you need to link in the entire Haskell runtime system, which is not small. But you can absolutely do it.)
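Concretely, the C side looks something like this sketch. Here hs_fib is a hypothetical function assumed to be exported from a Haskell module with `foreign export ccall hs_fib :: Int -> Int`; hs_init/hs_exit are the real RTS entry points from GHC's HsFFI.h.

    #include <HsFFI.h>
    #include <stdio.h>

    /* Assumed to come from the GHC-generated stub header for the
     * exporting module. */
    extern HsInt hs_fib(HsInt n);

    int main(int argc, char *argv[])
    {
        hs_init(&argc, &argv);              /* start the Haskell RTS */
        printf("%ld\n", (long)hs_fib(10));
        hs_exit();                          /* shut the RTS down */
        return 0;
    }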
> Haskell is a good example of where if your program type-checks, there's a high chance it's probably correct.
For a value of 'correct' that includes "uses all available memory and falls over".
Agreed. Simple code is easy to understand, and its bugs are just as easy to find. After looking at the heartbeat spec and the code, I can already see a simplification that would likely have avoided introducing this bug. Instead of allocating memory of a new length, how about just validating the existing message fields as the spec requires:
> The total length of a HeartbeatMessage MUST NOT exceed 2^14 or max_fragment_length when negotiated as defined in [RFC6066].
> The padding_length MUST be at least 16.
> The sender of a HeartbeatMessage MUST use a random padding of at least 16 bytes.
> If the payload_length of a received HeartbeatMessage is too large, the received HeartbeatMessage MUST be discarded silently.
Then if it's all good, modify the buffer to change its type to heartbeat_response, fill the padding with new random bytes, and send this response. No need to copy the payload (which is where the bug was), no need to allocate more memory.
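For concreteness, a rough C sketch of that validation, using hypothetical names rather than OpenSSL's actual code (per RFC 6520, a HeartbeatMessage is a 1-byte type, a 2-byte payload_length, the payload, and at least 16 bytes of padding):

    #include <stdint.h>
    #include <stddef.h>

    #define HB_TYPE_LEN     1   /* type field */
    #define HB_LEN_LEN      2   /* payload_length field */
    #define HB_MIN_PADDING 16   /* padding MUST be at least 16 bytes */

    /* msg_len is the number of bytes actually received for the whole
     * HeartbeatMessage. Returns nonzero iff the message is valid. */
    int heartbeat_is_valid(const unsigned char *msg, size_t msg_len)
    {
        if (msg_len < HB_TYPE_LEN + HB_LEN_LEN + HB_MIN_PADDING)
            return 0;
        uint16_t payload_len = (uint16_t)((msg[1] << 8) | msg[2]);
        /* The check the buggy code was missing: the claimed payload
         * plus minimum padding must fit inside what we received. */
        if ((size_t)payload_len >
                msg_len - HB_TYPE_LEN - HB_LEN_LEN - HB_MIN_PADDING)
            return 0;
        return 1;
    }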
(Now I'm sure someone will try to find a flaw in this approach...)
My favorite part is that the Morris worm dates back to late 1988, when MS was starting development of OS/2 2.0 and NT. Yes, I am talking about the decision to use a flat address space instead of a segmented one.