
Comment by jacquesm

2 years ago

We're very much in agreement.

The whole 'null pointer style strings' phrasing makes no sense; I think they mean 'nul-terminated'. But fine.

Your examples are excellent, let me add a few more:

Big endian? Little endian? Do we count characters or bytes? Who owns the bloody thing? Can they be modified in place? Are they in ROM or RAM? Automatic? Static? Can they be transmitted over a network 'as is' or do they need to be sent via some serialization mechanism? What about storing them on disk? And can they then be retrieved on different architectures?

The problem really is that C more or less requires you to know exactly what you're doing with your data, and that's impossible in a networked world: your toy library ends up integrated into something else, that something else gets connected to the internet, and suddenly all the negative test cases you never thought of are potential security issues. So any simplistic view of string handling will end up with a broken implementation, regardless of how well it worked in its initial target environment.

C's solution is simple: take the simplest possible representation, use that, and pass responsibility for dealing with all of the edge cases back to the programmer. The problem is that nobody actually does, and even those who try tend to get it subtly wrong several times across a codebase of any size.

It's a nasty little problem and it will result in security issues for decades to come. There are plenty of managed languages. As a seasoned C programmer, I had some hope that instead of this Cambrian explosion of programming languages we'd see some kind of convergence, so that it would become easier, not harder, to pick a winner and establish some best practices. But cooperation seems rare; much more common is the mode where a defect in one language or ecosystem results in a completely new language that solves that one problem in some way (sometimes quite convoluted) at the expense of introducing a whole raft of new problems. Not to mention the fragmentation of mindshare.

It's not a hypothesis: the thing was already implemented many times in C, C++ and other languages, and used for ages, especially for networked code, because C's "there's no length" approach is a guaranteed vulnerability.

  • It's not a guaranteed vulnerability, it's a potential vulnerability.

    Guaranteed doesn't mean "this will probably happen", it means "this will definitely happen".

    The "no length approach" can result in a vulnerability. It won't necessarily result in one.

    I mean, come on: if it were a guaranteed vulnerability, almost nothing on the internet would work, because everything has, somewhere down the line, a dependency on a nul-terminated string.

    I mean, do you think that nginx (https://github.com/nginx/nginx/blob/master/src/core/ngx_stri...) is getting exploited millions of times per hour because they have a few uses for nul-terminated strings?

  • Which C compilers are those then?

    Also, you keep writing 'null pointer' and 'null'; there is a pretty big difference between 'null' and 'nul', and in a discussion of language implementation details such little things matter a lot. You say a lot of stuff with great authority that simply doesn't match my experience (as a C programmer of many decades), and while I'm open to being convinced otherwise, you will have to show some references and examples.