
Comment by sanxiyn

2 years ago

He is not wrong, is he? If you are using null terminated strings that's the thing you need to fix.

I still support this addition. If you are doing methamphetamine with needle sharing you should stop methamphetamine, but distributing clean needles is still an improvement.

He's not wrong. The main reason to have these functions is that other implementations have them, programs are using them, and those programs have to define the functions themselves when ported to glibc.

One benefit of defining strlcpy yourself is that you can define it as a macro that expands to an open-coded call to snprintf, and then that is diagnosed by GCC; you may get static warnings about possible truncation. (I suspect GCC might not yet be analyzing strlcpy/strlcat calls, but that could change.)

The functions silently discard data in order to achieve memory safety. Historically, that has been viewed as acceptable in C coding culture. There are situations in which that is okay, like truncating some unimportant log message to "only" 1024 characters.

Truncation can cause an exploitable security hole: for example, some syntax is truncated so that its closing brace is missing, and the attacker is somehow able to complete it maliciously.
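A minimal sketch of how a caller can notice truncation instead of silently accepting it. `copy_checked` is a hypothetical helper name, not part of any library; it is built on `snprintf`, whose return value is the length the full output would have had, so the sketch does not depend on the platform providing `strlcpy`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, which has room for size bytes.
 * Returns true if the whole string fit, false if it was truncated. */
bool copy_checked(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);

    /* snprintf returns the number of characters the complete output needs,
     * not counting the terminating NUL; a negative value is an error. */
    int needed = snprintf(dst, size, "%s", src);

    return needed >= 0 && (size_t)needed < size;
}
```

A caller that builds syntax with closing delimiters can then reject the whole value when `copy_checked` returns false, rather than passing on a fragment.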

Even when arbitrary limits are acceptable, silently enforcing them in a low-level copying function may not be the best place in the program. If the truncation is caused by some excessively long input, maybe that input should be validated close to where it comes into the program, and rejected. E.g. don't let the user input some 500 character field, pretend you're saving it and then have them find out the next day that only 255 of it got saved.

Even if in my program I find it useful to have a truncating copying function, I don't necessarily want it to be silent when truncation occurs. Maybe in that particular program, I want to abort the program with a diagnostic message. I can then pass large texts in the unit and integration tests, to find the places in the program that have inflexible text handling, but are being reached by unchecked large inputs.
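As a rough sketch of the "abort with a diagnostic" variant described above: `copy_or_die` is a hypothetical name, and the message format is only an assumption about what a useful diagnostic might look like.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a bounded copy that treats truncation as a fatal
 * error instead of silently discarding data. */
void copy_or_die(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= size) {
        fprintf(stderr,
                "copy_or_die: %zu-byte string does not fit in %zu-byte buffer\n",
                len, size);
        abort();
    }
    memcpy(dst, src, len + 1); /* len + 1 copies the terminating NUL too */
}
```

Feeding oversized inputs through unit and integration tests then makes every inflexible buffer show up as a crash with a message, not as quietly shortened data.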

  • Example:

      #include <stdio.h>
      #include <string.h>
    
      #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
    
      size_t (strlcpy)(char *dst, const char *src, size_t size)
      {
        return strlcpy(dst, src, size);
      }
    
      int main(void)
      {
        char littlebuf[8];
        strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
        return 0;
      }
    
    
      strlcpy.c: In function ‘main’:
      strlcpy.c:4:63: warning: ‘%s’ directive output truncated writing 34 bytes into a region of size 8 [-Wformat-truncation=]
       #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
                                                                   ^
      strlcpy.c:14:22:
         strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
                          ~
      strlcpy.c:14:3: note: in expansion of macro ‘strlcpy’
         strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
       ^~~~~~~
      strlcpy.c:4:34: note: ‘snprintf’ output 35 bytes into a destination of size 8
       #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
                                       ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      strlcpy.c:14:3: note: in expansion of macro ‘strlcpy’
         strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
         ^~~~~~~
    

    If glibc doesn't do something in the header file such that we get similar diagnostics for its strlcpy, we can make the argument that this is detrimental to the program.

  • There is a hierarchy of bugs involved here. Memory safety is a much more serious class of problem. Obstinately refusing to improve the status quo because it doesn't solve all problems is just plain bad engineering. Doubly so in this case where there exist the "n" variants of string functions that are massive foot guns.

Yes, he’s wrong. To apply your metaphor: improvements to the mess that is string handling in C are still an improvement, even if they don’t solve the underlying problem.

Well, the wider problem then is using C.

  • Pretty much all operating system APIs use C-style zero-terminated strings. So while C may be historically responsible for the problem, not using C doesn't help much if you need to talk to OS APIs.

    • not using C doesn't help much if you need to talk to OS APIs

      This means cdecl, stdcall or whatever modern ABIs OSes use, not C. Many languages and runtimes can call APIs and DLLs, though you may rightfully argue that their FFI or wrappers were likely compiled from C using the same ABI flags. But ABI is no magic, just a well-defined set of conventions.

      And then, nothing prohibits using length-aware strings and either keeping a safety null at the end or copying to a null-terminated buffer only before a call. Most OS calls are I/O-bound and incomparably heavier anyway.
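A minimal sketch of the "length-aware string with a safety null" idea, assuming a hypothetical `struct lstr` type. The names `lstr_from` and `lstr_cstr` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-aware string: len bytes of data plus a safety NUL. */
struct lstr {
    size_t len;
    char  *data; /* data[len] is always '\0' */
};

/* Builds an lstr from an arbitrary byte range; returns 0 on success. */
int lstr_from(struct lstr *s, const char *bytes, size_t len)
{
    assert(s != NULL && (bytes != NULL || len == 0));
    if (len == SIZE_MAX)
        return -1; /* len + 1 would overflow */
    s->data = malloc(len + 1);
    if (s->data == NULL)
        return -1;
    if (len > 0)
        memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return 0;
}

/* Returns a pointer usable with C APIs, or NULL if the data contains an
 * embedded NUL and would be silently cut short by such an API. */
const char *lstr_cstr(const struct lstr *s)
{
    return memchr(s->data, '\0', s->len) == NULL ? s->data : NULL;
}
```

Because the terminator is always present, handing the string to an OS call costs nothing extra; the only check needed at the boundary is for embedded NUL bytes.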


    • All non-dynamic string representations give rise to the situations where programmers need to combine strings that don't fit into the destination.

      Whether null-terminated or not, dynamic strings solve the problem of adding two strings together without worrying whether the destination buffer is large enough (trading that problem for DoS concerns when a malicious agent may feed a huge input to the program).
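A sketch of that trade-off, assuming a hypothetical growable `struct dstr`: appending never truncates, and an explicit cap (`DSTR_MAX_LEN`, an arbitrary 1 MiB chosen for illustration) bounds how much memory a hostile input can consume.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on string length, to limit DoS exposure. */
#define DSTR_MAX_LEN ((size_t)1 << 20)

/* Hypothetical growable string; zero-initialise before first use. */
struct dstr {
    char  *data;
    size_t len;
    size_t cap;
};

/* Appends src, growing the buffer as needed. Returns 0 on success and
 * -1 if the cap would be exceeded or allocation fails; on failure the
 * existing contents are left unchanged. */
int dstr_append(struct dstr *d, const char *src)
{
    assert(d != NULL && src != NULL);

    size_t add = strlen(src);
    if (add > DSTR_MAX_LEN - d->len)
        return -1; /* refuse instead of growing without limit */

    size_t need = d->len + add + 1;
    if (need > d->cap) {
        size_t cap = d->cap ? d->cap : 16;
        while (cap < need)
            cap *= 2;
        char *p = realloc(d->data, cap);
        if (p == NULL)
            return -1;
        d->data = p;
        d->cap = cap;
    }
    memcpy(d->data + d->len, src, add + 1); /* copies the NUL as well */
    d->len += add;
    return 0;
}
```

The caller still has to handle the failure return, but the failure is explicit and leaves the string intact, unlike a silent truncation.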

  • As a user posting from a Linux machine, I disagree. Though it seems the "don't use C" crowd often delegate the important decisions to somewhere else.

    I guess the answer is "some people's C is good enough, but not yours"

  • C is the needle, now often contaminated with a deadly RCE virus. Historically it was used to inject life into the first bytes of the twisted, self-perpetuating bootstrapping chain of an ecosystem that today dominates the planet and the space around it.

  • All processors are C VMs at the end of the day. They are designed for it, and it's a great language to access raw hardware and raw hardware performance.

    I still fail to label C as evil.

    P.S.: Don't start with all the memory management and related stuff. We have solutions for these everywhere, including but not limited to GCs, Rust, etc. Their existence does not invalidate C, and we don't need to abandon it. Horses for courses.

    • > All processors are C VMs at the end of the day.

      That would be a poor argument back in the 80s; and is increasingly wrong for modern processors. Compiler intrinsics can paper-over some of the conceptual gap, but dropping down to inline assembly can't be entirely eliminated (even if it's relegated to core libraries). Lots of C code relies on certain patterns compiling down to specific instructions, e.g. for vectorising; since C itself has no concept of such things. C is based around a 1D memory model which has no concept of cache hierarchies. C has no representation of branch prediction, out-of-order instructions, or pipelines; let alone hyperthreading or multi-core programming.

      After all, if processors were "C VMs", then GCC/LLVM/etc. wouldn't be such herculean feats of engineering!


    • This is backwards. C was conceived as a way to do the things programmers were already doing in assembler, but with high(er) level language conveniences. In turn, the things they were doing in assembler were done to efficiently use the "VM" their code was executed on.


What a crazy metaphor! You're equating using zero terminated strings in C to doing drugs.

  • What's up with people seeing an analogy and going "you can't equate those two things"? Analogies aren't equating things.

    • Analogies are great since they talk about how things are the same, and just as terrible because they talk about things that are different.

      But seriously it’s sometimes hard to slice out what level of similarity is implied. Obvious things are somewhat less obvious to others sometimes

  • I feel like the success rate of getting someone off of null terminated strings is probably lower than most rehabilitation programs.

    • We can't entirely, because of the C ABI, but apart from that it's as simple as not using C, which is not too difficult. C is not a popular language these days.


  • I mean it’s a wash: on the one hand zero terminated strings have done untold amounts of damage[0] and are impossible to extirpate once they’re in; on the other hand the nazis were methed (and coked later on) up to their eyeballs.

    [0] and not just in C itself, unexpected truncation through FFI is an issue which regularly pops up

    • ODBC defines a multi-language interface that can accept both null-terminated and length-bounded strings, by using the NTS sentinel value as the length of a null-terminated string.
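A small sketch of that calling convention. `MY_NTS` and `text_length` are hypothetical names modelled loosely on the ODBC idea, not the definitions from the ODBC headers: the caller passes either an explicit length or a negative sentinel meaning "measure up to the NUL".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel meaning "the text is null-terminated". */
#define MY_NTS (-3L)

/* Resolves a (pointer, length-or-sentinel) pair to an actual byte count. */
size_t text_length(const char *text, long len)
{
    assert(text != NULL);
    assert(len >= 0 || len == MY_NTS);

    return len == MY_NTS ? strlen(text) : (size_t)len;
}
```

With an explicit length, callers from languages that use counted strings can pass data containing embedded NUL bytes without it being cut short at the boundary.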