
Comment by avar

2 years ago

As a bit of historical context:

"This is horribly inefficient BSD crap. Using these function only leads to other errors. Correct string handling means that you always know how long your strings are and therefore you can you memcpy (instead of strcpy).

Beside, those who are using strcat or variants deserved to be punished."

- Ulrich Drepper, around 23 years ago: https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053...

He is not wrong, is he? If you are using null terminated strings that's the thing you need to fix.

I still support this addition. If you are doing methamphetamine with needle sharing you should stop methamphetamine, but distributing clean needles is still an improvement.

  • He's not wrong. The main reason to have these functions is that other implementations have them, programs use them, and those programs then have to define the functions themselves when ported to glibc.

    One benefit of defining strlcpy yourself is that you can define it as a macro that expands to an open-coded call to snprintf, and then that is diagnosed by GCC; you may get static warnings about possible truncation. (I suspect GCC might not yet be analyzing strlcpy/strlcat calls, but that could change.)

    The functions silently discard data in order to achieve memory safety. Historically, that has been viewed as acceptable in C coding culture. There are situations in which that is okay, like truncating some unimportant log message to "only" 1024 characters.

    Truncation can cause an exploitable security hole: for instance, some syntax is truncated so that its closing brace is missing, and the attacker is able to somehow complete it maliciously.

    Even when arbitrary limits are acceptable, silently enforcing them in a low-level copying function may not be the best place in the program. If the truncation is caused by some excessively long input, maybe that input should be validated close to where it comes into the program, and rejected. E.g. don't let the user input some 500 character field, pretend you're saving it and then have them find out the next day that only 255 of it got saved.

    Even if in my program I find it useful to have a truncating copying function, I don't necessarily want it to be silent when truncation occurs. Maybe in that particular program, I want to abort the program with a diagnostic message. I can then pass large texts in the unit and integration tests, to find the places in the program that have inflexible text handling, but are being reached by unchecked large inputs.
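
    A minimal sketch of that last idea, assuming a hypothetical helper named copy_or_abort (not part of glibc or any other libc). It measures the source first, so the copy itself can be a plain memcpy, and it aborts with a diagnostic instead of truncating:

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper, not a libc function: copy src into dst, where size
       is the capacity of dst including the terminating NUL. Returns the length
       of src. If src does not fit, print a diagnostic and abort rather than
       silently truncating. */
    static size_t copy_or_abort(char *dst, const char *src, size_t size)
    {
      size_t len = strlen(src);

      if (len >= size) {
        fprintf(stderr, "copy_or_abort: %zu-byte string does not fit in %zu-byte buffer\n",
                len, size);
        abort();
      }
      memcpy(dst, src, len + 1);  /* length is known, so memcpy is enough */
      return len;
    }
    ```

    Whether aborting is the right policy depends on the program; the point is only that the policy is explicit rather than buried in the copying function.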

    • Example:

        #include <stdio.h>
        #include <string.h>
      
        #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
      
        size_t (strlcpy)(char *dst, const char *src, size_t size)
        {
          return strlcpy(dst, src, size);
        }
      
        int main(void)
        {
          char littlebuf[8];
          strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
          return 0;
        }
      
      
        strlcpy.c: In function ‘main’:
        strlcpy.c:4:63: warning: ‘%s’ directive output truncated writing 34 bytes into a region of size 8 [-Wformat-truncation=]
         #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
                                                                     ^
        strlcpy.c:14:22:
           strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
                            ~
        strlcpy.c:14:3: note: in expansion of macro ‘strlcpy’
           strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
         ^~~~~~~
        strlcpy.c:4:34: note: ‘snprintf’ output 35 bytes into a destination of size 8
         #define strlcpy(dst, src, size) ((size_t) snprintf(dst, size, "%s", src))
                                         ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        strlcpy.c:14:3: note: in expansion of macro ‘strlcpy’
           strlcpy(littlebuf, "Supercalifragilisticexpealidocious", sizeof littlebuf);
           ^~~~~~~
      

      If glibc doesn't do something in the header file such that we get similar diagnostics for its strlcpy, we can make the argument that this is detrimental to the program.

    • There is a hierarchy of bugs involved here. Memory safety is a much more serious class of problem. Obstinately refusing to improve the status quo because it doesn't solve all problems is just plain bad engineering. Doubly so in this case where there exist the "n" variants of string functions that are massive foot guns.

  • Yes, he’s wrong. To apply your metaphor: improvements to the mess that is string handling in C are still an improvement, even if they don’t solve the underlying problem.

  • Well, the wider problem then is using C.

    • Pretty much all operating system APIs use C-style zero-terminated strings. So while C may be historically responsible for the problem, not using C doesn't help much if you need to talk to OS APIs.


    • As a user posting from a Linux machine, I disagree. Though it seems the "don't use C" crowd often delegate the important decisions to somewhere else.

      I guess the answer is "some people's C is good enough, but not yours"


    • C the needle contaminated now often with deadly RCE virus. Historically it was used to inject life into the first bytes of the twisted self perpetuating bootstrapping chain of an eco system dominating today the planet and the space around it.

    • All processors are C VMs at the end of the day. They are designed for it, and it's a great language to access raw hardware and raw hardware performance.

      I still fail to label C as evil.

      P.S.: Don't start with all memory management and related stuff. We have solutions for these everywhere, incl., but not limited to GCs, Rust, etc. Their existence does not invalidate C, and we don't need to abandon it. Horses for courses.


  • What a crazy metaphor! You're equating using zero terminated strings in C to doing drugs.

    • I mean it’s a wash: on the one hand zero terminated strings have done untold amounts of damage[0] and are impossible to extirpate once they’re in, on the other hand the nazis were methed (and coked later on) up to their eyeballs.

      [0] and not just in C itself, unexpected truncation through FFI is an issue which regularly pops up


-edit- I'm not a C programmer, nor do I have any opinion on whether the API is garbage or less bad or whatever.

They seemed useful enough to get added to the other BSDs, Solaris, Mac OS X, Irix(!), QNX, and Cygwin as well as used in the Linux kernel.

Imho it's pretty simple: Strings in C are 0-terminated char arrays. If the char array is not 0-terminated, it's not a string.

strncpy() can make a string into a non-string (depending on size), which is clearly bad.

  • That’s because strncpy does not return a nul-terminated (“C”) string, but a fixed-size nul-padded string.

    That, as it turns out, is the case of most (but not all) strn* functions.

    Of course strncpy adds the injury that it’s specified to allow nul-terminated inputs (it stops at the first nul byte, before filling the rest of the target buffer with nuls).

    • It also, in some situations, returns a *string* that doesn't have the null terminator, which means it is giving the caller something that literally isn't a string.
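
      A small sketch of both behaviours described above, using only standard strncpy and memchr. The helper names are hypothetical and exist only for illustration:

      ```c
      #include <string.h>

      /* Hypothetical helper: 1 if the first size bytes of buf contain a NUL. */
      static int is_terminated(const char *buf, size_t size)
      {
        return memchr(buf, '\0', size) != NULL;
      }

      /* Copy src into a 4-byte field with strncpy and report whether the
         result is still a C string. It is not once src has 4 or more chars. */
      static int strncpy_result_is_string(const char *src)
      {
        char field[4];

        strncpy(field, src, sizeof field);
        return is_terminated(field, sizeof field);
      }

      /* Short sources are NUL-padded out to the full field width. */
      static int strncpy_pads_with_nul(void)
      {
        char field[4] = { 'x', 'x', 'x', 'x' };

        strncpy(field, "a", sizeof field);
        return field[1] == '\0' && field[2] == '\0' && field[3] == '\0';
      }
      ```

      Recent GCC versions may warn about the unterminated case via -Wstringop-truncation, depending on flags and optimization level.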


  • > Strings in C are 0-terminated char arrays

    To be pedantic, they're pointers to char. Nothing more. Calling them arrays confuses non-C coders. The length is just an unenforced contract and has to be passed.

    • It's a pointer to a chunk of memory which contains an array of characters. You pass around the pointer because copying an array is expensive and wasteful.

      I think (or hope) the concepts are pretty clear if you understand what a pointer is.

  • strncpy was a bad mistake. If you know the length and there's no null termination, you use memcpy instead.

  • strncpy isn't good either. But using length delimited strings is the best way to generate fixed length char strings and NUL terminated strings.

> Correct string handling means that you always know how long your strings are

Well, I couldn't think of a stronger argument against NUL-terminated strings than this. After all, NUL-terminated strings make no guarantee about having a finite length. Nothing prevents you from building a memory mapped string that is being generated on demand and never ends.

  • Except that's a non-sequitur because you can totally keep separate string length fields.

    The only NUL that C requires is the NUL following C string literals, and you can even easily define char-arrays without NUL.

        char buf[5] = "Hello";
    

    or even

        #define DEFINE_NONZ_STRING(name, lit) char name[sizeof lit - 1] = lit "";
    

    Can also easily build pointer + length representations, without even a runtime strlen() cost.

        typedef struct String { const char *buf; int len; } String;
        #define STRING(lit) ((String) { (lit ""), sizeof lit - 1 })
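
    As a rough usage sketch of such a pointer + length type: the struct is written here as a typedef with a semicolon between the members so that it compiles as C99, and string_eq is a hypothetical helper added only for illustration. Comparison never looks for a terminator:

    ```c
    #include <string.h>

    /* Counted string: pointer plus length, no terminator required. */
    typedef struct String { const char *buf; int len; } String;

    /* The lit "" concatenation only compiles when lit is a string literal. */
    #define STRING(lit) ((String) { (lit ""), (int) (sizeof lit - 1) })

    /* Hypothetical helper: compare two counted strings by length and bytes. */
    static int string_eq(String a, String b)
    {
      return a.len == b.len && memcmp(a.buf, b.buf, (size_t) a.len) == 0;
    }
    ```

    The length comes from sizeof at compile time, so no strlen() call is needed for literals.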

Or, more likely, strncpy plus a forced final NUL. Return a flag on truncation rather than messing with the return code or errno.

Call it safe_strncpy and be done with it. Otherwise asprintf and snprintf exist. strlcpy is a more garbage version of snprintf.
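
One way that might look, as a sketch only. The name safe_strncpy comes from the comment above, but the signature and the bool return are assumptions. It relies on strncpy's NUL padding: after copying size bytes, the last byte is non-NUL only when the source did not fit.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch of a strncpy wrapper that always terminates dst and returns true
   when src had to be truncated. size is the full capacity of dst. */
static bool safe_strncpy(char *dst, const char *src, size_t size)
{
  bool truncated;

  if (size == 0)
    return true;  /* not even room for the terminator */

  strncpy(dst, src, size);             /* pads with NUL if src is short */
  truncated = dst[size - 1] != '\0';   /* last byte set only if src overflowed */
  dst[size - 1] = '\0';                /* forced final NUL */
  return truncated;
}
```

Like strncpy itself, this pays for padding the whole buffer on every call, which is part of the inefficiency complaint about this family of functions.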

He was a jerk, but often he had a reason for his abusiveness. Was the reason in this case valid?

  • The question is: Is string truncation a good solution when the strings you have are unexpectedly long? Like, it's probably ok in a lot of cases, and once you start using these functions, it's very tempting to use them almost everywhere... but truncating "Attack at dawn on Friday" to "Attack at dawn" could be a disaster as well.

    On the other hand, his recommendation to always know string lengths and use memcpy didn't really become common practice over the last 20+ years either, so I'm not sure it was worth all the arguing.

    At this point, I'm kind of joining the camp of "C has proven to be too bug-prone for most organizations to use safely and therefore we should all go to Rust".

    • The second part "and therefore we should all go to Rust" does not follow necessarily from the first. Maybe the reason not everybody has gone to Rust is that it lacks something. Maybe we will all go somewhere else.


    • I suspect the eventual end result is major compilers start implementing a "fat pointer" string ABI for internal translation units (decaying to char * at the edge where necessary) and people start turning that on.

    • > On the other hand, his recommendation to always know string lengths and use memcpy didn't really become common practice over the last 20+ years either, so I'm not sure it was worth all the arguing.

      It hasn't become common practice in C. But other languages (like JavaScript or Python) have become hugely popular, and don't use null-terminated strings.


  • > but often he had a reason for his abusiveness

    There is never, ever, under any circumstances, a reason to be abusive.

  • Not really; he was frequently a jerk right out of the starting gates for no particular reason. That quote is the initial reply to the proposed patch, and the only "reason" I see for the insults is to satisfy Drepper's own emotional needs. It's petty and pathetic.

    This is very different from e.g. Torvalds who sometimes rants a bit after someone who he feels ought to know better screwed up. I'm not saying that's brilliant either, but I can be a lot more understanding when people are abrasive out of passion for their project after something went wrong.

    • Well, he does actually have a point. strlcpy is a faster (well, safer) horse than strncpy, but it's still a horse. We should not use horses as the main mode of transport anymore.

      "Doctor, it hurts when I strcpy — so don't do that".

      He's being a jerk about it, but I would not say that he doesn't have a point.


  • Mostly no. True, the C NUL-terminated string definition is bad, but it's baked into the API. You need some semi-sane way to work with it that isn't 'Everyone writes their own wrappers to memccpy' (some people will get that wrong - e.g. the Linux manpage stpecpy wrapper just begs for misuse, and it's what most novice C programmers will see if they know enough to check manpages).

    strlcpy may not be the best API, but it's sane enough and by now ubiquitous enough to deserve inclusion. Had glibc and others engaged we may have had a better API. Regardless, glibc should never have had such a long veto here.

[flagged]

  • I don't think OP intended this quote to glorify Drepper. He is correctly regarded as a giant asshole. Very smart, but also an awful person to work with.

  • Back in the 00s when Ruby was hot, the Ruby community had a remarkably constructive and helpful attitude. Even when offering criticism. Many folks attributed it to its creator with the acronym, MINASWAN ("Matz is nice and so we are nice").

    No community is perfect, but once you've seen how good it can be it's hard to have much patience for brilliant assholes.

    • I credit this more than anything for the success of Ruby. Just like I credit the 'holier than thou' attitude of the proponents of some other languages for their relative lack of success compared to where they could have been by now.

      Dutch proverb, not sure if it translates or if there is a better English version: you catch more flies with sugar than with vinegar.
