Comment by formerly_proven

1 day ago

strncpy is fairly easy, that's a special-purpose function for copying a C string into a fixed-width string, like typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.

A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en

strncpy doesn’t handle overlapping buffers (undefined behavior). Better to use strncpy_s (if you can) as it is safer overall. See: https://en.cppreference.com/w/c/string/byte/strncpy.html.

As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs.

  • > strncpy doesn’t handle overlapping buffers (undefined behavior).

    It would make little sense for strncpy to handle this case, since, as I pointed out above, it converts between different kinds of strings.

  • Back when strncpy was written there was no undefined behaviour (as the compiler interprets it today). The result would depend on the implementation and might differ between invocations, but it was never the "this will not happen" footgun of today. The modern interpretation of undefined behaviour in C is a big blemish on the otherwise excellent standards committee, committed (hah) in the name of extremely dubious performance claims. If "undefined" meaning "left to the implementation" was good enough when CPU frequency was measured in MHz and nobody had more than one, surely it is good enough today too.

    Also I'm not sure what you mean with C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least last I checked that was the case, correct me if I am wrong. Go, Java and C# all have sane behaviour, but those are much higher level.

    • Rust safe subset doesn't have UB. At all. So long as you never write the "unsafe" keyword you're fine, the compiler will check you are obeying all of the language rules at all times.

      Whereas in C, oops, sorry, you broke a rule you didn't even know existed and so that's Undefined Behaviour left and right. Some of it you could argue falls into the category you're describing, where in a better world it should have been made Implementation Defined, not UB, and too bad. However lots of it is just because the language was designed a very long time ago and prioritized ease of implementation.

      If you wish the language was properly defined, you should use (safe) Rust. If you just wish that when you write nonsense the compiler should somehow guess what you meant and do that, you're not actually a programmer, find a practice which suits you better - take up knitting, learn to paint, something like that.

    • The problem isn't undefined behavior per se; I was using it as an example for strncpy. Rust is a no - in fact, the goal of (safe) Rust is to eliminate undefined behavior. Zig on the other hand I don't know about.

      In general, I see two issues at play here:

      1. C relies heavily on unsized pointers (vs. fat pointers), which is why strncpy_s had to "break" strncpy in order to improve bounds checks.

      2. strncpy memory aliasing restrictions are not encoded in the API and can only be conveyed through docs. This is a footgun.

      For (1), Rust APIs of this type operate on sized slices, or in the case of strings, string slices. Zig defines strings as sized byte slices.

      For (2), Rust enforces this invariant via the borrow checker by disallowing (at compile-time) a shared slice reference that points to an overlapping mutable slice reference. In other words, an API like this is simply not possible to define in (safe) Rust, which means you (as the user) do not need to pore over the docs for each stdlib function you use looking for memory-related footguns.

      7 replies →

Yes, these were also common in several wire formats I had to use for market data/entry.

You would think char symbol[20] would be inefficient for such performance sensitive software, but for the vast majority of exchanges, their technical competencies were not there to properly replace these readable symbol/IDs with a compact/opaque integer ID like a u32. Several exchanges tried and they had numerous issues with IDs not being "properly" unique across symbol types, or time (restarts intra-day or shortly before the open were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison.

A big footgun with strncpy is that the output string may not be null terminated.

  • Yeah but fixed width strings don’t need null termination. You know exactly how long the string is. No need to find that null byte.

    • Until you pass them as a `char *` by accident and it eventually makes its way to some code that does expect null termination.

      There’s languages where you can be quite confident your string will never need null termination… but C is not one of them.

      6 replies →

Isn't strlcpy the safer solution these days?

  • I don't think anybody in this thread read the article.

    Strlcpy tries to improve the situation but still has problems. As the article points out it is almost never desirable to truncate a string passed into strXcpy, yet that is what all of those functions do. Even worse, they attempt to run to the end of the string regardless of the size parameter so they don't even necessarily save you from the unterminated string case. They also do loads of unnecessary work, especially if your source string is very long (like a mmaped text file).

    Strncpy got this behavior because it was trying to implement the dubious truncation feature and needed to tell the programmer where their data was truncated. Strlcpy adopted the same behavior because it was trying to be a drop in replacement. But it was a dumb idea from the start and it causes a lot of pain unnecessarily.

    The crazy thing is that strcpy has the best interface, but of course it's only useful in cases where you have externally verified that the copy is safe before you call it, and as the article points out if you know this then you can just use memcpy instead.

    As you ponder the situation you inevitably come to the conclusion that it would have been better if strings brought along their own length parameter instead of relying on a terminator, but then you realize that in order to support editing of the string as well as passing substrings you'll need to have some struct that has the base pointer, length, and possibly a substring offset and length and you've just re-invented slices. It's also clear why a system like this was not invented for the original C that was developed on PDP machines with just a few hundred KB of RAM.

    Is it really too late for the C committee to not develop a modern string library that ships with base C26 or C27? I get that they really hate adding features, but C strings have been a problem for over 50 years now, and I'm not advocating for the old strings to be removed or even deprecated at this time. Just that a modern replacement be available and to encourage people to use them for new code.

    • > Is it really too late for the C committee to not develop a modern string library that ships with base C26 or C27? I get that they really hate adding features, but C strings have been a problem for over 50 years now, and I'm not advocating for the old strings to be removed or even deprecated at this time. Just that a modern replacement be available and to encourage people to use them for new code.

      The next version of C (C2y) is expected to be C29, not C26 or C27. And work has been done on a new string library: see, e.g. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3306.pdf (not the only proposal!). That said, I would be surprised if anything gets merged into the standard in less than a decade, simply because the committee is not organizationally set up for major library overhauls like this.