Strlcpy and strlcat added to glibc 2.38

3 years ago (sourceware.org)

250 comments

synergy20

As a bit of historical context:

"This is horribly inefficient BSD crap. Using these function only leads to other errors. Correct string handling means that you always know how long your strings are and therefore you can you memcpy (instead of strcpy).

Beside, those who are using strcat or variants deserved to be punished."

- Ulrich Drepper, around 23 years ago: https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053...

sanxiyn 3 years ago
He is not wrong, is he? If you are using null terminated strings that's the thing you need to fix.
I still support this addition. If you are doing methamphetamine with needle sharing you should stop methamphetamine, but distributing clean needles is still an improvement.
- kazinator 3 years ago
  
  He's not wrong. The main reason to have these functions is that other implementations have them, and programs are using them, and having to define those functions themselves when ported to glibc.
  One benefit of defining strlcpy yourself is that you can define it as a macro that expands to an open-coded call to snprintf, and then that is diagnosed by GCC; you may get static warnings about possible truncation. (I suspect GCC might not yet be analyzing strlcpy/strlcat calls, but that could change.)
  The functions silently discard data in order to achieve memory safety. Historically, that has been viewed as acceptable in C coding culture. There are situations in which that is okay, like truncating some unimportant log message to "only" 1024 characters.
  Truncating can cause an exploitable security hole; like some syntax is truncated so that its closing brace is missing, and the attacker is able to somehow complete it maliciously.
  Even when arbitrary limits are acceptable, silently enforcing them in a low-level copying function may not be the best place in the program. If the truncation is caused by some excessively long input, maybe that input should be validated close to where it comes into the program, and rejected. E.g. don't let the user input some 500 character field, pretend you're saving it and then have them find out the next day that only 255 of it got saved.
  Even if in my program I find it useful to have a truncating copying function, I don't necessarily want it to be silent when truncation occurs. Maybe in that particular program, I want to abort the program with a diagnostic message. I can then pass large texts in the unit and integration tests, to find the places in the program that have inflexible text handling, but are being reached by unchecked large inputs.
  
  2 replies →
- woodruffw 3 years ago
  
  Yes, he’s wrong. To apply your metaphor: improvements to the mess that is string handling in C are still an improvement, even if they don’t solve the underlying problem.
- eru 3 years ago
  
  Well, the wider problem then is using C.
  
  66 replies →
- parasti 3 years ago
  
  What a crazy metaphor! You're equating using zero terminated strings in C to doing drugs.
  
  11 replies →
sillywalk 3 years ago
-edit- I'm not a C programmer, nor do I have any opinion on whether api is garbage or less worse or whatever.
They seemed useful enough to get added to the other BSDs, Solaris, Mac OS X, Irix(!), QNX, and Cygwin as well as used in the Linux kernel.
- sanxiyn 3 years ago
  
  Distributing clean needles is useful, yes, but you should still lament why it is necessary.
- saagarjha 3 years ago
  
  The Linux kernel has better options, notably strscpy.
dobin 3 years ago
Imho its pretty simple: Strings in C are 0-terminated char arrays. If the char array is not 0-terminated, its not a string.
strncpy() can make a string into a non-string (depending on size), which is clearly bad.
- masklinn 3 years ago
  
  That’s because strncpy does not return a nul-terminated (“C”) strings, but a fixed-size nul-padded strings.
  That, as it turns out, is the case of most (but not all) strn* functions.
  Of course strncpy adds the injury that it’s specified to alllow nul-terminated inputs (it stops at the first nul byte, before filling the target buffer with nuls).
  
  2 replies →
- alcover 3 years ago
  
  > Strings in C are 0-terminated char arrays
  To be pedantic, they're pointers to char. Nothing more. Calling them array confuses non-C coders. The length is just an unenforced contract and has to be passed.
  
  1 reply →
- Dwedit 3 years ago
  
  strncpy was a bad mistake. If you know the length and there's no null termination, you use memcpy instead.
- dundarious 3 years ago
  
  strncpy isn't good either. But using length delimited strings is the best way to generate fixed length char strings and NUL terminated strings.
pantalaimon 3 years ago
I'm surprised they didn't go with strscpy() directly
https://archive.kernel.org/oldlinux/htmldocs/kernel-api/API-...
- GoblinSlayer 3 years ago
  
  Because strlcpy exists in bsd since 1999: https://man.netbsd.org/strlcpy.3
a_bonobo 3 years ago

HN discussion around this quote, around 12 years ago: https://news.ycombinator.com/item?id=2378013
imtringued 3 years ago
> Correct string handling means that you always know how long your strings are
Well, I couldn't think of a stronger argument against NULL terminated strings than this. After all, NULL terminated strings make no guarantee about having a finite length. Nothing prevents you from building a memory mapped string that is being generated on demand that never ends.
- jstimpfle 3 years ago
  
  Except that's a non-sequitur because you can totally keep separate string length fields.
  The only NUL that C requires is the NUL following C string literals, and you can even easily define char-arrays without NUL.
  char buf[5] = "Hello";
  or even
  #define DEFINE_NONZ_STRING(name, lit) char name[sizeof lit - 1] = lit "";
  Can also easily build pointer + length representations, without even a runtime strlen() cost.
  struct String { const char *buf, int len; }; #define STRING(lit) ((String) { (lit ""), sizeof lit - 1 })
  
  12 replies →
AstralStorm 3 years ago

Or more likely strncpy plus forced last NUL. Return a flag on truncation unlike messing with return code or errno.
Call it safe_strncpy and be done with it. Otherwise asprintf and snprintf exist. strlcpy is a more garbage version of snprintf.
chris_wot 3 years ago
He was a jerk, but often he had a reason for his abusiveness. Was the reason in this case valid?
- IvyMike 3 years ago
  
  The question is: Is string truncation a good solution when the strings you have are unexpectedly long? Like, it's probably ok in a lot of cases, and once you start using these functions, it's very tempting to use them almost everywhere... but truncating "Attack at dawn on Friday" to "Attack at dawn" could be a disaster as well.
  On the other hand, his recommendation to always know string lengths and use memcpy didn't really become common practice over the last 20+ years either, so I'm not sure it was worth all the arguing.
  At this point, I'm kind of joining the camp of "C has proven to be too bug-prone for most organizations to use safely and therefore we should all go to Rust".
  
  7 replies →
- paulv 3 years ago
  
  > but often he had a reason for his abusiveness
  There is never, ever, under any circumstances, a reason to be abusive.
- arp242 3 years ago
  
  Not really; he was frequently a jerk right out of the starting gates for no particular reason. That quote is the initial reply to the proposed patch, and the only "reason" I see for the insults is to satisfy Drepper's own emotional needs. It's petty and pathetic.
  This is very different from e.g. Torvalds who sometimes rants a bit after someone who he feels ought to know better screwed up. I'm not saying that's brilliant either, but I can be a lot more understanding when people are abrasive out of passion for their project after something went wrong.
  
  3 replies →
- yyyk 3 years ago
  
  Mostly no. True, the C NUL-terminated string definition is bad, but it's baked into the API. You need some semi-sane way to work with it that isn't 'Everyone writes their own wrappers to memccpy' (some people will get that wrong - e.g. the Linux manpage stpecpy wrapper just begs for misuse, and it's what most initiate C programmers will see if they know enough to check manpages).
  strlcpy may not be the best API, but it's sane enough and by now ubiquitous enough to deserve inclusion. Had glibc and others engaged we may have had a better API. Regardless, glibc should never have had such a long veto here.
- bufio 3 years ago
  
  No.
- sanxiyn 3 years ago
  
  Yes.
  
  16 replies →
olgeni 3 years ago
[flagged]
- loeg 3 years ago
  
  I don't think OP intended this quote to glorify Drepper. He is correctly regarded as a giant asshole. Very smart, but also an awful person to work with.
- runlevel1 3 years ago
  
  Back in the 00s when Ruby was hot, the Ruby community had a remarkably constructive and helpful attitude. Even when offering criticism. Many folks attributed it to its creator with the acronym, MINASWAN ("Matz is nice and so we are nice").
  No community is perfect, but once you've seen how good it can be it's hard to have much patience for brilliant assholes.
  
  3 replies →
- aumerle 3 years ago
  
  [flagged]
  
  1 reply →

eatbitseveryday 3 years ago

Linux uses strscpy. See [1] [2] [3]. The issues of concern are to always NUL-terminate, and to effectively know if the result was truncated.

Truncation can lead to big issues, especially if the string being composed refers to paths, device names, other resources, etc. For example you may truncate a path from /foo/bar/baz to /foo/bar and inadvertently operate on other files. An API that makes this confusing is dangerous.

See the confused deputy problem description [4].

[1] https://mafford.com/text/the-many-ways-to-copy-a-string-in-c...

[2] https://lwn.net/Articles/659214/

[3] https://docs.kernel.org/core-api/kernel-api.html#c.strscpy

[4] https://en.wikipedia.org/wiki/Confused_deputy_proble

dchest 3 years ago
Here's discussion on why strscpy shouldn't be included in POSIX:
https://www.austingroupbugs.net/view.php?id=986 (scroll to 0002897)
- landswipe 3 years ago
  
  Boy do I miss MantisBT...
- saagarjha 3 years ago
  
  It's not a very good argument. Notably:
  > strlcpy() fits within the existing set of functions like a glove. strlcpy(a, b, n) behaves identically to snprintf(a, n, "%s", b). The return value always corresponds to the number of non-null bytes that would have been written. If we truly think that this is bad design, should we come up with a new version of snprintf() that also doesn't do this? I don't think so.
  People typically do not consider snprintf and strlcpy to be a similar family of functions. There's no need to transpose the weird behavior to a new string copying routine.
  
  6 replies →
askiiart 3 years ago

Fixed Wikipedia link: https://en.wikipedia.org/wiki/Confused_deputy_problem

hyperman1 3 years ago

I fear all of this is dancing around the nasty core of the problem: generic writing to C style strings can't be done without extra information. You can't write stuff to memory without negotiating how much room is needed and available, and optionally moving the string. Silent truncation will cause bugs. Buffer overflow even more.

Fixing this now is hard: Writing to 0-ended strings require manually tracking lengths. Expanding a string without allowing malloc is misery.

The only way out I see is basically starting from zero: ISO C should define an API with a (pointer,current length, max length) struct at its core, pointer pointing to a 0-terminated C string. You can read it, but changing it requires using functions that can error out and/or malloc more memory. There are already multiple libs like this, but C has none. If the struct would be ABI, non-C programming languages can pass strings between them.

samtho 3 years ago
C had the opportunity to include this but they did not. It is my understanding that they wanted to design everything in C as inherent to the language, rather than magic types, especially a struct. There is an elegance in the notion that a string is just an array of characters. If I’m working with a significant amount of strings in C, I can keep track of lengths, not a huge deal.
- jstimpfle 3 years ago
  
  Exactly this. There are no literals in C that create composite types. There are no composite types inherent to the language. All these types are defined in (system) includes.
  And zero-terminated strings are not strictly worse than other length-prefixed string forms. They save some space -- sure, less relevant today -- as well as provide an in-band termination signal -- which is hacky, again sure, but it is convenient when looking at a hex dump for example.
  
  11 replies →
- jacquesm 3 years ago
  
  An early version of C didn't have structs, the initial attempt to get the OS off the ground failed, and after adding structs it worked. Structs are just syntactic sugar over memory offsets relative to a base pointer, a construct for which many CPUs include primitives.
- imtringued 3 years ago
  
  C is lots of magic and quirkiness.
  This reminds me. From a spec/design perspective Ceylon was the cleanest language I know. Almost everything, including a lot of the keywords were actually defined in the standard library. The fact that Integer was actually a Java int behind the scenes was just a compiler implementation detail. There was very little "magic". If you wanted to know how something in the language worked you could just look at the standard library.
jacquesm 3 years ago

You can't even really assume that strings are writable, they might well be in ROM on an embedded device.

t43562 3 years ago

NULL terminated strings are a fact of life in many cases and C just needs lots of string functions to cater for different use cases. e.g. usually for me truncation is worse than crashing (corrupted data basically).

When I read arguments they are full of people thinking that their one size fits all and that somehow having too many variations would be bad.

This seems illogical to me since I've had to write my own string copying routines enough times because the one that fitted my need wasn't commonly available. Purism with C is just stu.... well anyhow.

A "good" fat pointer library for C would help a lot - something that could pop a NUL onto the end when you needed to put the string into an OS function but I would also have to groan at the idea that NUL termination should be outlawed in some way. At the C level you want options not limitations.

orwin 3 years ago

Yeah agree. Roll your own linked lists (or rather: avoid them but if you have to use them, roll your own, it's like 30 min to 2 hours at most), your own btrees, your own ring buffer. I guess the gnu hashmap is fine 90% of the time, but if your use case is weird, you might want to run your own too. And BTW the glibc isn't really bad, just that in C, generic solutions will miss your specifics edge cases.
And this is true for strings too. If you use only static strings in your whole project, the standard is enough, but I won't write an IRC server or a posix Shell without my own strings (not anymore :p).
GoblinSlayer 3 years ago

odbc interface can accept both null terminated and length bounded strings by using NTS sentinel value for null terminated string length.

sillywalk 3 years ago

It must be snowing in hell right now. :)

It says that they might be added to POSIX.

edit apparently Ulrich Drepper (major glibc contributer & former glibc leader is back at Red Hat [0]

https://research.redhat.com/blog/project_member/ulrich-drepp...

wahern 3 years ago

They're already in draft 3 of the forthcoming 202x revision. The glibc work resulted in a request to clarify the draft 3 specification: https://www.austingroupbugs.net/view.php?id=1726
chris_wot 3 years ago
I'll bet he would be a laugh to work with.
- sillywalk 3 years ago
  
  Perhaps working at Goldman Sachs mellowed him out some.
  
  4 replies →
- rwmj 3 years ago
  
  I work with him and he's smarter than almost everyone.
  
  2 replies →

TazeTSchnitzel 3 years ago

If you ever find yourself writing strcpy() followed by strcat(), consider:

  snprintf(buf, sizeof(buf), "%s%s", a, b);

This as safe as strlcpy and strlcat, but more efficient, and has been standard for 24 years.

saagarjha 3 years ago
1. This returns an int.
2. This is inefficient in a similar way that strlcpy is: it does more work than the size of the buffer.
- jstimpfle 3 years ago
  
  1. This is only slightly less irrelevant than "this returns a size_t" if that were so.
  2. You have the option to provide the length, snprintf(buf, sizeof buf, "%.*s%.*s", len1, str1, len2, str2);
  If you're bottlenecked by snprintf (hint: you aren't) then snprintf isn't your API anyway. Write some more custom code, probably some memcpy's etc.
  
  3 replies →
orwin 3 years ago
Honestly just use memcopy and define your own string structure.
C has void*, that allows you to implement easily modifiable data structures. There is a bit of 'NIH' syndrome in what I'm saying, I'll admit, but in the end it's better imho.
- jstimpfle 3 years ago
  
  C is all about memory management and copying data around, and people can't stop whining that there is no magic string handling sauce (as if strings were special), and keep acting like we had to put up with the ridiculous strcat() etc. nonsense.
- jacquesm 3 years ago
  
  That's not what 'void*' is for.
  
  1 reply →
loeg 3 years ago

Be careful, this does not work if buf is 'char*' rather than an array type.
rwmj 3 years ago

asprintf is even better (although different because it allocates a new string).

zbuf 3 years ago

Interesting to see the implementation is basically a two-pass process: strlen() to count the source string, followed by a memcpy().

My intuition would be to give some importance to not looping on the source string radically beyond the length of the destination buffer.

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strl...

Thoreandan 3 years ago

25 years on. Congrats OpenBSD

classified 3 years ago

I'll never understand how Linux won over BSD.

sirwhinesalot 3 years ago

Just make your own string functions from scratch when using C, you'll thank me later.

First, you make two structs: str_buf { capacity, len, data[] } and str_view { len, data* }.

Then write your string handling functions to write into a str_buf* and read from str_views. You have the length and capacity available so memcpy is easy and safe to use.

The str_buf treats capacity as 1 less than it really is such that it can ensure there's always a stupid 0 at the end for compatibility with APIs that expect 0 terminated strings.

There you go, no more security bugs, no more nonsense.

jacquesm 3 years ago

> The str_buf treats capacity as 1 less than it really is such that it can ensure there's always a stupid 0 at the end for compatibility with APIs that expect 0 terminated strings.
Off-by-one errors are a thing.
> Just make your own string functions from scratch when using C, you'll thank me later.
No, if you're going to use C and you need a string type use a well supported string library so that you don't end up reinventing the wheel (probably in a buggy way) and benefit from the battle testing that that code has gone through.
If we're looking at actual strings (as in text) then I'd use 'libunistring'.

panzi 3 years ago

What I'd like to have is snprintf_l(). It's not standard, but it's available in FreeBSD, macOS, and Windows (as _snprintf_l()). Just not in glibc (and probably musl).

jlokier 3 years ago
snprintf_l() is not standard, but you can simulate it with uselocale() to save and restore the per-thread locale, which is standard (POSIX.1-2008) and supported by Glibc, FreeBSD, MacOS and other OSes, though Windows requires its own, different way of doing per-thread locale.
It seems it was once an explicit design choice in Glibc. Here (https://sourceware.org/bugzilla/show_bug.cgi?id=10891) a Glibc ticket refers to documentation "Thread-aware Local Model, A Proposal" of the plans for the _l() functions (https://akkadia.org/drepper/tllocale.ps.gz) long ago, which are nowadays implemented in Glibc.
It looks like snprintf_l() and other <stdio.h> functions were not part of that plan.
That plan or something like it also made its way into POSIX.1-2008, so it seems likely that the committee gave some thought to including strftime_l() and not snprintf_l().
The paper linked above includes a rationale that tiny, potentially performance-critical functions like isalpha() need a fast version that takes a local parameter, thus isalpha_l(), because of the overhead of fetching a thread-local value inside the function.
Perhaps the intent is that only those tiny functions, or even macros, whose performance would be greatly affected by the cost of fetching the thread-local locale, need a _l() version. That mostly makes sense for the functions which have _l() versions in Glibc. But with that rationale, I don't see why there is strftime_l() but not snprintf_l().
- panzi 3 years ago
  
  I want to output a lot of numbers (double) and be sure they are in "C" locale (generating JSON output). So it's actually fprintf_l. I can set/restore LC_NUMERIC in a wrapped function, but having a _l version would be nicer.

noobermin 3 years ago

The end of an era, a rather pedantic and overdrawn one.

wyldfire 3 years ago

So was Drepper wrong? Did he just get worn down? Or did it not involve him at all?

masklinn 3 years ago
I understand Drepper pretty much considers string.h a lost cause, and I can’t fault him for that.
It’s rather that POSIX decided to add the strl functions, so adding them (and verifying their semantics) is a necessity.
- silon42 3 years ago
  
  A purist would support removing strcpy, etc... but that wouldn't go well, so adding an improvement is acceptable.
  
  2 replies →
sanxiyn 3 years ago

I think the fact that it will be added to POSIX was decisive.
fs111 3 years ago

I think he is doing other things these days https://research.redhat.com/blog/project_member/ulrich-drepp...
kzrdude 3 years ago

He isn't involved anymore, is he?

alxlaz 3 years ago

There are two things I have learned from this thread:

1. glibc has been handling things with a lot more civility post-Drepper, and while it's had as hard a time as any OS infrastructure project, that's helped a lot when it came to managing more polarizing issues.

2. The greatest enemy to wider Rust adoption is not that C-educated programmers are reluctant to learn and apply more solid principles. The greatest enemy to wider Rust adoption is that, despite having ample material about the past mistakes of their forerunners, the Rust community is only learning from those that relate to language design and ignoring all the other ones.

PoignardAzur 3 years ago
> The greatest enemy to wider Rust adoption is that, despite having ample material about the past mistakes of their forerunners, the Rust community is only learning from those that relate to language design and ignoring all the other ones.
What? How is that your takeaway from the thread?
There's only a few mentions of Rust and none of them are abrasive.
- alxlaz 3 years ago
  
  There are quite a few abrasive replies upthread from some Rustaceans. They don't mention Rust by name, just like not every one of Drepper's mails contained the world "glibc", but they're in the same vein.
  This is particularly important at a point in a language's lifetime when community support is not just the best, but usually the only kind of support you can get. I like Rust and I'm very productive with it, but if anyone thinks I'm going to ask junior devs on my team to put up with the kind of stuff I see upstream, they're wrong. Just because we developed a thick skin for it on FOSS mailing lists back in the nineties doesn't mean everyone needs to.
  
  6 replies →

vogon_laureate 3 years ago

Kudos and congrats to Todd Miller and the OpenBSD folks! [1]

  [1]: https://man.openbsd.org/strlcpy.3

psychphysic 3 years ago

Urgh more str functions.

I look forwards to forgetting about this and/or discovering a new foot gun.

jstimpfle 3 years ago

Just ignore them. I suspect they're adding them (to POSIX) for portability of historic software. Here is an issue tracker I've found (Haven't read through it; not interested): https://www.austingroupbugs.net/view.php?id=986

10g1k 3 years ago

Abbreviations are just bad code. Always. Tradition is no excuse for writing bad code. Names of everything should be sufficiently descriptive that absolutely anyone will know its purpose upon first glance.

jeroenhd 3 years ago
This mere suggestion will annoy many C programmers but I completely agree with you here. With all of the ioctls, the atois and the strncpy_ls I just stopped trying to understand what the names are supposed to mean and use them for their effects only. strlcpy may as well be called qpdixhwnai for all I care, I'll have to look it up when I need it anyway.
I've learned C on Windows and the Windows API is friendly in comparison to the C API. When Windows beats you in API design, you should really reconsider some of your policies.
Is mem_copy really that much worse than memcpy? Why not memcopy? What do we gain by leaving out that single o? Why is settimeofday not sttmod if munmap is how you write memoryunmap?
It feels to me like POSIX is still being optimized for people manually poking holes into punchcards. We've had autocomplete for decades now, a few extra characters won't hurt, I promise.
- jacquesm 3 years ago
  
  Look no further than /bin to see how strong such conventions can be. Mnemonics, function names (and filenames too!) were short because memory was super expensive and likely the first resource bottleneck you'd hit while building anything significant.
  What you grow up with is what you consider to be normal and I totally get it why you'd balk at strstr or other cryptic names (or LDA or ls, for that matter) but to me they look perfectly normal and are part of my muscle memory. See also: QWERTY and the piano keyboard for mechanical analogues.
- dale_glass 3 years ago
  
  In C89, external symbols were only guaranteed to have 6 significant characters, so both "mem_copy" and "memcopy" get truncated.
  And in modern times, I suspect it'd just be thematically weird to have strcpy, strcat, and safeCopyString in string.h, so old conventions still stick around.
  
  2 replies →
- unwind 3 years ago
  
  Original C compilers only guaranteed comparing up to 6 characters in an external symbol, which I think is part of the reason why many nanes are so short.
Gibbon1 3 years ago
I'd be okay with renaming strcpy() to string_you_big_dummy_copy()
- notbeuller 3 years ago
  
  You joke, but this is almost reasonable. When refactoring a large codebase riddled with strncpy, strcpy and strcmp, understanding unambiguously what code does shouldn't come down to my middle aged eyes being able to parse better than a compiler. I did a global search and replace with a #define, verified the object code diff'd against the original version, and never looked back.
  
  1 reply →