Comment by jchw

5 years ago

> There are functions in the C library without locale-specific behaviors, though, like strchr, strpbrk, strcpy, and their wide character counterparts.

Obviously these string functions are totally fine if you use them correctly, and their implementations should be fairly optimal. However, I do think they put a lot of onus on the programmer to be very careful.

For example, using strncpy to avoid buffer overflows is an obvious trap, since it doesn’t null-terminate the destination when the source doesn’t fit. strlcpy exists to deal with the null termination issue, but it still has the failure mode of truncating on overflow, which can lead to security issues if one is not careful to check for it. strcpy_s exists in C11 (as part of the optional Annex K) and in Microsoft’s CRT, though I believe Microsoft’s “secure” functions work differently from C11’s. These functions are a bit better because they fail explicitly on truncation and clobber the destination.
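To make the strncpy trap concrete, here is a minimal sketch in C. `strncpy_left_unterminated` and `copy_checked` are hypothetical helper names, not library functions; `copy_checked` only imitates the "fail explicitly and clobber the destination" behavior described above and is not a replacement for strcpy_s.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: returns true if strncpy left a 4-byte buffer
 * without any terminator after copying src into it. */
bool strncpy_left_unterminated(const char *src)
{
    char buf[4];
    memset(buf, 'x', sizeof buf);          /* make stale bytes visible */
    strncpy(buf, src, sizeof buf);         /* no NUL if strlen(src) >= 4 */
    return memchr(buf, '\0', sizeof buf) == NULL;
}

/* Hypothetical helper: copy src into dst (capacity dst_size bytes).
 * On success returns true. If src does not fit, the destination is
 * set to the empty string and false is returned, so truncation can
 * never go unnoticed. */
bool copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return false;
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';                     /* clobber instead of truncating */
        return false;
    }
    memcpy(dst, src, len + 1);             /* includes the terminator */
    return true;
}
```

A caller would treat a false return from `copy_checked` as an error instead of continuing with a shortened string.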

OpenBSD arguably has one of the best security track records of all C projects, and I still feel wary about their preferred mechanism for string copying and concatenation (strlcpy and strlcat). I feel strlcpy and strlcat are both prone to security and correctness issues if the programmer does not carefully check for truncation, and strlcat has efficiency issues in many non-trivial use cases, since each call has to find the end of the destination string again.
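A rough sketch of both concerns, in portable C. Since strlcpy is not available in every libc, `sketch_strlcpy` is a hypothetical stand-in that follows the same contract (it returns the length of the source, so a return value greater than or equal to the buffer size signals truncation). The `appender` struct is also hypothetical; it remembers where the string ends so repeated appends do not rescan the buffer the way repeated strlcat calls do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the strlcpy contract: copies at most
 * size - 1 bytes, always terminates when size > 0, returns strlen(src). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical appender: tracks the current length and whether any
 * append was truncated. */
struct appender {
    char  *buf;
    size_t cap;        /* total capacity in bytes, must be > 0 */
    size_t len;        /* current string length, always < cap */
    bool   truncated;
};

void appender_init(struct appender *a, char *buf, size_t cap)
{
    a->buf = buf;
    a->cap = cap;
    a->len = 0;
    a->truncated = false;
    buf[0] = '\0';
}

void appender_add(struct appender *a, const char *s)
{
    size_t room = a->cap - a->len;         /* includes the terminator slot */
    size_t want = sketch_strlcpy(a->buf + a->len, s, room);
    if (want >= room) {                    /* the check that is easy to forget */
        a->truncated = true;
        a->len = a->cap - 1;
    } else {
        a->len += want;
    }
}
```

The point of the `truncated` flag is that the caller checks it once at the end and rejects the result, rather than silently using a shortened string.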

While there are obvious cases where dynamically allocated strings of arbitrary length, such as those seen in C++, Rust, Go, etc. can lead to security issues, especially DoS issues, I still feel they are a good foundation to build on because they are less prone to correctness issues that can lead to more serious problems. Whether you are in C or not, you will always need to set limits on inputs to avoid DoS issues (even unintentional ones) so I feel less concerned about the problems that come with strings that grow dynamically.

One of the biggest complaints about prefix-sized strings/Pascal style strings is that you can’t point into the string to get a suffix of the original string. However, in modern programming languages this is alleviated by making not only dynamic strings a primitive, but also string slices. (Even modern C++, with its string_view class.) String slices are even more powerful, since they can specify any range in a string, not just suffixes.
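The slice idea can be sketched in C as a pointer plus a length. Everything here (`str_slice`, `slice_from_cstr`, `slice_range`, `slice_equals_cstr`) is a hypothetical illustration, not an existing library; real slice types such as C++’s string_view offer far more.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view into someone else's bytes. */
struct str_slice {
    const char *ptr;
    size_t      len;
};

struct str_slice slice_from_cstr(const char *s)
{
    return (struct str_slice){ s, strlen(s) };
}

/* Sub-range [start, end) of s, clamped to the bounds of s.
 * Unlike pointing into a NUL-terminated string, this can describe
 * a prefix or a middle section, not just a suffix. */
struct str_slice slice_range(struct str_slice s, size_t start, size_t end)
{
    if (end > s.len)
        end = s.len;
    if (start > end)
        start = end;
    return (struct str_slice){ s.ptr + start, end - start };
}

/* Compare a slice against a NUL-terminated string. */
int slice_equals_cstr(struct str_slice s, const char *lit)
{
    size_t n = strlen(lit);
    return s.len == n && memcmp(s.ptr, lit, n) == 0;
}
```

Note that a slice is generally not NUL-terminated, so it cannot be handed directly to functions that expect a C string; that is the main cost of this design.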

So really locale and strtod are just little microcosms of why I am wary of C string handling. Clearly you can write code using C string functions that is efficient, secure and correct. However, I feel like there are plenty of pitfalls for all three that even experienced programmers have trouble avoiding sometimes. I don’t actually know of a case where locale can break strtol, but it doesn’t matter too much, since anyone can write a decent strtol implementation (as long as they test the edge cases carefully...). strtod, though, is not so easy, and I guess that means apps are best off avoiding locales other than C. In a library, though, there’s not much you can do about it. At least not without causing thread safety issues :)

In other languages, aside from dynamic strings and string slices, locale-independent string functions are also typically the default: Rust’s f64::from_str, Go’s strconv.ParseFloat, C++’s std::from_chars and so forth. It’s not too surprising, since a lot of the decisions made in these languages were specifically aimed at improving on C pitfalls. I do wish C itself would also consider at least adding some locale-independent string functions for things like strtod in a future standard...
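As an illustration of the "write your own strtol, but test the edge cases" point, here is a minimal sketch of a decimal integer parser that only looks at ASCII characters and so never consults the locale. `parse_long_ascii` is a hypothetical name, and its rules (no leading whitespace, no base prefixes, the whole input must be consumed) are assumptions chosen for the sketch, not strtol’s behavior.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical locale-independent parser: optional sign followed by
 * one or more ASCII digits, nothing else. Returns false on empty
 * input, trailing garbage, or overflow; *out is only written on
 * success. */
bool parse_long_ascii(const char *s, long *out)
{
    bool neg = false;
    if (*s == '+' || *s == '-') {
        neg = (*s == '-');
        s++;
    }
    if (*s < '0' || *s > '9')
        return false;                      /* need at least one digit */

    /* Accumulate as a negative number so LONG_MIN is representable. */
    long value = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int digit = *s - '0';
        if (value < (LONG_MIN + digit) / 10)
            return false;                  /* value * 10 - digit would overflow */
        value = value * 10 - digit;
    }
    if (*s != '\0')
        return false;                      /* trailing characters */

    if (!neg) {
        if (value < -LONG_MAX)
            return false;                  /* magnitude does not fit as positive */
        value = -value;
    }
    *out = value;
    return true;
}
```

The overflow checks are where most of the edge cases live; doing the same for floating point, with correct rounding, is far harder, which is why strtod is not something to reimplement casually.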
