← Back to context

Comment by _kst_

4 years ago

There's a bigger potential problem with the *scanf() functions than performance. They are inherently unsafe for reading numeric input.

For example, if you do something like this:

    int n;
    sscanf("9999999999999999999999999", "%d", &n);

the behavior is undefined. As the C standard says:

> ... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

You can control the appropriate type by writing the call directly, but you can't guarantee that the result can be represented unless you have control over the input.

Remember that in C "undefined behavior" doesn't mean that your program will fail, or will crash, or will tell you there was a problem. It means that a conforming implementation can do literally anything. In the worst case, it will do what what you expect until it fails at the most inconvenient possible moment.

Now most implementations will probably do something sane, like setting a floating-point object to infinity or an integer object to some arbitrary value, but the language doesn't guarantee anything.

If you want to write safe code, you can extract a substring that represents a number and pass it to one of the strto*() functions, which do have well defined behavior on overflow. (But I couldn't tell you exactly what that behavior is without looking it up.)

> Remember that in C "undefined behavior" [...] means that a conforming implementation can do literally anything. In the worst case, it will do what what you expect until it fails at the most inconvenient possible moment.

Actually, the worst case possibility is that your program will become Skynet, enslave humanity for 10000 years, collapse all stars in the universe into black holes, and significantly accelerate processes such as the heat death of the universe.

  • It's even allowed for the program to change all extant texts of the Standard to indicate that its behavior is well-defined.

You’re misreading the standard a bit I think. It’s saying undefined behavior comes from the format string (which you should control and is a common compiler warning if it’s not a literal) doesn’t match the types of variables you pass it. This is kind of obvious when you think about it. Variadic C functions lose type information so the format string is the source of that.

The “out-of-range” issue just means that the library isn’t going to mandate every implementation of this function is guaranteeing to provide the same overflow behavior (some might stop when you saturate, others might stop at the end of digits input and overflow, others might detect the overflow and saturate).

The Linux man page is clearer here IMO:

> If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.

That’s the only spot the word “undefined” appears and doesn’t discuss overflow. My general impression is that the “undefined” problem largely only applies to language operations or user input causing a library to perform such an undefined behavior. Older C functions with older documentation may be using “undefined” with a less strict meaning to also cover “implementation-defined”. The “undefined behavior” brouhaha came up in the past 5-10 years only when compilers actually started leveraging it breaking a lot of assumptions.

Wow, the problems are so deep that no human should ever use sscanf again. It's just as bad as gets() because there is no way for the programmer to deal with the error case.

Obligatory: it is flabbergasting that this is a quality of language that is still in active use.

  • Most of these things can't be fixed at the language level without seriously breakages or preventing users from doing necessarily dangerous things.