Comment by voidUpdate

5 hours ago

Can't you just:

  for (size_t i = 0; characters[i] != '\0'; i++)
  {
    if (characters[i] >= '0' && characters[i] <= '9')
    {
      ret = ret * 10 + (characters[i] - '0');
    }
    else
    {
      return ERROR;
    }
  }
  return ret;

Adjust until it actually works, but you get the picture.

This doesn't catch overflow or underflow errors, doesn't allow non-base-10 numbers, and doesn't handle negative numbers. And by OP's logic, writing your own parser counts as a failure: they are complaining about the built-in parsing functions.

The author admits you can parse signed integers in their second example, but for unsigned, they don't seem to like that unsigned parsing will accept negative numbers and then silently wrap them to their unsigned equivalents, nor do they like that C number parsing often stops at non-numeric trailing data and returns a best-effort result rather than flagging it as an error, nor do they like that ULONG_MAX is used as a sentinel value by sscanf.

I'm not sure what they mean by "output raw" vs "output"

    $ cat t.c
    
    #include <stdlib.h>
    #include <math.h>
    #include <stdio.h>
    
    int main(int argc, char *argv[]){
    
      char * enda = NULL;
      unsigned long long a = strtoull("-18446744073709551614", &enda, 10);
      printf("in = -18446744073709551614, out = %llu\n", a);
      
      char * endb = NULL;
      unsigned long long b = strtoull("-18446744073709551615", &endb, 10);
      printf("in = -18446744073709551615, out = %llu\n", b);
      
      return 0;
    }
    $ gcc t.c
    $ ./a.out 
    in = -18446744073709551614, out = 2
    in = -18446744073709551615, out = 1
    $

I get their "output raw" value. I don't know what their "output" value is coming from.

I don't see anywhere they describe what they are representing in the raw vs not columns.

  • > they don't seem to like that unsigned parsing will accept negative numbers and then silently wrap them to their unsigned equivalents, nor do they like that C number parsing often stops at non-numeric trailing data and returns a best-effort result rather than flagging it as an error, nor do they like that ULONG_MAX is used as a sentinel value by sscanf.

    That's right. I don't like asking it to parse the number contained inside a string, and getting a different number as a result.

    That's just simply not the right answer.

    > I'm not sure what they mean by "output raw" vs "output"

    I can see how that's very unclear. I've changed it to "Readable" now.

  • I think "output" is just supposed to be a human-readable version of "output raw". So the line in the table where "output raw" is 2 but "output" is 1 looks like a mistake. It's repeated in the table for sscanf().

And how does this avoid returning nonsense if the number is too large? (Wrapping if the accumulator is unsigned, straight to UB land if signed.) Not reporting overflows as errors is one of the major problems demonstrated by TFA.

What if the number you want to return just happens to be the value of ERROR? You need an error flag that can't be represented as an int, but then C wouldn't let you return it from a function that only returns "int". It is why some languages throw exceptions and why databases have the special "null" value.

  • I don't use C enough to know what the convention is for signalling an error when the function can return any number anyway. You'd have to ask someone else.

    • In C, errors are usually indicated by a negative return value constant, crashing the program with abort, or setting the errno global (thread-local, but whatever) and expecting callers to check it. Sometimes multiple of those.


  • And why some very, very special languages have an effectively-global variable called "errno" that you have to check after the call manually, and worry about whether maybe it was populated from some previous error. Nothing says "production-quality language that an entire civilization's code base should be based on" like "sometimes (but only sometimes!) functions return additional information through global values".

    • > And why some very, very special languages have an effectively-global variable called "errno" that you have to check after the call manually, and worry about whether maybe it was populated from some previous error.

      As you can read at https://en.wikipedia.org/wiki/Errno.h, errno is barely used by the C standard (though it is defined there). It is POSIX that uses errno extensively. The WinAPI functions, by contrast, use a much more sensible way to report errors (and don't make use of errno).

You cannot "just" anything in C without hitting a minefield of UB. It is, probably, more economical to convert your entire project to Rust than it is to do the pufferfish spine removal procedure of auditing the code base for UB and replacing the problem areas. With generative AI, the size of project for which this remains true may be as large as "the entire Linux kernel".