Comment by moonchild

4 years ago

(Originally on lobsters[0].)

I maintain my original position that sscanf calculating the entire length of its input is absolutely ridiculous. Are *scanf difficult to use safely, not very robust, and somewhat baroque? Yes. Should sscanf("%f") be a correct (not performance-killing) way of reading floats? Also yes. (Though aside: the OP seems to be reading data from files, so they could have just used fscanf, which has correct performance already.)

Unfortunately, many libcs are guilty of this:

- glibc uses memchr (the trail is convoluted, but ends up at _IO_str_init_static_internal)

- freebsd libc (and thus also the apple and android libcs, as well as those of the other BSDs) uses strlen

- uclibc and newlib are the same as freebsd (appear to be copied directly from it)

- Since the original bug was in GTA, which only runs on windows, I must presume msvcrt has the same problem

- musl has the correct behaviour, processing input in 128-byte chunks

- managarm doesn’t strlen but looks broken for unrelated reasons (it assumes a nul byte means eof). It also has code bloat because of templates.

- serenityos tries to implement fscanf in terms of sscanf, not the other way around! Unfortunately that means it chomps a whole line of input at every call, so it doesn’t even work correctly. Horrifying.

- pdclib has ok performance, but with an interesting implementation: it duplicates code between sscanf and fscanf, though the heavyweight format parsing is shared.

- dietlibc and sortix have the sensible, simple implementation

0. https://lobste.rs/s/0obriy/it_can_happen_you#c_giuxfq
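For anyone stuck on one of the strlen/memchr libcs above, one possible workaround when reading many floats out of a single large buffer is to skip sscanf and call strtof directly. This is only a sketch: parse_floats is a made-up helper name, and it assumes the input is a NUL-terminated string of whitespace-separated numbers. strtof hands back a pointer to where it stopped, so each call resumes there and the whole pass should stay linear in the buffer length.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from any libc): parse up to `max`
 * whitespace-separated floats from the NUL-terminated string `s`
 * into `out`. Returns the number of values stored. */
size_t parse_floats(const char *s, float *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        char *end;
        float v = strtof(s, &end);  /* skips leading whitespace itself */

        if (end == s)               /* no conversion: end of input or junk */
            break;

        out[n++] = v;
        s = end;                    /* resume where strtof stopped */
    }
    return n;
}
```

Unlike a loop of sscanf(p, "%f", ...) calls, nothing here needs to know the length of the remaining input, so there is no per-call scan to the terminating NUL.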

Agreed... I don't see why people are arguing this problematic sscanf behaviour should be seen as an intentional deficiency and not a bug.

The argument seems to be, "what were you expecting using C stdlib functions for that?" Well, of course they will suck forever with that mentality.

Reading this article was a surprise for me, I didn't know of this issue at all.

But this is pretty ridiculous. If it's possible to write scanf, which matches chars from a stream, why can't sscanf just do the exact same thing but check for '\0' rather than EOF...

  • It can, and the people who only check a few well-known open source C library implementations miss that there is quite a range of other C library implementations out there that do this very thing, from P.J. Plauger's through OpenWatcom's and Tru64 Unix's to mine. (-:

    * https://news.ycombinator.com/item?id=26300532

    • I don't know what you mean by that. I pointed out two libcs that do exactly that (that was what I meant by ‘the sensible, simple implementation’; perhaps that wasn't clear enough?) as well as multiple other approaches that also result in correct performance. And the managarm and sortix libcs (for instance) are hardly well known.
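To make the "same thing as scanf, but check for '\0' rather than EOF" idea from this subthread concrete, here is a toy sketch. It is not taken from any of the libcs mentioned; struct src, src_getc, src_ungetc and scan_uint are hypothetical names, and the scanner only handles an unsigned decimal integer. The point is just that the matching code pulls one character at a time and never asks how long the string is.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical character source: a FILE if fp is non-NULL,
 * otherwise a cursor into a NUL-terminated string. */
struct src {
    FILE *fp;
    const char *p;
};

int src_getc(struct src *s)
{
    if (s->fp)
        return getc(s->fp);
    if (*s->p == '\0')
        return EOF;                 /* NUL plays the role of end-of-file */
    return (unsigned char)*s->p++;
}

void src_ungetc(struct src *s, int c)
{
    if (c == EOF)
        return;                     /* nothing was consumed */
    if (s->fp)
        ungetc(c, s->fp);
    else
        s->p--;                     /* step the string cursor back */
}

/* Toy "%u"-like scanner shared by both backends.
 * Returns 1 and stores the value on success, 0 on a matching failure. */
int scan_uint(struct src *s, unsigned *out)
{
    int c;
    unsigned v = 0;

    do {
        c = src_getc(s);
    } while (c != EOF && isspace(c));

    if (c == EOF || !isdigit(c)) {
        src_ungetc(s, c);
        return 0;
    }
    while (c != EOF && isdigit(c)) {
        v = v * 10 + (unsigned)(c - '0');
        c = src_getc(s);
    }
    src_ungetc(s, c);               /* leave the first non-digit unread */
    *out = v;
    return 1;
}
```

With a string source such as struct src s = { NULL, "12 345x" }, two calls to scan_uint read 12 and 345, and the cursor is left pointing at the 'x'. Only the bytes actually consumed are ever touched.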

Exactly. If the standard library's sorting function executed in O(n^5), I would consider that a problem with the standard library.