
Comment by wnoise

5 years ago

Wow. Thanks for looking.

> limited substrings of the input string as it refills its input buffer,

As far as I can tell, the copying helper function assigned to the read member of the FILE* never actually gets called on this path. I see no references to f->read() or to anything that would call it. All of the access goes through shgetc(), shunget(), shlim(), and shcnt(), which reference the buf directly, with no copying. The called functions __intscan() and __floatscan() do the same. __toread() is called, but it just ensures the stream is readable and possibly resets some pointers.
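To illustrate what "access goes directly through the buffer" means, here is a hypothetical, heavily simplified cursor in the spirit of shgetc()/shunget()/shcnt(). The names scan_cursor, cursor_getc, cursor_unget and scan_uint are made up for this sketch; the real code operates on FILE fields and also handles limits and refills, which this omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cursor; not musl's FILE or its sh* macros. */
struct scan_cursor {
    const unsigned char *rpos;  /* next byte to hand out, pointing into the caller's string */
    const unsigned char *rend;  /* end of the window known to be valid */
    size_t consumed;            /* bytes handed out so far (cf. shcnt()) */
};

/* Return the next byte straight from the caller's memory, with no copying.
 * A real implementation would try to refill at rend; this sketch just
 * reports end of input as -1. */
static int cursor_getc(struct scan_cursor *c)
{
    if (c->rpos == c->rend)
        return -1;
    c->consumed++;
    return *c->rpos++;
}

/* Push back the byte most recently returned by cursor_getc() (cf. shunget()). */
static void cursor_unget(struct scan_cursor *c)
{
    assert(c->consumed > 0);
    c->rpos--;
    c->consumed--;
}

/* Toy unsigned decimal scanner in the spirit of __intscan(): reads digits
 * directly through the cursor and pushes back the first non-digit.
 * Overflow is not checked in this sketch. */
static unsigned long scan_uint(struct scan_cursor *c)
{
    unsigned long v = 0;
    int ch;
    while ((ch = cursor_getc(c)) >= '0' && ch <= '9')
        v = v * 10 + (unsigned long)(ch - '0');
    if (ch != -1)
        cursor_unget(c);
    return v;
}
```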

Even if it were called, that design would still leave this path pretty much entirely free of the quadratic behavior, though not of added overhead. The operations structure stuffed into the FILE structure doesn't scan the entire string; it only copies at most a fixed amount more than was asked for (stopping early if the string terminates before that). That keeps it linear, just with some unfortunate overhead.
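A sketch of why a bounded read callback stays linear. This is not musl's code: bounded_read, struct str_source and the LOOKAHEAD value of 256 are assumptions for illustration. Each call examines at most len + LOOKAHEAD bytes, so even when it is called one byte at a time, total work is the string length times a constant rather than the square of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOKAHEAD 256  /* assumed fixed bound on extra bytes examined per call */

/* Hypothetical string source; not musl's FILE. */
struct str_source {
    const char *pos;  /* next unread byte of a NUL-terminated string */
    size_t scanned;   /* logical count of bytes examined so far, for illustration */
};

/* Copy up to len bytes into buf, examining at most len + LOOKAHEAD bytes.
 * memchr() may be given a count larger than the remaining string because,
 * since C11, it must behave as if it stops at the first match (here the NUL). */
static size_t bounded_read(struct str_source *s, char *buf, size_t len)
{
    assert(s->pos != NULL);
    size_t window = len + LOOKAHEAD;
    const char *nul = memchr(s->pos, '\0', window);
    size_t avail = nul ? (size_t)(nul - s->pos) : window;

    s->scanned += nul ? avail + 1 : window;  /* bytes up to and including the match */
    if (len > avail)
        len = avail;                         /* string ends before len bytes */
    memcpy(buf, s->pos, len);
    s->pos += len;
    return len;
}
```

Reading one byte per call re-examines mostly the same window each time, which is the kind of repeated memchr() overhead mentioned in the thread, but the total stays bounded by (1 + LOOKAHEAD) times the number of calls.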

I do find the exceedingly common choice of funneling all the scanf variants through fscanf weird. But I guess if they already have one structure for indirecting input, it's easy to overload that. (And yet they somehow _don't_ have a general "string as a FILE" facility that sscanf could be built on top of. POSIX 2008 does have fmemopen(), but it's unsuitable: it works on a buffer of a specified size (which would need to be calculated, as in the MS case) rather than reading until a NUL byte is reached.)
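Outside of libc internals, a lazily read "string as a FILE" can be built with the nonstandard fopencookie() (a GNU extension, also provided by musl; the BSDs have funopen() instead). A minimal sketch, assuming that extension is available; cstr_read and open_cstr are hypothetical names. The read callback uses strnlen() bounded by the size stdio asks for, so the full length is never computed up front.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Read callback: hand out the next chunk of a NUL-terminated string.
 * The cookie is a pointer to the caller's current position. */
static ssize_t cstr_read(void *cookie, char *buf, size_t size)
{
    const char **pos = cookie;
    size_t n = strnlen(*pos, size);  /* looks at no more than size bytes */
    memcpy(buf, *pos, n);
    *pos += n;
    return (ssize_t)n;  /* 0 tells stdio we reached the end (the NUL) */
}

/* Open a read-only stream over *pos; *pos must outlive the FILE. */
static FILE *open_cstr(const char **pos)
{
    assert(pos != NULL && *pos != NULL);
    cookie_io_functions_t io = { .read = cstr_read };  /* no write, seek, or close */
    return fopencookie(pos, "r", io);
}
```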

You've missed what happens in __uflow() when __toread() does not return EOF. (And yes, that does mean occasional memchr() calls made to fetch a single character, and repeated memchr()s over the same memory block.)

> POSIX 2008 does have fmemopen(), but it's unsuitable: it works on a buffer of a specified size (which would need to be calculated, as in the MS case) rather than reading until a NUL byte is reached.

With fmemopen(), you only need to calculate the length once at the start, right? And then you can use the stream instead.
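Concretely, the fmemopen() approach might look like this sketch (sum_ints is a hypothetical helper): strlen() runs once, and fscanf() then consumes the stream incrementally. The early return for an empty string is there because POSIX allows fmemopen() to fail when the size is 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sum the whitespace-separated ints in text and
 * return how many were read, or -1 if the stream could not be opened. */
static int sum_ints(const char *text, long *sum)
{
    assert(text != NULL && sum != NULL);
    *sum = 0;
    size_t len = strlen(text);  /* the single up-front length computation */
    if (len == 0)
        return 0;               /* POSIX lets fmemopen() reject size 0 */
    FILE *f = fmemopen((void *)text, len, "r");  /* read-only, so dropping const is safe */
    if (f == NULL)
        return -1;
    int count = 0, v;
    while (fscanf(f, "%d", &v) == 1) {  /* each call resumes where the last one stopped */
        *sum += v;
        count++;
    }
    fclose(f);
    return count;
}
```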

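  • Yes, you can do that. But libc can't use that as an implementation strategy for sscanf() without also having this linear-turned-quadratic behavior: each sscanf() call would have to compute the length of the remaining string before opening the stream, so a caller that scans one long string piece by piece pays for the whole remainder on every call.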
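For contrast, the caller-side pattern where this bites is a loop that calls sscanf() repeatedly on the remainder of one long string. If a libc's sscanf() measures the remaining string on every call (as in the MS case mentioned above), the first loop below is quadratic in the input length; the strtol() version only looks at the characters it converts (plus leading whitespace and the one character that stops it). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Can go quadratic: depending on the libc, each sscanf() call may first
 * compute the length of everything remaining after p. */
static long sum_with_sscanf(const char *p)
{
    assert(p != NULL);
    long sum = 0;
    int v, used;
    while (sscanf(p, "%d%n", &v, &used) == 1) {
        sum += v;
        p += used;  /* %n reports how many characters were consumed */
    }
    return sum;
}

/* Linear alternative: strtol() never needs the total string length. */
static long sum_with_strtol(const char *p)
{
    assert(p != NULL);
    long sum = 0;
    for (;;) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            break;  /* no further number to convert */
        sum += v;
        p = end;
    }
    return sum;
}
```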