Comment by formerly_proven

4 years ago

I don't know if it's required to, but there doesn't seem to be any upper bound on what glibc's scanf will consume for a %f (e.g. a gigabyte of zeroes followed by "1.5" still parses as 1.5), so for that implementation there is certainly no trivial upper bound on how much input gets read and processed for a %f, as you might otherwise expect.

Yet another reason to not stringify floats. Just use hexfloats (but beware of C++ standard bugs involving >>) or binary.

Unfortunately "gigabytes of numerical data, but formatted as a text file" is commonplace. For some reason HDF5 is far less popular than it ought to be.

But why do a strlen() at all? And why do all platforms (Linux, Windows, macOS) seem to do it?

I think you're right that there is no upper bound, but it shouldn't be necessary to do a full strlen() if you instead scan incrementally. You could go char by char until the '%f' pattern is fulfilled and then return. That would solve the issue at its root -- and who knows how many programs would suddenly get faster...

So looking at the glibc sources, I've found the culprit: abstraction. It looks like a FILE*-like string-buffer object is created around the C string: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-commo...

And when initializing, that abstraction calls something similar to strlen() to learn its bounds, here: https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/strop...

I'm reading this source for the first time, but I guess that to avoid breaking anything, one could introduce a new type of FILE* string buffer -- say, in 'strops_incr.c' -- that works incrementally, reading one char at a time from the underlying string and skipping the strlen()...

It would be awesomely cool if GTA Online loaded faster under Wine than on Windows :-)