It Can Happen to You

4 years ago (mattkeeter.com)

Loving the progression here. Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle. A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.

  • And maybe, in a decade or so, the man page for these functions will list their algorithmic complexity! That was the most interesting takeaway from this article, for me at least. I have only seen one or two libraries that actually list this in their documentation.

    • The cppreference page linked by the blog post has been changed since: https://en.cppreference.com/w/cpp/io/c/fscanf#Notes

      > Note that some implementations of sscanf involve a call to strlen, which makes their runtime linear on the length of the entire string. This means that if sscanf is called in a loop to repeatedly parse values from the front of a string, your code might run in quadratic time

      1 reply →

    • I don't know if it is required to, but there doesn't really seem to be an upper bound on what glibc's scanf will eat for a %f (e.g. a gigabyte of zeroes followed by "1.5" will still be parsed as 1.5). So for that implementation there certainly isn't the trivial upper bound on the amount of input read and processed per %f that you might expect.

      Yet another reason to not stringify floats. Just use hexfloats (but beware of C++ standard bugs involving >>) or binary.
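
      A minimal sketch of the hexfloat round trip in C99, using printf's "%a" and strtof — the hexfloat form is exact, so no decimal rounding is involved:

          #include <stdio.h>
          #include <stdlib.h>

          int main(void) {
              float f = 0.1f;
              char buf[64];
              snprintf(buf, sizeof buf, "%a", f);  /* hexfloat, e.g. "0x1.99999ap-4" */
              float g = strtof(buf, NULL);         /* exact round trip back to float */
              printf("%s round-trips exactly: %d\n", buf, f == g);
              return 0;
          }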

      Unfortunately "gigabytes of numerical data, but formatted as a text file" is commonplace. For some reason HDF5 is far less popular than it ought to be.

      2 replies →

    • How would it help you to know that it's O(n)? It needs to read all the characters of the float. The problem is that it needlessly reads characters even after the float.

  • You're joking, but now I'm thinking about the XML we parse at work and the library we're using to do it. We parse a lot of it, but I've always had this vague feeling that it takes a bit too long (given the codebase is C++).

    The XML library we use is rather well-known, so if someone found a bug like this there, I'd suspect a general improvement of performance across the board in the entire industry. The Efficient Market Hypothesis tells me it's unlikely the library has this problem, but then again, that's what I thought about AAA video games, and then the GTA Online thing came out.

    • > it's unlikely the library has this problem

      Any sufficiently-complex library code likely has plenty of problems, often unavoidably so (e.g. trade-offs between best performance and edge cases). Whether they have been found or not is a function of many, many factors.

      > Efficient Market Hypothesis

      I've lived long enough to be very sceptical about that sort of thing. Markets tend to be efficient in aggregate, maybe, but in individual cases they can fail quite brutally. Look at how "dramatic" bugs are overlooked even in critical pieces of infrastructure like openssl, for years and years; maybe it happens less for openssl than for most equivalent libraries, but it still happens.

      Also, once the "market" for standards moves on, network effects make it very hard to have any meaningful competition. I mean, who writes XML parsers nowadays? Whichever XML lib was winning when JSON "happened" is now likely to stay in control of that particular segment; and the likelihood that top developers will keep reviewing it, falls off a cliff. Sprinkle a bit of cargo-cultism on top, and "efficient markets" become almost a cruel joke.

      2 replies →

    • It's possible. I've personally cut the time spent reading a huge XML file at startup by at least 10x, in an application I was in charge of, by avoiding the library dependency and writing custom code. Having a lot of experience with that kind of code and with performance issues, it was quite a fast change with no negative effects.

      The backstory was simple: up to some point, the amount of data stored was reasonably small. Then the amount of data grew significantly (a few orders of magnitude), and the startup times became very unpleasant.

      There's a lot going on when loading huge XML files. As an example, don't forget all the possible Unicode conversions, and all the allocations of elements in the handling code that exist just to be discarded, etc.

      I don't suggest everybody do it "just because", but if some specific use is known to have very specific assumptions, is in the "hot path" and really dominates (profile first!), and it is known that only a small subset of all XML possibilities will ever be used, it can be justified to avoid the heavy libraries. For example, in that specific case, I knew that the XML was practically always only read and written by the application, or by somebody who knew what he was doing, and not something that somebody random would regularly provide from the outside in some random form. I also knew that my change surely wouldn't break anything for years to come, as I knew for sure that that part of the application was not the "hot spot" of expected future changes.

      So it was a win-win. Immensely faster application startup, which is something that improved everybody's work, while preserving the "readability" of that file for infrequent manual editing or inspection (and easy diffs).

      1 reply →

    • If you have a lot of nesting in the XML, and it is formatted for human reading (i.e. indented), you may want to consider not doing that. We had a project where we were creating human-readable versions of the XML (mostly for developer convenience) and then parsing it. When we stopped adding all the extra white space, the parsing speed increased a couple of orders of magnitude. (The downside was we no longer had built-in coffee breaks in our development process.)

      1 reply →

    • What about parsing that XML upfront, serialising to some binary format (e.g. CBOR, maybe with nlohmann's JSON library, or Cap'n Proto) and shipping the binary file?

      1 reply →

  • That's actually a very simple one. Just run a regex on "P != NP" to remove the "!" and you're good to go.

    • Seriously the most I have laughed in like 6 months. Which probably says a lot more about me than this joke. I know that jokes aren't really welcome on HN, and I generally really like this policy. But just had to mention this was just ... what I needed to read right now.

      2 replies →

  • Well, https://twitter.com/stephentyrone/status/1366573121365565444

    >> So many years ago when I first took over the iOS string functions, I found that like 30% of boot time in debug builds was in strstr. <<

    >> Needless to say, we did not fix that issue by writing a more efficient strstr. Removed the parser and then removed strstr from the environment where it had been in use =) <<

  • You kid. But truer things are said in jest.

    > ...Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle.

    My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.

    Clearly there's a contended lock buried deep. Something non-obvious.

    I'm certain everything these days has dozens of hidden quadratics and contended locks.

    Which is one of the reasons I'm excited for stuff like structured concurrency (Java's Project Loom) and retoolings like systemd becoming the norm.

    Ages ago I worked on a kitchen sink app that had a very finicky startup. Any breeze would break it. Much consternation by mgmt. Apparently if we only clapped louder, Tinkerbell would fly. I couldn't take it any more. LARPing as a bulldozer, I replumbed the innards, changing from something like initd to something more like systemd, with some lazy loading for good measure. Voila!

    Back to GTA. The failure here is the product owner didn't specify a max load time, and then hold the team to it. Devs will mostly do the work that's expected of them. If load time wasn't measured (and managed), no one is going to bother with expunging sscanfs.

    • > My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.

      Yesterday my MBP kernel panicked because my keyboard was low on battery and the bluetooth connection kept dropping. There's something weird with macOS where peripherals seem to really not be well isolated from the core OS runtime.

      1 reply →

  • You joke, but there's actually a lot of work going into determining which techniques will definitely NOT be enough to settle P=NP.

    (I find it pretty exciting, that this kind of negative result is possible. Ain't mathematics wonderful?)

  • > A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.

    You owe me a keyboard!

  • Next Hacktoberfest should offer a free t-shirt to anyone who goes out and issues a PR to a repo containing sscanf

Blog author here! Thanks to HN for warning me about sscanf at exactly the right time – within a day of me trying to load some ASCII STLs and noticing it was slow...

Linked deep in the Twitter replies [1], there's an open glibc issue about this, dating back to 2014:

https://sourceware.org/bugzilla/show_bug.cgi?id=17577

C doesn't have any requirements on the complexity of sscanf, so it might not be a bug per se, but it's certainly... a pitfall.

[1] https://twitter.com/impraxical/status/1367194430835425283
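
For concreteness, a minimal sketch of the pitfall (hypothetical code, assuming a NUL-terminated buffer): each sscanf call only consumes one float, but if the implementation runs strlen on its argument, the loop is quadratic in the buffer length.

    #include <stdio.h>

    /* Parse floats from the front of a large buffer in a loop. %n records
     * how many characters were consumed, so we can advance past each float.
     * If sscanf internally calls strlen(p), this loop is O(n^2) overall. */
    void parse_all(const char *p) {
        float f;
        int consumed;
        while (sscanf(p, "%f%n", &f, &consumed) == 1) {
            /* ... use f ... */
            p += consumed;  /* advance past the float just parsed */
        }
    }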

  • Hey, Matt, neat to see you here and congrats on making the front page! Recognize you from the Formlabs forums & conferences.

    Love that notion of professional empathy underscoring your message in the blog post.

    • Hey RK! Thanks for the kind words – hopefully I'll see you again, once we're back to holding in-person events!

    • I know Matt too, primarily from him rejecting my Libfive PRs for being "too 1337".

      But seriously, Matt, I might have a project for an in-person event regarding printing robots. Stay tuned--to what channel I don't know.

  • IMO the lack of a complexity requirement is a bug in the C standard. And really it’s a bug in the implementation(s?) too. If it can be done in O(1), shame on library authors for doing it in O(n). If you want programmers to trust library authors, don’t do this to us. Maybe std::from_chars FTW?

    • This is not a complexity issue with the function. The function is linear in its input, as it should be. The problem is that the implementation does more work than it needs to (it doesn't need the length of the string). It should be linear up to the end of parsing, not the end of the string. The quadratic complexity in this case comes from the loops calling it.

    • Shouldn’t we just come clean and admit to ourselves that there is no such thing as the C standard? There is a collection of loosely related languages that look similar and that collectively we call C, but really they’re all completely different and share almost no interoperability or common characteristics. And those standards that do exist provide almost no ability to reason about your code including things like ordering of statements.

      21 replies →

  • Thank you for introducing me to the concept/term Ascetic programming. Not sure how widely used it is, but I find it more fitting for what I try to do than minimalistic or KISS.

    Also, it is great to see someone write

      > I noticed that ASCII STL loading was really quite slow.
      > From startup to showing the window, it took over 1.8 seconds!
    

    I always find pleasure seeing projects which highlight just how fast modern computers really are.

    • I re-read my comment and, to be clear, the quote and the last paragraph are not related. The last sentence was meant to refer to the Erizo project as a nice single-purpose, high-performing tool, not as a comment on the bug that made it slow.

  • printf and scanf match nicely with their format specifiers, so the serialization and deserialization can be maintained nicely in lockstep.

    To avoid the quadratic strlen overhead, you can simply use fmemopen(3), which makes the temporary sscanf FILE object explicit and persistent for the whole parse, and needs just one strlen call.

    https://news.ycombinator.com/item?id=26343149
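
    A hedged sketch of that approach (fmemopen is POSIX; the function name is illustrative):

        #include <stdio.h>
        #include <string.h>

        /* The FILE object is created once (one strlen, made explicit here),
         * and fscanf then consumes the stream incrementally instead of
         * re-scanning the remaining buffer on every call. */
        void parse_floats(char *data) {
            FILE *f = fmemopen(data, strlen(data), "r");
            if (!f) return;
            float v;
            while (fscanf(f, "%f", &v) == 1) {
                /* ... use v ... */
            }
            fclose(f);
        }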

I think the really embarrassing part for Rockstar is that they didn't bother to investigate what took 5+ minutes to load in their star product; a simple profiling pass would've made the issue obvious. So either they knew and they didn't care, or they didn't know and they didn't care.

That being said, both for GTA and for TFA the issue is a very similar sscanf call:

     sscanf(data, "%f", &f);

I already posted a similar comment in the GTA story but I really want to emphasize it: scanf is almost never the right tool for the job, and it's definitely not the right tool in this situation. Just use strtof. That's literally what it's for. String to float. There. Done.
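
A minimal sketch of that replacement (assuming `data` points at NUL-terminated text; the helper name is illustrative):

    #include <stdlib.h>

    /* Drop-in replacement for sscanf(data, "%f", &f): strtof reports where
     * it stopped, so the caller can advance in O(length of the float),
     * with no hidden strlen. */
    const char *parse_float(const char *data, float *out) {
        char *end;
        *out = strtof(data, &end);
        return end == data ? NULL : end;  /* NULL: no float at the front */
    }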

Scanf is crappy and if it were up to me would've been deprecated a while ago. I can sort of see using it for a quick one-off "script", for instance to parse user input, but seeing it in the middle of a program will always raise a huge red flag for me.

Use strtok_r if you need to split a string, then parse every entry individually. It's more robust, more flexible (you can parse custom types and formats that way) and allows for much better error handling and diagnostics. And of course it's also usually vastly faster.
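
A hedged sketch of that split-then-parse approach (strtok_r is POSIX and modifies the line in place; names are illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Split a mutable line into tokens, then convert each one individually,
     * reporting errors per field. */
    void parse_line(char *line) {
        char *save = NULL;
        for (char *tok = strtok_r(line, " ,\t\n", &save); tok != NULL;
             tok = strtok_r(NULL, " ,\t\n", &save)) {
            char *end;
            float f = strtof(tok, &end);
            if (end == tok || *end != '\0')
                fprintf(stderr, "bad field: '%s'\n", tok);
            else
                printf("parsed %f\n", f);
        }
    }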

Scanf is an antipattern in my opinion. I literally never use it and I'm better off for it. The last time I interviewed for a C coder position I managed to answer the full C quiz except for the one question regarding scanf. That's how much I don't use it.

I think it's even worse for developers who come from higher level languages and (reasonably) expect to be able to deserialize data easily. You simply can't do that in C, the type system and general philosophy of the language won't let you, but scanf may convey the illusion that it's sort of possible. Don't believe its lies.

  • I was thinking about it the other day when reading the original article, and this was the only plausible (and defensible) cause for it not being addressed:

    When GTA online was released 7 years ago in 2013, the list of DLC items was probably much shorter, and grew over time. The performance issue gets quadratically worse with list length. The list growth was probably bell-curve shaped over the lifetime of the game.

    This has an interesting dynamic when it comes to perceived performance:

    In the beginning, on consoles and PCs, it was already a pretty long load time, but it would have been 90s or so on an average gaming PC (I remember this from the early days playing it, on a modest gaming PC with an FX-8150 CPU). This is long, but tolerable for a game of this size. I'm certain that early complaints that it was sluggish to load were profiled and looked at; at the time it wasn't a 4-minute ordeal to load the JSON, and it probably represented a fraction of the CPU time it takes today - not standing out as obviously as in OP's guerrilla profiling. Devs put a pin in it and say "this is netcode related, it is what it is".

    Over time, the list gets longer and the loading takes more cycles, BUT PCs are getting progressively faster year over year as well, with many of those improvements happening at the instruction level - optimizing for things like, surprise, string scanning. Two console generations have been released since, masking the problem on that side. For comparison's sake, I just checked and I can load GTA online in about 75s on my Ryzen 3900X. This CPU is probably 4-6x faster in single-core performance than the 8150 for most workloads. Again, it's slow but tolerable, and by this time it's "yeah, GTA online is just a big game and takes a while to load, it's always been that way". Complacency is the enemy of improvement, and things that regress slowly over time are hard for us to notice in general.

    Don't take this as a "this is fine" comment, but instead the only reasonable justification I can think of as to why it might have flown under the radar all these years.

  • I think 'embarrassing' is too strong a word. AAA game development is rushed; the pressure is to ship. Something has to give. This is a user-facing issue, but one that doesn't actually affect the gameplay. Assuming they had time to profile the load process, given that low a priority, seems extremely optimistic.

    • >AAA game development is rushed; the pressure is to ship.

      I'd be more understanding if GTA Online hadn't already shipped its first version in October of 2013. Surely there would've been some time after shipping the first version to profile the game.

      5 replies →

    • Well, it doesn't affect the gameplay if the player starts the game once and never closes it. But for anybody who wants to hop on for a quick bit of fun, it's a notable barrier. There are definitely games I've stopped playing because it takes too much time to launch the thing.

    • I wouldn't have said anything if the game had been released one month ago, but GTA V is almost 8 years old now and it's been ported to several generations of hardware (IIRC they've even announced "next gen" ports to release this year). The online mode is still maintained and makes them a lot of money. I also do think that it affects the gameplay, because these loading times are genuinely terrible. A 30-second loading screen is a nuisance; a 5+ minute loading screen just makes me want not to play the game.

      I think that Rockstar deserves some blame here, especially since this problem might well be a consequence of their notoriously bad development practices.

    • I tend to agree. When you are rushing, things get missed. Also, if it was a problem from the beginning, you just might not think it's an issue (it's just how long it takes).

      One philosophy I heard in my days of programming (not sure how I remembered this, but it's still out there):

      Make it work, make it right, make it fast. -- Kent Beck

    • Embarrassing is too light a word.

      Rockstar has virtually endless resources and the game has been out for many years. For years, they didn't reduce the extremely long load times? Not only embarrassing, but it shows deep incompetence and a lack of respect for the craft and for end users.

  • scanf and printf have complementary format specifiers, which can make maintaining serialization and parsing of regular data a breeze...

    The proper remedy is to simply wrap the string to parse with fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.

    https://news.ycombinator.com/item?id=26343149

    • Cool trick, thanks for sharing. I don't get why there isn't a suitable snscanf function that takes the buffer length as an argument and returns the number of bytes parsed?

      1 reply →

  • Not excusing this but there are likely a few mitigating factors here.

    * Tight deadlines result in shipping code that's barely tested and may have resulted in minimal code reviews on it.

    * The original article mentioned how the ids were always unique. It may have been intended to load content from multiple sources or to allow patching of content on disk (or repurposed entirely from a different game). Or it could well be an oversight/over-engineering.

    * It may even be a general purpose json parser from another project that had never been tested with data of this size until after launch.

    * It probably wasn't always this bad. Likely when the game launched the loading times were much more reasonable, as the number of in-app purchases was an order of magnitude smaller.

    Typically most of the IAPs will be added much later, so much of the profiling work would have been done with this code having a much smaller json block.

    When the game shipped, the dev team was likely shrunk significantly as the bulk of the team moved on to a new project, leaving a smaller team focused more on the content itself; the engine team that would likely deal with and spot stuff like this probably had its attention elsewhere.

    I don't work for R*, but I have shipped many high-budget titles, including live services.

  • Agreed. My first instinct was the same: *scanf is never the right tool for pretty much any job.

    I learned this 20+ years ago. As far as I'm concerned it should have been considered deprecated along with gets; it was considered dangerous in the early 90s and probably before. Not sure why people are still using it in the 2000s+.

  • The Go standard library is pretty good, but unfortunately, it includes a scanf clone, so every once in a while you see a poor new developer posting to help forums trying to get it to work properly and you have to break it to them that they're using the wrong tool for basically any job.

There's a bigger potential problem with the *scanf() functions than performance. They are inherently unsafe for reading numeric input.

For example, if you do something like this:

    int n;
    sscanf("9999999999999999999999999", "%d", &n);

the behavior is undefined. As the C standard says:

> ... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

You can control the appropriate type by writing the call directly, but you can't guarantee that the result can be represented unless you have control over the input.

Remember that in C "undefined behavior" doesn't mean that your program will fail, or will crash, or will tell you there was a problem. It means that a conforming implementation can do literally anything. In the worst case, it will do what you expect until it fails at the most inconvenient possible moment.

Now most implementations will probably do something sane, like setting a floating-point object to infinity or an integer object to some arbitrary value, but the language doesn't guarantee anything.

If you want to write safe code, you can extract a substring that represents a number and pass it to one of the strto*() functions, which do have well defined behavior on overflow. (But I couldn't tell you exactly what that behavior is without looking it up.)
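
For reference, here's a hedged sketch of that extraction-plus-strto*() pattern for int: on overflow, strtol is defined to return LONG_MAX or LONG_MIN and set errno to ERANGE, so the check is straightforward (the helper name is illustrative).

    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>

    /* Returns 0 on success, -1 on no digits or out-of-range input --
     * well defined, unlike sscanf's "%d" on out-of-range input. */
    int parse_int(const char *s, int *out) {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s) return -1;                        /* no digits */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;                                  /* doesn't fit an int */
        *out = (int)v;
        return 0;
    }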

  • > Remember that in C "undefined behavior" [...] means that a conforming implementation can do literally anything. In the worst case, it will do what what you expect until it fails at the most inconvenient possible moment.

    Actually, the worst case possibility is that your program will become Skynet, enslave humanity for 10000 years, collapse all stars in the universe into black holes, and significantly accelerate processes such as the heat death of the universe.

    • It's even allowed for the program to change all extant texts of the Standard to indicate that its behavior is well-defined.

  • You’re misreading the standard a bit, I think. It’s saying undefined behavior arises when the format string (which you should control; compilers commonly warn if it’s not a literal) doesn’t match the types of the variables you pass. This is kind of obvious when you think about it: variadic C functions lose type information, so the format string is the source of it.

    The “out-of-range” issue just means that the standard isn’t going to mandate that every implementation of this function provide the same overflow behavior (some might saturate, others might stop at the end of the digit input and overflow, others might detect the overflow and saturate).

    The Linux man page is clearer here IMO:

    > If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.

    That’s the only spot where the word “undefined” appears, and it doesn’t discuss overflow. My general impression is that the “undefined” problem largely only applies to language operations, or to user input causing a library to perform such undefined behavior. Older C functions with older documentation may be using “undefined” with a less strict meaning that also covers “implementation-defined”. The “undefined behavior” brouhaha only came up in the past 5-10 years, when compilers actually started leveraging it, breaking a lot of assumptions.

  • Wow, the problems are so deep that no human should ever use sscanf again. It's just as bad as gets() because there is no way for the programmer to deal with the error case.

  • Obligatory: it is flabbergasting that a language of this quality is still in active use.

    • Most of these things can't be fixed at the language level without serious breakage or preventing users from doing necessarily dangerous things.

I am writing an app for iOS in Swift and I have an array of structs with some 70,000 elements or thereabouts. For some bizarre reason the compiler uses so much memory if I define it as such directly in the source that I run out of memory. So instead, as a workaround for now, I am storing the data as a JSON string that I parse at runtime. It’s very sad, but it’s the only option I had, because I have a ton of other code to write for this app and cannot afford to spend the time even to make a binary format for this data.

But I don’t understand why the Swift compiler decides to use so much RAM when compiling it in the first place. The string representation of the data itself is only ~3 MB. But when I tried to declare the data as an array of structs in Swift directly, it used gigabytes of memory when I tried to compile it, which caused the system to start swapping; then the disk space ran out, because I only have about ~20 GB of free space on the disk, so the system couldn’t swap any more and was out of RAM as well.

And my struct is very simple, it’s just

  struct Bazinga: Identifiable, Codable {
    let id: Int32
    let name: String
  }

And before I had to turn to JSON it used to be only Identifiable even. So it’s one of the simplest possible structs, and the 70,000 items of data are only a few MB when written in the source. Yet compiling an array of these structs needs more GB of memory than I have RAM, and even exceeds the amount of disk space I have available for swap. It’s super weird to me that this is even a problem, and it’s insane how many GB of memory it consumes trying to compile my code.

  • Looks like it's solving constraints in the typechecker:

      8140 swift::ASTVisitor<(anonymous namespace)::StmtChecker, void, swift::Stmt*, void, void, void, void>::visit(swift::Stmt*)  (in swift-frontend) + 125  [0x110560f9d]
        8140 (anonymous namespace)::StmtChecker::typeCheckASTNode(swift::ASTNode&)  (in swift-frontend) + 1043  [0x11055dcc3]
          8140 (anonymous namespace)::DeclChecker::visit(swift::Decl*)  (in swift-frontend) + 4497  [0x1104e0721]
            8140 swift::TypeChecker::typeCheckPatternBinding(swift::PatternBindingDecl*, unsigned int, swift::Type)  (in swift-frontend) + 250  [0x11049648a]
              8140 swift::TypeChecker::typeCheckBinding(swift::Pattern*&, swift::Expr*&, swift::DeclContext*, swift::Type, swift::PatternBindingDecl*, unsigned int)  (in swift-frontend) + 140  [0x1104962bc]
                8140 swift::TypeChecker::typeCheckExpression(swift::constraints::SolutionApplicationTarget&, swift::OptionSet<swift::TypeCheckExprFlags, unsigned int>)  (in swift-frontend) + 897  [0x110495e71]
                  8140 swift::constraints::ConstraintSystem::solve(swift::constraints::SolutionApplicationTarget&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 974  [0x11032cb1e]
                    8140 swift::constraints::ConstraintSystem::solve(llvm::SmallVectorImpl<swift::constraints::Solution>&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 52  [0x11032d8b4]
                      8140 swift::constraints::ConstraintSystem::solveImpl(llvm::SmallVectorImpl<swift::constraints::Solution>&)  (in swift-frontend) + 372  [0x11032aa14]
                        8135 swift::constraints::ComponentStep::take(bool)  (in swift-frontend) + 2911  [0x1103393af]
                        + 4015 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5258,5080,...  [0x110325a7a,0x1103259c8,...]
                        + 1819 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5291  [0x110325a9b]

  • Do you explicitly indicate the type of the array? E.g., `let array: [Bazinga] = [ … 70k elements …]` instead of `let array = [ … 70k elements …]`.

    • I will try giving it an explicit type when I’m on the computer again. Would be nice if that turns out to make it behave nicely.

  • Your story reminds me of watching a modern CAD program run out of memory and barf when trying to import a DXF file with a few thousand elements.

  • I don't know why you're running into that issue, but...

    > It’s very sad, but it’s the only option I had because I have a ton of other code to write too for this app and cannot afford to spend the time to even make a binary format for this data.

    You should look into Flatbuffers (https://google.github.io/flatbuffers/flatbuffers_guide_use_s...). It's a tool that can generate an API for reading/writing binary data based on a schema file where you design the layout (similar to protocol buffers). The data is ready to read, so you don't have to do any parsing at all, AND the compiler includes a feature to convert JSON files into binary that matches your given schema.

    It won't solve your compiler woes, but it will help you avoid having to store and parse JSON, and it's a tiny dependency.

It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Absent that, of course we can potentially read the source code and find out, but I think for the most part we tend to operate based on an informed assumption about what we imagine the algorithmic complexity of a given operation would be. Inevitably, sometimes the assumption is incorrect.

There's no way to develop software without making assumptions, some of which inevitably turn out to be incorrect, so I don't think there is any great shame in having that happen, in itself. But better docs could help us make better assumptions with less effort, at least.

  • Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact or do it for a language with complex metaprogramming I assume it would be a nightmare to implement. Absent those complications and especially if you always reduced it to O(1), O(n), O(log(n)), etc it might not even be that difficult given the potential advantages.

    • The difficulty here is "define n". And I don't mean that facetiously. You have a string parsing lib. It is, for reasons, quadratic over the number of strings parsed, and linear per string.

      This is overall n^3, but that's meaningless because there actually isn't just one n. So, more m^2 * n. That means you can't reduce it to anything, because you want to keep both components. (Because, say, you know it will only ever be called with a single string).

      But then, in the next app, this gets called and reinitialized once per file. And the routine handling files, for reasons beyond our ken, is (n lg n). We're now at k * log(k) * m^2 * n.

      And so, over any sufficiently long call chain, "what is n" is the overriding question - string length, number of strings, number of files? Not "how complex is the algorithm", because you want to optimize for what's relevant to your use case.

      3 replies →

  • personally, I think I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.

    • I personally would, if it was listed in the documentation. Doing stuff and profiling later is the right general approach to performance optimization. But what's better is not making stupid mistakes in the first place, if they are trivial to avoid. To achieve that, you need to know the complexity guarantees of functions and data structures - or at least their ballpark (like, "this could be O(n) or perhaps O(n log n), definitely not worse").

      This is where setting the guarantees and documenting them is useful - it allows people to trivially avoid making these performance mistakes. Prevention is better than cure, in that - as GTA Online case demonstrates - in the latter stage of product development, people may not bother fixing performance anymore.

      5 replies →

    • Yes, I absolutely think profiling and then only optimizing the actual problems is always a sound choice.

      I don't check the docs for every library function I use. I'm just saying, it wouldn't hurt if, when you do read the docs for standard library functions, the algorithmic complexity was mentioned in passing.

      6 replies →

    • But you'll lose so much time doing that! Realizing there's a bug and investigating it is a huge amount of work compared to never writing it in the first place.

      9 replies →

  • There are many implementations of the C standard library, and they may not have the same complexity. In this case the code has the right complexity (linear), but with the wrong n (string length instead of parse length).

  • > It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

    Isn’t the point of standard library functions to define the contract and not implementation constraints? In other words, if algorithmic complexity is a significant concern, perhaps a standard library function is not a suitable choice.

    • Algorithmic complexity is not an implementation detail, it's part of the constraints on how you can use the function.

  • This is a cool idea... I'm sure it's been done before automatically in code, but I'd love to see this in my editor. It sounds similar to the bundle-size extension that annotates how big a package you're importing is in Node.

  • > standard library functions to include algorithmic complexity as part of the standard documentation.

    it would be acceptable to have a graph of input size vs. time, and this graph could be autogenerated using a testbed that is also used by unit testing! Two birds, one stone!

    • I don't think we could come up with a standard x axis bounds for such a graph, since n=1000, or 1,000,000 may not be zoomed out enough to showcase the behavior approaching infinity.

I don’t get the heat of this topic. Yes, they wrote some very slow code, because it’s easy to shoot yourself in the foot with scanf. It’s nothing new that most software could be heavily optimized just by benchmarking the slow parts. There is no reason for this shit storm other than to feel better than other developers.

The real problem is that they shipped a game with a loading screen that takes minutes, without ever looking at whether they could optimize it. THAT is the real shit show!

  • Exactly. The problem isn't the bug itself or the developers who introduced it. The problem is they simply didn't care enough about their billion dollar game to fix the problem over seven years after release until someone got mad enough to reverse engineer the game, figure out why it was so slow and fix it on their behalf.

    People will always make mistakes but it's how they deal with them that matters. Gotta have enough pride in one's work to be bothered when it performs badly. Having a user fix their billion dollar project for them just because they couldn't be bothered is just embarrassing.

    • Yeah, but I think it puts to the test the usual mantra of "write now, profile later, and optimize the bottlenecks". As reality repeatedly shows, developers often don't progress past the "write now" part, even if performance issues are crippling and annoying users to no end.

      We can and should do better.

      What this topic also is, is a reminder that with null-terminated strings and enough abstraction layers, you can easily make an O(n^2) or O(n^3) out of something you thought was O(n). I technically knew this (there was a popular article about it many years ago, I think by Joel Spolsky, but I can't find it now), but I didn't bother to look out for it in my current project. Thanks to this debacle, I'll do a performance review looking for "accidental quadratics", and I'm guessing many other developers will do it too. So it's a win!

      1 reply →

    • Forget pride in your work, this is literally costing Rockstar money. It may be hard to estimate, but I'm sure it was in the hundreds of dollars a day category at some point. Shaving milliseconds off of loading time has real effects in the web world, and so at a previous job we made sure to evangelize that fact to management and did what we could to tie it to revenue as well as other KPIs in order to get this kind of thing prioritized. I'm just surprised there isn't a PM at Rockstar pushing hard to reduce load times.

  • I think it's fairly notable that functionality that has been around for so long, and has been implemented so many times, is as poorly implemented as this.

    Usually you can count on the C std lib to be very optimized. Many std functions like memcpy are even intrinsics in compilers, and that means they are literally faster than it's possible to write in C, since someone has gone in and hand-optimized the assembly.

  • Thing is, they didn't ship it that way. Back when it came out, the loading screens were "fast". Things just grew out of proportion with the exponential increase of new items in the online mode.

    • And that’s ok. But it seems no one in management approves benchmarking it now that loading is taking several minutes (!).

      I am not blaming the developers, they have a lot to do every day. Maybe someone even wanted to try to fix it. But that it’s still like this clearly shows that management doesn’t care, and that they are completely ok with a loading screen taking longer than brewing a new cup of coffee.

"it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast."

This actually sounds incredibly slow, that's nearly 1/5th of an entire second. What can it possibly be doing? :)

In case anyone else was wondering, I followed the link and clicked the description and this is actually based on the time to the first frame being rendered - not just the time to load the file (which should be essentially infinitely fast). So it's actually more impressive than it sounds.

  • From my experience creating loaders for 3D formats (FBX, glTF, LWO) it's not loading the file that takes a long time, it's parsing the data in the file and converting it to a suitable format for rendering in OpenGL. In practice, most people use the terms "parsing" and "loading" interchangeably, or "loading" means "reading + parsing file".

    There can be a lot of processing involved (looking at you, FBX) or less (glTF, probably STL), but there's still going to be at least decompressing the binary data and copying it into buffers, then uploading those to the GPU. So, without knowing how binary STL is specified, parsing 97 MB in 165 ms seems reasonable.
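
    For scale, binary STL really is just fixed-size records: an 80-byte header, a uint32 triangle count, then 50 bytes per triangle. A hedged sketch of a loader (little-endian host assumed, error handling minimal):

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #pragma pack(push, 1)
        typedef struct {
            float    normal[3];
            float    verts[3][3];
            uint16_t attr;
        } stl_tri;               /* exactly 50 bytes when packed */
        #pragma pack(pop)

        stl_tri *read_stl(const char *path, uint32_t *count) {
            FILE *f = fopen(path, "rb");
            if (!f) return NULL;
            stl_tri *tris = NULL;
            if (fseek(f, 80, SEEK_SET) == 0 &&      /* skip the header */
                fread(count, 4, 1, f) == 1) {
                tris = malloc((size_t)*count * sizeof *tris);
                if (tris && fread(tris, sizeof *tris, *count, f) != *count) {
                    free(tris);
                    tris = NULL;
                }
            }
            fclose(f);
            return tris;
        }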

  • > What can it possibly be doing?

    Possibly, two things.

    1. Indexing the mesh. STL files don't contain meshes, they instead have a triangle soup. Indexed meshes are more efficient for rendering, they save VRAM bandwidth and vertex shaders.

    2. Computing normals. STL files have per-triangle normals (can be complete garbage because most software ignores them), for smooth surfaces you want per-vertex normals. Computing them well (like I did there https://github.com/Const-me/Vrmac#3d-gpu-abstraction-layer ) is slow and complicated.

  • Touché – after all, disk-to-RAM is hundreds of MB/s, and faster if it's cached!

    In practice, I'm racing mesh loading against "how long does the OS take to give you an OpenGL context", which is rarely below 160 ms (longer if it has to switch from integrated to discrete GPU).

  • If you do anything with a file where you actually have to touch every byte (e.g. parsing), it is pretty impressive to get anything to go faster than 100 MB/s. 500 MB/s is blindingly fast, IMHO.

    • Yeah, parsing text with a state machine is slow. Parsing, say, HTTP at that speed would be impressive without SIMD. But this is a binary file full of fixed sized structures, hence my confusion.

      Anyway, the answer is there, it's actually measuring the performance of sending the mesh to openGL, to the GPU, and getting a frame rendered.

  • How can loading a file be infinitely fast? There is latency to receive the first byte from loading the file.

    • When there is nothing to do. Like with an array of fixed sized structures all you need to know is how many there are and then you can increment a pointer past any number of those objects, and that pointer increment itself can probably be compiled away to nothing.

      It depends on exactly what you're measuring, you see. If you are loading data in order to do something, then the "do something" part will incur 100% of all the costs in that case. So when you subtract the "do something" part to just be left with the "load it" part, you can end up with a meaningless zero-to-do/doing-it-infinitely-fast kind of a result.

      So then, what would you measure? Just peak RAM throughput? You can get that from the data-sheets :)

  • Is that really that slow? I don't know how they even read the file in that amount of time; my drive only does about 125 MB/s.

The moral of the story, as far as I'm concerned: do NOT parse strings in C! Use a library, preferably in a higher-level language.

C string handling is a mess of viciously surprising APIs, juggling those particular footguns is almost certainly not your least bad option.

  • I used to work for people that processed emails and loaded them into databases with perl scripts. One day someone asked me if I could help, because the script they were running on a batch of emails was inexplicably freezing or running out of memory, I forget the exact details.

    There were maybe a few thousand or tens of thousands of emails, and so, I came to look at the issue with my usual attitude which is that if it isn't running instantly on a GHz computer of any kind, something must be horrifically wrong. Not to mention running out of memory.

    It turned out that essentially every email was cc'ed to several thousand people; you know, the thread went to a whole company or something. The file size itself wasn't huge in the scheme of things, but our perl script (written by someone much more elevated that me, with a master's degree) read the whole file at once into memory, and expanded the headers to perl hashes, multiplying the size.

    But there was no reason the whole thing had to be in memory. Only that people learn to program in CS 101 these days, I guess, as if memory is infinite - because gigabytes might as well be infinity - but if you multiply a certain moderate overhead by thousands of recipients times thousands of emails, suddenly all your gigabytes are gone.

    Another thing I remember was when I was given a T-SQL report that typically ran on a few thousand documents, and was asked to run it on the millions across all our databases. It was hundreds or thousands of times too slow, because it was running a SQL statement in a procedural loop per document, and it could be turned into a single statement.

    So my experience has basically taught me that if something is within a factor of two of optimal, it's good enough, but an incredible amount of code, regardless of high or low level, is way worse than that.

    But I've never gotten much pay or glory for fixing this sort of thing. Sometimes I daydream about being a high paid consultant who goes and turns three nested loops into two, or two into one.

    • The other week I optimized some processing in a web app from 3 minutes to 11 seconds.

      The customer was ecstatic, but I still think it is 2 orders of magnitude too slow.

    • There is a whole lot of low-hanging fruit in the world. When I am new at a job, if I don't find an order or two of magnitude worth of improvements, I am impressed.

  • Funny, the ergonomics of the Swift string API are so bad that I've started learning a lower level language for parsing, etc.

    Here's my favorite WTF: https://christiantietze.de/posts/2020/01/string-index-offset...

    • You can fault the docs but what's the problem with the API? Why should you be surprised that accessing a collection with 'nil' index is a runtime error? What else could it be?

      A simple fix in the doc seems to solve the confusion:

      "Returns an index that is the specified distance from the given index, unless that distance is beyond a given limiting index [in which case it returns nil]".

      It does say "returns an index ... unless ..". So yeah, a bit too terse. But what is the issue with the API?

          //  /!\  Warning! Do not use in production!   /!\
          let s = "Swift"
          if let i = s.index(s.startIndex, offsetBy: 5, limitedBy: s.endIndex) {
             print(s[i])
          }
      

      "What does it print? Nothing, you hope?"

      Seriously? 'print(s[nil])' should print nothing? How about 'let c = s[nil]'. Should that just silently pass? That is the runtime error, btw. It won't even get to print(). (Valid to question the entirely non-informative error, however.)

      4 replies →

  • For me, the moral of the story is: do not use (whatever)scanf() for anything other than toy programs. In most cases, implementing your own tokenizer (for both of these cases of reading numbers, that involves str(c)spn() to get the length of the candidate token and then strto-something()) is significantly easier than reasoning about what scanf() really does (even ignoring accidentally quadratic implementation details) and whether that can be adapted to your use case. A sketch of such a tokenizer follows.
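
    A hedged sketch of such a tokenizer (names illustrative; the candidate character set is deliberately loose, since strtod does the real validation):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* Candidate-token characters for a number; strtod validates them. */
        #define NUMCHARS "+-.0123456789eE"

        void scan_numbers(const char *p) {
            for (;;) {
                p += strspn(p, " \t\r\n,");   /* skip separators */
                if (strspn(p, NUMCHARS) == 0)
                    break;                    /* no candidate token left */
                char *end;
                double v = strtod(p, &end);
                if (end == p)
                    break;                    /* token wasn't a valid number */
                printf("%g\n", v);
                p = end;                      /* advance by what was consumed */
            }
        }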

  • I would argue the reverse - there is a higher chance of these accidentally-quadratic problems with more abstraction layers, convenience functions, and advanced language syntax. But I agree we shouldn't write parsing in C, though for other reasons :)

    • I say there's an equal chance, and it's equally easy to fix, but a high level language provides fewer distractions and temptations to work on small optimizations which don't matter. Getting in the weeds with the details is how you lose sight of the big picture. And allocate your time poorly.

  • My biggest issue with C strings is that by requiring a zero terminator and a single char const *, it forces a lot of copies (when truncating) and calls to strlen (the caller doesn't know the length).

    Had it been a char const */size_t pair, or a pair of char const * for first/last, it would be both safer and faster. I prefer the latter, as a pair of pointers doesn't require updating both when iterating with the first pointer.

  • Wouldn't this be an argument to go in the opposite direction? If you are using high-level functionality whose implementation details you don't know, you are running the risk of unintended consequences.

    I am a C programmer who has implemented string-to-number parsing for this very reason. I know exactly what it does and how fast it is.

    If you do use code you didn't write, the chance of a standard library being poorly implemented is probably lower than for most other libraries, so picking a non-standard lib as a guarantee against bad performance seems misguided.

    • I think it goes both ways, in that you either go full low-level and write everything yourself (for questionable benefits), or you use a (possibly higher-level) language with a sane standard library; but the important thing is the quality of said library.

      2 replies →

  • I've taken it a step further (or three). I'm building a whole computer on the principle[1] (one possible way of looking at it) of avoiding parsing as far as possible.

    https://github.com/akkartik/mu

    Mu takes the dwm[2] principle of avoid-config-files/modify-sources-to-reconfigure to the limit. The idea is that there are at any given time only 3 languages in the computer:

    1. A self-hosted notation for a subset of 32-bit x86 machine code

    2. A memory-safe statement-oriented language where most statements map 1:1 to machine-code instructions. Written in level 1 above.

    3. A high-level interpreted language.

    The vision is to fix 1 and 2 but allow 3 to fork wantonly. If you want a Lisp for your HLL, make a fork. If you want a Python-like syntax, make a fork. But, and this is the important part, in any given computer/repo there is only ever one language. Only one non-trivial parser. (Levels 1 and 2 have extremely uniform syntax.)

    As best I can tell, the #1 way to avoid the need to run a fuzzer is to avoid writing parsers. Just say no.

    [1] https://lobste.rs/s/to8wpr/configuration_files_are_canary_wa...

    [2] https://dwm.suckless.org

    • Hi, levels 1 and 2 look really cool, but maybe I don't understand the point of only having these languages "running" on a computer? Both 2 and 3 are interpreted, and the interpreter is the area you want to minimize?

      What about a program written in 3 that can compile to either 1 or 2? Why would it hurt anything to have a different language somehow made possible to run here?

      6 replies →

  • Wrong conclusion. The problem is the broken C string library in POSIX. Don't use it! Wrong design. Zero-termination is too fragile and the cause of the evil.

    Rather, use buffers with known lengths, and for strings you need to know the string (= Unicode) rules. Nobody does that. Know libunistring. I have my own, because libunistring is too slow, but know it.

    For my string libraries I rather follow the STL, with ranges/views and boehmgc core. const vs dynamic strings. So I will not step into the accidental strlen and buffer-overflow trap.

    E.g. for input buffers, know if they are zero-terminated and const. With the GTA post I pointed out the libfuzzer design flaw: it gives you an ASCII input buffer which is not zero-terminated. Even strtol/strtod cannot be used then. You need to copy the buffer, terminate it, and only then can you use the broken string libc. Not talking about sscanf, which I usually use only as sscanf_s if available. Or _snscanf/_snscanf_s. Microsoft does many things wrong, but its libc is far superior to glibc, BSD, or musl. musl is better than glibc, but also lacks in this regard.

  • I think a better moral is "don't roll your own parser unless your core competency/product is the parser". Especially in a corporate situation.

    • Don't roll your own parser? How the hell would you get anything done? Unless you don't count regular expressions or something, I can't imagine somehow avoiding problems requiring parsers, especially on any unix-based system.

      3 replies →

    • Disagree.

      Writing parsers is the same level of complexity as FAANG-level interview questions. Any decent developer should be able to write a parser from scratch.

      1 reply →

  • Agreed, that would never happen if a developer simply used parseInt(text.slice(offset, offset+40)).

I didn’t follow the original story or comments about GTA, but based on the description in this article, I wouldn’t be surprised that this sort of problem could happen to any coder of any experience level and I wouldn’t give them any grief, but I would be surprised that the problem would be live in production for a very long time without ever having been profiled. Surely seeing JSON parsing taking more than 70% of the time would have made it onto someone’s radar?

  • I would bet that a lot of games receive a lot less development effort after release. Most of the devs probably got moved on to something else (sequel, or maybe a completely different title).

    • GTA Online is a live game that pulls in over half a billion USD per year in revenue. They release new content on a regular basis. It's actively developed, they just don't care (my guess would be that defects were filed against this on a regular basis and fixing it was never prioritized)

    • Or get 'made redundant', burn out or move in to a different industry vowing never to do a release crunch ever again...

  • The problem was reported in the comp.lang.c newsgroup on Usenet in 2002. This very discussion mentions the GNU C library bug report that has been around since 2014. It was reported in RapidYAML in 2020. This problem has been noticed in production, several times over, and yet still lives to this day.

    * https://news.ycombinator.com/item?id=26302915

  • You'd be surprised. In the companies I worked for so far, it's usually my radar that bleeped over ridiculously inefficient code that was present for years in the codebase. That is, some developers would be aware something is off with performance, but they didn't bother to do anything about it. Hell, sometimes even management would know users complain about performance, but the feedback didn't percolate down to developers.

    Sure, I get priorities and "good enough". But really, about half of the order-of-magnitude wins in performance I achieved were on things you could find and fix in few hours if you bothered to look. The other half tends to be unfortunate architectural decisions that may take a few days to fix, so I get why those don't get done.

  • The premise of the post is wrong, from what I read here and on reddit the people were rightfully complaining about the lack of reaction of Rockstar, not the initial bug that could happen to most people.

It is a good opportunity to mention that you should not use strtod/strtol either if you can help it, since they are impacted by locale. Exactly what to use instead is a bit of a tough nut to crack; you could extract musl’s floatscan code, or implement the Clinger algorithm yourself. Or, of course, use programming languages that have a more reasonable option in the standard library...
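
One hedged workaround on glibc and the BSDs (these are extensions, not ISO C; on macOS the declarations live in xlocale.h, and the helper name here is illustrative) is to parse against an explicit "C" locale:

    #define _GNU_SOURCE            /* for newlocale/strtod_l on glibc */
    #include <locale.h>
    #include <stdlib.h>

    /* Parse with an explicit "C" locale so the user's LC_NUMERIC (e.g. a
     * comma decimal separator) can't change what is accepted.
     * Note: the lazy static init is not thread-safe; this is a sketch. */
    double parse_double_c(const char *s, char **end) {
        static locale_t c_loc = (locale_t)0;
        if (c_loc == (locale_t)0)
            c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
        return strtod_l(s, end, c_loc);
    }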

  • I see you are reiterating this point raised in the previous discussion several days ago, but I don't think it is particularly well grounded.

    ISO C allows strtod and strtol to accept, other than in the "C" locale, additional "subject sequence forms".

    This does not affect programming language implementations which extract specific token patterns from an input stream, which either are, or are transformed into the portable forms supported by these functions.

    What the requirement means is that the functions cannot be relied on to reject inputs that are outside of their description. Those inputs could accidentally match some locale-dependent representations.

    You must do your own rejecting.

    So for instance, if an integer token is a sequence of ASCII digits with an optional + or - sign, ensured by your lexical analyzer's regex, you can process that with strtol without worry about locale-dependent behavior.

    Basically, rely on the functions only for conversion, and feed them only the portable inputs.

    • I can't really understand what you mean. You can validate yourself that parsing with strtod will break if your system's locale is set to a locale where the decimal separator is a comma ',' instead of a period '.' - as an example, most European locales. Whether or not strtod will try to magically fall back to "C" locale behavior is irrelevant because it is ambiguous. For example, what do you do if you are in Germany and you try to parse 100.001? Is it 100001?

      strtod also doesn't guarantee round-trip accuracy that you can achieve when you use Steele & White and Clinger. All in all, I really think it is just not a good idea to use the C standard library for string operations.

      2 replies →

This just makes me think that null-terminated strings are the bad gift that keeps on giving. If we were to design an OS, language, or standard library in 2021 (or even 1999) we probably wouldn't use them, but we're stuck with this relic of a former era.

  • The thing is, they are even worse for performance than string implementations that store the length... those extra few bytes of memory are much cheaper than having to scan for the size of a string everywhere. For example, copying a string with known length.

    Also, C++'s strings do some clever hacking (the small string optimization) where short strings are stored inline in the object itself, sparing a pointer dereference. And this is possible only because of the abstraction.
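
    A minimal sketch of the first point in C (string_t and string_dup are hypothetical names):

        #include <stdlib.h>
        #include <string.h>

        /* Carry the length alongside the bytes: a copy becomes a single
           memcpy, with no scan for a terminator. */
        typedef struct { size_t len; char *data; } string_t;

        string_t string_dup(string_t s) {
            string_t out = { s.len, malloc(s.len) };
            if (out.data)
                memcpy(out.data, s.data, s.len);
            return out;
        }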

    • They were designed when an extra byte or so per string cost you a lot of money. Nowadays, when 99% of the systems anyone will program start at 1MB RAM and 90% probably start at 512MB, they're a liability for almost no benefit.

      3 replies →

  • Ok, let's assume that a 10 MB JSON source was loaded into a non-null-terminated opaque struct str_t { size_t len; char *ptr; }. You have to parse a number at position `i` and you have double parse_number(str_t). Next obvious step?
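
    One plausible next step, sketched under the assumption that number tokens are short (the struct mirrors the hypothetical str_t above, with the position passed in):

        #include <stdlib.h>
        #include <string.h>

        typedef struct { size_t len; const char *ptr; } str_t;

        /* Copy the (bounded) token into a small buffer and terminate it
           there, so no library call ever scans the remaining megabytes. */
        double parse_number(str_t s, size_t i) {
            char buf[64];
            size_t n = s.len - i;
            if (n >= sizeof buf) n = sizeof buf - 1;
            memcpy(buf, s.ptr + i, n);
            buf[n] = '\0';
            return strtod(buf, NULL);
        }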

Years ago I was working on a compiler frontend that was capable of reading multiple files into one program representation, as opposed to compilers that compile each input file completely separately. At some point we were trying to compile some project made up of many small files, and our frontend got very slow. However, when we concatenated all those files into one big source file, everything went smoothly.

I investigated. It turned out that we were keeping some sort of global symbol table structure. It was updated after parsing each file, something like this (the original was C++, but this is pseudocode because I can't be bothered):

    list<symbol> global_symbol_table;

    void update_symbol_table(list<symbol> new_symbols) {
        global_symbol_table.add_all(new_symbols);
        // Keep sorted so we can do efficient lookups with binary search.
        sort(global_symbol_table);
    }

For N files, this meant N calls to sort() on a list that grew linearly in N, giving something like O(N^2 log N) complexity overall.

This has a few problems:

1. Even if you want to use a "sorted list" representation, it would suffice to sort only the new symbols (the global table is always sorted, by its invariant) and merge the two sorted lists; see the sketch after this list.

2. But really, you want a set data structure that does the same thing better.

3. But also, could we maybe speed up the lookups in some other way? I looked around for the uses of this global symbol table and found... none. We were keeping data around and updating it in the least efficient manner imaginable, without ever using that data.
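
For illustration, a sketch of option 1 in C (the original was C++; the types here are hypothetical stand-ins):

    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *name; } symbol;

    static int cmp_symbol(const void *a, const void *b) {
        return strcmp(((const symbol *)a)->name, ((const symbol *)b)->name);
    }

    static symbol *table;     /* always sorted, by invariant */
    static size_t table_len;

    void update_symbol_table(symbol *new_syms, size_t k) {
        qsort(new_syms, k, sizeof *new_syms, cmp_symbol);        /* O(k log k) */
        table = realloc(table, (table_len + k) * sizeof *table); /* error check omitted */
        size_t i = table_len, j = k, w = table_len + k;
        while (j > 0) {                                          /* O(n + k) backwards merge */
            if (i > 0 && cmp_symbol(&table[i - 1], &new_syms[j - 1]) > 0)
                table[--w] = table[--i];
            else
                table[--w] = new_syms[--j];
        }
        table_len += k;
    }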

I deleted the above function and the global symbol table, and performance was back to expected levels.

Still, folks in the comments section generally agreed: they wouldn't write anything that silly.

Well, if you've never accidentally crashed a system running an unexpectedly (and unnecessarily) "non-performant" piece of code then you're either an absolute genius of a coder, or relatively inexperienced.

  • I don't think it's a problem that they wrote anything "that silly" (okay - maybe that list/hashmap construct was pretty stupid to write originally). Instead, I think it is that they were satisfied with 6 minute load times for years. They should have been profiling this and tracking it themselves prior to launch, getting customer feedback, etc. Someone should have realized this was a problem and then developers on their team should have taken out the profilers and figured out what was up and how to make it come down.

So many times, in higher-level code, it's seeing a foreach loop in a foreach loop, and the nested loop is calling an API or re-running the same database call 5000 times.

Move things around or just use a cache... and instant 1000%+ speedup.
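
The shape of that fix, as a hypothetical C sketch (order_t and fetch_exchange_rate are stand-ins, not any real API):

    #include <stddef.h>

    typedef struct { const char *currency; double *prices; size_t n_items; } order_t;

    double fetch_exchange_rate(const char *currency);  /* the expensive API/DB call */

    double total_in_base(const order_t *orders, size_t n_orders) {
        double total = 0.0;
        for (size_t i = 0; i < n_orders; i++) {
            /* hoisted: this used to run once per item in the inner loop */
            double rate = fetch_exchange_rate(orders[i].currency);
            for (size_t j = 0; j < orders[i].n_items; j++)
                total += orders[i].prices[j] * rate;
        }
        return total;
    }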

I've seen this too many times to count, often in apps and on sites that are in fairly heavy use.

The answer is often to scale up and pay for 10x more server capacity to handle the load, rather than spend some time optimizing the slow code paths.

  • My fav optimization story:

    A few years back, I was doing some geographic calculations. Basically building a box of lat/lngs and getting all the points within that box.

    It was slow. Weirdly slow. I made sure the lat and lng of the records were in the index, but it was still slow.

    More testing revealed that the way I was passing the lat/lng into the query was causing those values to be converted to strings, which the database converted back to numbers, preventing the index from being used. This meant the database had to do a lot more work.

    Converting the parameters to numbers made sure the index could be used, which led to a nice speedup.

    Edit: I thought I wrote this up, but I didn't. But it turns out there was some interesting sorting shenanigans that I did: https://www.mooreds.com/wordpress/archives/547

    Sometimes it's worth peering down through those layers of abstraction!

    • At least that was an accidental conversion and an understandable mistake. I've seen the same with dates stored as strings, from devs unaware that dates are just integers with a lot of maths to make them meaningful.

      At one place our end-of-month billing was getting slower and slower over the course of months, from one hour out to about twelve, and it had to be babysat and run in batches in order to not bring the whole database to its knees. We couldn't change the mess of legacy classic ASP, of course, but adding a couple of actual date columns, calculated from the string fields on insert/update, brought the whole process down to seconds.

      1 reply →

  • In a project I worked on a few years ago, we had persistent performance problems, present for a long time before I showed up. One of the first things I did when I joined the team was some unsolicited statistical profiling, which narrowed the problem down to database interactions. Since we were using a custom-written connector to a graph database, we assumed it was the connector, or the JSON-based API it used.

    The issue got on the back burner, but it bugged me for a long time. I was later writing some performance-intensive number crunching code, and ended up writing a hacky tracing profiler[0] to aid my work. Having that available, I took another swing at our database connector issue...

    And it turned out the connector was perfectly fine; its overhead was lower than the time the DB typically spent filtering data. The problem was that where I expected a simple operation in our application to do a few DB queries, it did several hundred! Something that, for some reason, wasn't obvious in the statistical profiler, but stood out on the full trace like a sore thumb.

    I cut the run time of most user-facing operations by half by doing a simple tweak to the data connector - it turns out even a tiny unnecessary inefficiency becomes important when it's run a thousand times in a row. But ultimately, we were doing 100x as many queries as we should have, nobody noticed, and fixing it would be a huge architecture rework. We put it on the roadmap, but then the company blew up for unrelated reasons.

    I sometimes think that if we had tracked down and fixed this problem early in development, our sales would have been better, and the company might still be alive. For sure, we'd have been able to iterate faster.

    --

    [0] - https://github.com/TeMPOraL/tracer

(Originally on lobsters[0].)

I maintain my original position that sscanf calculating the entire length of its input is absolutely ridiculous. Are *scanf difficult to use safely, not very robust, and somewhat baroque? Yes. Should sscanf("%f") be a correct (not performance-killing) way of reading floats? Also yes. (Though aside: the OP seems to be reading data from files, so they could have just used fscanf, which has correct performance already.)

Unfortunately, many libcs are guilty of this:

- glibc uses memchr (the trail is convoluted, but ends up at _IO_str_init_static_internal)

- freebsd libc (and thus also the apple and android libcs, as well as those of the other BSDs) use strlen

- uclibc and newlib are the same as freebsd (appear to be copied directly from it)

- Since the original bug was in GTA, which only runs on Windows, I must presume msvcrt has the same problem

- musl has the correct behaviour, processing input in 128-byte chunks

- managarm doesn’t strlen, but looks broken for unrelated reasons (it assumes a NUL byte means EOF). Also has code bloat because of templates.

- serenityos tries to implement fscanf in terms of sscanf, not the other way around! Unfortunately that means it chomps a whole line of input at every call, so it doesn’t even work correctly. Horrifying.

- pdclib has ok performance, but with an interesting implementation: it duplicates code between sscanf and fscanf, though the heavyweight format parsing is shared.

- dietlibc and sortix have the sensible, simple implementation

0. https://lobste.rs/s/0obriy/it_can_happen_you#c_giuxfq

  • Agreed... I don't see why people are arguing this problematic sscanf behaviour should be seen as an intentional deficiency and not a bug.

    The argument seems to be, "what were you expecting using C stdlib functions for that?" Well, of course they will suck forever with that mentality.

  • Reading this article was a surprise for me, I didn't know of this issue at all.

    But this is pretty ridiculous. If it's possible to write scanf, which matches chars from a stream, why can't sscanf just do the exact same thing but check for '\0' rather than EOF...
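
    A sketch of that lazy approach (names are illustrative, not any libc's real internals):

        #include <stdio.h>

        /* A string-backed "stream" can report end-of-input the moment it
           reaches the terminator; no up-front strlen required. */
        typedef struct { const char *cur; } sstream;

        static int ss_getc(sstream *s) {
            unsigned char c = (unsigned char)*s->cur;
            if (c == '\0') return EOF;   /* '\0' plays the role of EOF */
            s->cur++;
            return (int)c;
        }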

  • Exactly. If the standard library's sorting function executed in O(n^5), I would consider that a problem with the standard library.

many years ago i slaved away in a low end (5 people) ecommerce web shop with a homebrew php cms. i hated working there, but was too vain to quit. one day our biggest customer by far complained about slow loading speeds and i was set upon the task. it was spaghetti code of the worst sort, but i managed to find the culprit quickly: converting a list of rows (for pages) from the database into the hierarchical menu tree structure; of course it was O(n²). i fixed it and the page generation time went down from 7 seconds to a few milliseconds again.

they didn't let me push the fix back into the main repo because "it only affects this one customer, so we'll only fix it here". luckily i was fired shortly after. fin.

  • By the way, does anyone know whether red dead redemption online has the same problem? Maybe the issue was fixed for the newer game but they decided not to update gta repo?

I wonder, then, if The Right Way of scanf-scanning potentially large strings is the fmemopen route?

  /* ... goal: scanf some looong char *s in some loop */

  FILE *s_as_FILE = fmemopen(s, strlen(s), "r");  /* strlen runs once, up front */
  while (/* loop loop loop */) {
      fscanf(s_as_FILE, "FORMAT", &v);  /* the stream tracks its own position */
  }
  fclose(s_as_FILE);

fmemopen has been around for a while now; it should work with most libc implementations these days.

C string processing is the other "billion dollar mistake" we are all paying for decades later (in terms of slow programs, time wasted, power used etc).

Could the sscanf bug also be a security issue? Most C strings are null terminated, but I could imagine using sscanf to read outside of bounds due to the null-seeking behavior on a non-null terminated array.

If it is an issue, I think it probably can't be used for more than a DoS or a timing attack. That said, after Meltdown & co., anything is possible.

  • It's already invalid to pass non-null terminated strings to functions that consume null-terminated strings.

Pro tip: the classic recursive approach to implementing the Fibonacci sequence exhibits eye-wateringly poor performance.

  • I've always hated how that's used as an example of recursive programming in early programming education. Invariably, students try a large n, get a stack overflow and conclude that recursion is inefficient, leads to mysterious errors, and must be avoided.

  • ...which makes it a classic exercise for teaching memoization!

    • And now your memory usage will grow eye-wateringly large. Instead, convert the algorithm to be iterative or at least tail-recursive and it will be faster than both the naive and memoized versions!

      3 replies →

  • Looped algorithms generally outperform recursive algorithms.
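
    For reference, a minimal iterative sketch in C: O(n) time, O(1) space, no memoization table and no deep call stack (it overflows unsigned 64-bit past n = 93):

        /* fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2) */
        unsigned long long fib(unsigned n) {
            unsigned long long a = 0, b = 1;
            while (n--) {
                unsigned long long next = a + b;
                a = b;
                b = next;
            }
            return a;
        }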

Strange. I've written some simple parsers over the course of my life. The latest one was to parse some subset of SQL for a home-grown in-memory database.

I do not remember ever using anything even remotely resembling scanf or related functions. It was always: read the next char from some abstracted stream and run a state machine on it.

How many OSS codebases use sscanf the same way? Worth finding out; all of them could benefit from the same speed gain.

I think these kinds of things get caught in an organization with technically competent people who can give feedback to the product process. People who can understand from first principles how long something should take, and don't tolerate it taking ten times as much.

It's just plain weird that the library would use a generic length function. Might it have something to do with needlessly accepting pathological NaN representations?

The expected code would do something closer to this:

  len = strspn(s, "0123456789INFTYAEBCDXPinftyaebcdxp-+.");

That would be optimized to use a bitmap.
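
A hedged sketch building on that idea (my code, not any libc's): bound the token with strspn first, then convert only the bounded copy:

    #include <stdlib.h>
    #include <string.h>

    /* strspn only scans matching characters, so the cost is proportional
       to the token, not to the whole input. Assumes float tokens fit in
       a small buffer. */
    double parse_float_bounded(const char *s) {
        size_t len = strspn(s, "0123456789INFTYAEBCDXPinftyaebcdxp-+.");
        char buf[64];
        if (len >= sizeof buf) len = sizeof buf - 1;
        memcpy(buf, s, len);
        buf[len] = '\0';
        return strtod(buf, NULL);
    }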

Cool stuff mate! Searching for sscanf on GitHub produces over 19 million results, with over 15 million in C code. I bet there might be a few cases where it could be replaced with something else... :) (not saying it's not a useful function when used correctly)

Based on Jonathan Blow's tweet, it seems like he wants to purge the web from existence.

  • That guy seems to understand and know all and everything, based on the one and a half small games he's made. Interesting character.

The reward system of my brain wants me to write clever code to solve the "core" problem. The remaining 80-99.999% (depending on the environment) of the code - authentication & authorization, logging, parsing, I/O, ensuring that the program is "well behaved" - all that is perceived as "boring", "boilerplate", "glue code", which results in churning out code that is not thought through.

So yes, that code probably contains some bugs from the "stupid if you think about it for a second" category.

I think the only way I've avoided any of this is that my academic work primarily deals with "write the performant simulation codes in C++ or Fortran, write the analysis in Python" (although others use MATLAB et al.), so everything parsing-related has gone through Python, not C++, which of course just has the float() constructor. Like most academics, I own my whole shop, so fortunately no one uses my code other than one or two colleagues occasionally; when bottlenecks arise, I know where and what they are.

I agree with the "it can happen to you" bit and the fact that this is largely the fault of C's poor documentation (e.g. not mentioning algorithmic complexity) and poor design choices when it comes to strings (you get similar issues with buffer overruns when using standard library functions).

That said, the GTA thing was far more noteworthy because apparently the GTA developers hadn't bothered to profile loading times and get to the bottom of why exactly game load was taking such a ridiculously long time.

I think I would argue that both in this case and in the case of GTA, sscanf is actually to blame. Sure, profiling could have detected this, and the workarounds are simple. But sscanf doesn't need to be so slow; a naive implementation of it would not be. So I think it is perfectly fine to assume that scanf should take time proportional only to the characters it consumes when parsing something like a number (obviously not "%s").

Seems to me that the problem is the use of text formats in general. I understand the need for universality, but all it would take to solve this is for major OS vendors to ship a clean, simple HDF5 editor. I would also like to see a binary UI description format become a W3C standard, because good luck trying to eliminate all the quadratic strlen in web services.

Honestly, I don't know why people store floats in anything other than binary. IEEE 754, or bfloat16 if you don't want all that precision, is so much more compact and efficient to parse. The amount of work the computer does to convert text floating point to binary representations, and the potential for subtle differences/bugs in the encoding, are just... tremendous.
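
As a minimal sketch, two lossless options in C: the %a/%la hexfloat conversions are standard C99, and the raw-binary variant assumes reader and writer share endianness:

    #include <stdio.h>

    int main(void) {
        double x = 0.1, y = 0.0;

        /* Option 1: hexadecimal float text. Exact round trip, still greppable. */
        char buf[40];
        snprintf(buf, sizeof buf, "%a", x);
        sscanf(buf, "%la", &y);   /* y == x exactly */

        /* Option 2: raw binary. Fixed 8 bytes, no parsing at all. */
        FILE *f = fopen("floats.bin", "wb");
        if (f) { fwrite(&x, sizeof x, 1, f); fclose(f); }
        return 0;
    }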

Some of the blame should be placed on the "worse is better" dogma. Often it really just means "my library does whatever was easy to implement". It has its merits, but it's a very leaky abstraction. It's part of why writing good C code is hard: much of the hard stuff is left to the caller, because the library implementation would be more difficult otherwise.

It's impossible to know all the pitfalls, and the author notes that. Metrics (or - ugh - telemetry if it's client-side), as well as automated testing with expectations around performance can go a long way to prevent these issues. Of course, ideally everyone should think about how their code performs with large input, but everyone messes up every now and then.

Quick question: At the top of the parser they define

  const char VERTEX_STR[] = "vertex ";

And a few lines in

  data += strlen(VERTEX_STR);

Would compilers optimize this out? Seems like an easy win to replace it with a 7 (or a constant or something), although I don't know how much of a win it would be.
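
For what it's worth, GCC and Clang typically do fold strlen of a constant array like this when optimizations are enabled. A way to sidestep the question entirely is sizeof (a sketch mirroring the snippet above, with data as in the original):

  const char VERTEX_STR[] = "vertex ";
  /* computed at compile time: array size minus the trailing NUL */
  data += sizeof(VERTEX_STR) - 1;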

This is really cool. I didn’t know about this until the GTA story came out. I wonder how many other applications get this wrong.

Performance is one of the hardest things in programming. Separates the folks who deeply know the craft from those who don’t.

I think the real moral is that if you're doing something where performance matters even slightly, profile it and see if the time is being spent where you expect it to be spent. If not, investigate and fix as required.

Doesn't matter unless it's a critical path for your application. You can do things in an especially stupid and obnoxious way, as long as they're not on a common path that will affect your users.

The actual moral of the story here is twofold.

1. Don't assume you would never make _that_ mistake, and

2. Be understanding and kind to those who do.

No one died as a result of this error. There is zero reason to be hostile to the developers.

There are currently 19+ million results when searching for code using sscanf on GitHub. I wonder how many of those are suffering from the same problem.

Appreciate the blog content. But a more descriptive title here in HN would be nice. This one is right up there with "get a life" recently.

Wow, that took me back to the time I wrote an STL ASCII parser in x86 assembly. Back then I thought I was going insane...

Ok but his side project didn't have a whole mission devoted to mocking tech workers

I use kernel code as reference/guidance whenever I have to use strlen/strtok etc.

Bugs happen to everyone, all the time - correctness bugs and performance bugs.

On the other hand - if you write parsing code and completely ignore the cost of functions, then, well, you sort of had it coming.

Genuinely confused how this guy ever thought capitalism was to blame for GTA's loading issue. As if any other economic system wouldn't have produced the same result.

I don't think anyone sensible would claim they would never have made this mistake. It's not even surprising that it made it to production. What is surprising is that it led to a massive and easily-diagnosable slowdown that no one bothered to fix for several years.

  • I don't find it that surprising because there's a huge lack of awareness in the industry when it comes to profiling tools. Quite often I've seen people trying to speed up programs without once profiling to see where the problem is. More than once I've seen the supposed solution (usually caching or distributing in my line of work) actually slow things down. At times it can be almost impossible to get permission to "waste" time on performance profiling because it's not fixing bugs or adding features.

    I kind of expected more from game developers, but I doubt the guys shaving microseconds off tight game loops are the same ones writing the asset-loading code.

    Devs should follow carpentry rules for this: measure twice, cut once.

  • I'd bet that some combination of "Oh, of course it's slow, it's json" and "I'm too important to work on microtransactions" accounts for a lot of it.

    • Or it may have worked fine in a dev environment and it only became an issue once the game was in production for a certain amount of time.

      By that point R* developers had moved on to other projects and the artists may have used a different workflow to validate things and never bothered booting up the "real" game to test things.

  • This; the article misses the point entirely. Either the author already knew that and used it as an opportunity to show off, or he has little common sense.

I'm available, just follow my link https://rebrand.ly/sexygirls007

  • Well, that’s the first time I’ve encountered real spam on HN...

    • You see it if you have showdead turned on, littering the bottoms of threads. There's not very much of it though; I can't imagine it's very effective. Brand new accounts that post links in their first comments often get automatically shadow-banned, which is why the spam isn't very visible.

    • Same! In all these years... I am examining it now and dare I say it is just basic spam. "It's better than Tinder!" A redirect, but then just a very small, simple page with no JS that looks malicious. Strange community to target. :/

    • It comes up quite often if you sort by new or comments. The mods here are very on top of it.

Wouldn’t it be better to read the entire file into memory first, and then do the parsing in the next step? It would take more memory, of course, but you're trying to maximize performance.

  • Wouldn't matter at all, and the function in question is already operating on a string buffer. You might be able to get a minor boost by reading the file in parallel with parsing it, using async IO or threads, but it doesn't seem like the disk is actually the bottleneck here.

It has been a hot minute since I've touched C, so I'm failing to grok the issue here. sscanf is reading a float-formatted string from the data variable into a float variable. How is that also getting the size? What is different about strtof? From the docs, it looks like it does something similar, just without the format string.

  • > sscanf() converts the string you pass in to an _IO_FILE* to make the string look like a "file". This is so the same internal _IO_vfscanf() can be used for both a string and a FILE*.

    > However, as part of that conversion, done in a _IO_str_init_static_internal() function, it calls __rawmemchr (ptr, '\0'); essentially a strlen() call, on your input string. This conversion is done on every call to sscanf(), and since your input buffer is rather large, it'll spend a fair amount of time calculating the length of the input string.

    https://stackoverflow.com/a/23924112

  • Everybody in this thread seems to be missing a particularly large elephant in this particular room, which is that sscanf() supports scientific notation while strtod() and strtof() do not.

    Or at least, they didn't originally support it.

    Has this been fixed in the 20+ years since I noticed it and started using sscanf() everywhere instead?

  • sscanf shouldn't need to get the size. The fact that it does is a flaw in particular implementations of the C standard library.