How I cut GTA Online loading times by 70%

5 years ago (nee.lv)

Holy cow. I'm a very casual gamer; I was excited about the game, but when it came out I decided I didn't want to wait that long and would hold off until they sorted it out. Two years later it still sucked, so I abandoned it. But... this?! This is unbelievable. I'm certain that many people left this game because of the waiting time. That's man-years wasted (in a way rather different than intended).

Parsing JSON?! I thought it was some session-finding magic in the network game logic. If this is true, it's the biggest WTF I've seen in the last few years, and we've just finished 2020.

Stunning work with just the binary at hand. But how could R* not do this? GTA V is so full of great engineering. If it was a CPU bottleneck, who works there who wouldn't be irked enough to nail it down? It seems like a natural thing to try to understand what's going on inside when something takes far longer than expected, even where performance isn't crucial. Here it was crucial; it almost directly translates to profits. Unbelievable.

  • I don’t think the lesson here is “be careful when parsing JSON” so much as it’s “stop writing quadratic code.” The quadratic behaviour in the JSON parsing was subtle: most people’s mental model of sscanf is that it’s linear in the number of bytes it actually scans, not linear in the length of the entire input string. With smaller test data this would have been harder to catch. The linear search for duplicates was also an example of bad quadratic code that works fine for small inputs.

    Some useful lessons might be:

    - try to make test more like prod.

    - actually measure performance and try to improve it

    - it’s very easy to write accidentally quadratic code, and the canonical example is this sort of triangular computation where, for each item you process, you do a linear amount of work over all the finished or remaining items (see the sketch at the end of this comment).

    As I read the article, my guess was that it was some terrible synchronisation bug (e.g. download a bit of data -> hand off to two subtasks in parallel -> each tries to take the same lock on something (e.g. some shared data, or worse, a hash bucket where your hash function is so bad that collisions are frequent) -> one task takes a while doing something, the other doesn’t take long but more data can’t be downloaded until it’s done -> the slow task consistently wins the race on some machines -> downloads get blocked and only one CPU is being used).
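
    To make the accidental-quadratic point concrete, here's a minimal sketch of the pattern (illustrative only, not Rockstar's code): the first loop looks linear, but if sscanf does an internal strlen of the remaining buffer, as several libc implementations do, the total work becomes quadratic. The strtol version only touches the characters it needs.

        #include <cstdio>
        #include <cstdlib>
        #include <vector>

        // Looks linear: one sscanf call per number. But if sscanf does an
        // internal strlen(p) to set up its state, every call rescans the
        // whole remaining buffer, giving ~n^2/2 work for n tokens.
        std::vector<long> parse_numbers_slow(const char* buf) {
            std::vector<long> out;
            const char* p = buf;
            while (*p) {
                long value = 0;
                int consumed = 0;
                if (std::sscanf(p, "%ld%n", &value, &consumed) == 1) {
                    out.push_back(value);
                    p += consumed;   // skip past the number we just read
                } else {
                    ++p;             // not a number here, advance one character
                }
            }
            return out;
        }

        // Same loop with strtol: it stops at the first non-numeric character,
        // so the whole parse stays linear in the size of the buffer.
        std::vector<long> parse_numbers_fast(const char* buf) {
            std::vector<long> out;
            const char* p = buf;
            while (*p) {
                char* end = nullptr;
                long value = std::strtol(p, &end, 10);
                if (end != p) { out.push_back(value); p = end; }
                else          { ++p; }
            }
            return out;
        }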

    • > actually measure performance and try to improve it

      This really rings truest to me: I find it hard to believe nobody ever plays their own game, but I’d easily believe that the internal culture doesn’t encourage anyone to do something about it. It’s pretty easy to imagine a hostile dev-QA relationship, or management keeping everyone busy enough that this has sat in the backlog since it’s not causing crashes. After all, if you cut “overhead” enough you might turn a $1B game into a $1.5B one, right?

      4 replies →

    • - do not implement your own JSON parser (I mean, really?).

      - if you do write a parser, do not use scanf (which is complex and subtle) for parsing; write a plain loop that dispatches on characters in a switch. But really, don't.
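
      For illustration, a character-dispatch loop of the kind described above could look roughly like this (a minimal sketch: no escape sequences, no true/false/null, no error reporting; all names are made up):

          #include <cctype>
          #include <cstdlib>
          #include <string>

          enum class Tok { ObjBegin, ObjEnd, ArrBegin, ArrEnd, Colon, Comma,
                           String, Number, End };

          struct Lexer {
              const char* p;          // current position in a NUL-terminated input
              std::string text;       // payload of the last String token
              double number = 0;      // payload of the last Number token

              Tok next() {
                  while (std::isspace(static_cast<unsigned char>(*p))) ++p;
                  switch (*p) {
                      case '\0': return Tok::End;
                      case '{': ++p; return Tok::ObjBegin;
                      case '}': ++p; return Tok::ObjEnd;
                      case '[': ++p; return Tok::ArrBegin;
                      case ']': ++p; return Tok::ArrEnd;
                      case ':': ++p; return Tok::Colon;
                      case ',': ++p; return Tok::Comma;
                      case '"': {                      // string, escapes ignored
                          const char* start = ++p;
                          while (*p && *p != '"') ++p;
                          text.assign(start, p - start);
                          if (*p == '"') ++p;
                          return Tok::String;
                      }
                      default: {                       // treat anything else as a number
                          char* end = nullptr;
                          number = std::strtod(p, &end);
                          p = (end != p) ? end : p + 1;
                          return Tok::Number;
                      }
                  }
              }
          };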

      8 replies →

    • > actually measure performance and try to improve it

      This reminds me that I used to do that all the time when programming in Matlab. I stopped investigating performance bottlenecks after switching to Python. It is as if I traded performance profiling for unit testing in my switch from Matlab to Python.

      I wonder if there are performance profilers which I could easily plug into PyCharm to do what I used to do with Matlab's default IDE (with a built-in profiler) and catch up with good programming practices. Or maybe PyCharm does that already and I was not curious enough to investigate.

    • The JSON parsing is forgivable (I actually didn't know that scanf computed the length of the string for every call) but the deduplication code is a lot less so, especially in C++ where maps are available in the STL.

      It also reinforces my decision to never use scanf, preferring manual parsing with strtok_r, strtol and friends instead. scanf just isn't robust and flexible enough.
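
      A minimal sketch of that style, assuming a writable, comma-separated input line (strtok_r is POSIX and mutates the buffer in place; the format and field names are made up for illustration):

          #include <cstdio>
          #include <cstdlib>
          #include <cstring>

          // Parse lines like "ammo_box,2500,12" into name/price/quantity
          // without ever touching scanf.
          void parse_item_line(char* line) {
              char* save = nullptr;
              const char* name    = strtok_r(line, ",", &save);
              const char* price_s = strtok_r(nullptr, ",", &save);
              const char* qty_s   = strtok_r(nullptr, ",", &save);
              if (!name || !price_s || !qty_s) return;   // malformed line

              char* end = nullptr;
              long price = std::strtol(price_s, &end, 10);
              if (end == price_s) return;                // price wasn't a number
              long qty = std::strtol(qty_s, &end, 10);
              if (end == qty_s) return;                  // quantity wasn't a number

              std::printf("%s: %ld x %ld\n", name, qty, price);
          }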

    • I thought the lesson is "listen to your customers and fix the issues they complain about".

  • > Parsing JSON?!

    Many developers I have spoken to out there in the wild in my role as a consultant have wildly distorted mental models of performance, often many orders of magnitude incorrect.

    They hear somewhere that "JSON is slow", which it is, but you and I know that it's not this slow. But "slow" can encompass something like 10 orders of magnitude, depending on context. Is it slow relative to a non-validating linear binary format? Yes. Is it minutes slow for a trivial amount of data? No. But in their mind... it is, and there's "nothing" that can be done about it.

    Speaking of which: an HTTPS REST API call using JSON encoding between two PaaS web servers in Azure is about 3-10 ms. A local function call is 3-10 ns. In other words, a lightweight REST call is roughly one million times slower than a local function call, yet many people assume that a distributed mesh microservices architecture has only "small overheads"! Nothing could be further from the truth.

    Similarly, a disk read on a mechanical drive is hundreds of thousands of times slower than local memory, which is a thousand times slower than L1 cache.

    With ratios like that being involved on a regular basis, it's no wonder that programmers make mistakes like this...

    • The funny thing is, as a long-time SDET, I had to give up trying to get people to write or architect in a more "local first" manner.

      Everyone thinks the network is free... until it isn't. Every bit moved in a computer has a time cost, and yes, it's small... but when you have processors as fast as today's, it seems a sin that we delegate so much functionality to some other machine across a network boundary when the same work could be done locally. The reason why, though?

      Monetizability and trust. All trivial computation must be done on my services so they can be metered and charged for.

      We're hamstringing the programs we run for the sole reason that we don't want to make tools. We want to make invoices.

      1 reply →

  • > But how could R* not do this? GTAV is so full of great engineering

    I assume there were different people working on the core game engine and mechanics vs. the loading. It could just as well be some modular system, where someone just implemented the task "load items during the online mode loading screen".

  • My twice-a-week gaming group and I enjoyed GTA V but abandoned it years ago simply because of the load times. We have two short slots (90-120 minutes) each week to play and don't want to waste them in loading screens.

    We all would have picked this game back up in a second if the load times were reduced. Although I must say even with the same results as this author, 2 minutes is still too long. But I'll bet that, given the source code, there are other opportunities to improve.

    • I wonder if a paid subscription would have fixed this? If you left a paid MMO, they'd probably ask you to fill out an exit survey, and you could say "I'm canceling because load times are terrible", which would (hopefully) raise the priority of reducing load times. But since GTA online is "free", there's not a single exit point where they can ask "why did you stop playing".

      2 replies →

  • It gets worse: their brand new game, Red Dead Online, does the same thing. As soon as it did it the first time, I logged out and charged back.

This is why I come to HN. I was going to skip this because I thought it was about video games, but I'm really glad to have read it, and I loved every line of the article.

So much to get from this.

Even if you don't have the source, you can make a change if you are annoyed enough.

If you don't like something, and the source code is out there, really go contribute.

Performance matters: know how to profile, and if you're using an external dependency, figure out its implementation details.

Algorithms and data structures matter. I often see devs talking about how they don't matter much, but the difference between using a hashmap vs. an array is evident here (see the sketch below).

Attentive code reviews matter; chances are they gave this to a junior dev or intern, it worked with a small dataset, and no one noticed.
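
To make the hashmap-vs-array point concrete, here's a minimal sketch of the two deduplication strategies (illustrative only, not the game's actual code): the linear scan does on the order of n^2/2 comparisons overall, while the hash-set version does one expected O(1) lookup per item.

    #include <cstdint>
    #include <unordered_set>
    #include <vector>

    struct Item { std::uint64_t hash; /* ...other fields... */ };

    // O(n^2) overall: every new item is compared against everything so far.
    void insert_unique_slow(std::vector<Item>& items, const Item& item) {
        for (const Item& existing : items)
            if (existing.hash == item.hash) return;   // already present
        items.push_back(item);
    }

    // O(1) expected per insert: remember the hashes we've already seen.
    void insert_unique_fast(std::vector<Item>& items,
                            std::unordered_set<std::uint64_t>& seen,
                            const Item& item) {
        if (seen.insert(item.hash).second)            // .second is true if newly inserted
            items.push_back(item);
    }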

  • I think this is a perfect example of “algorithms and data structures emphasis is overblown.” Real world performance problems don’t look like LeetCode Hard, they look like doing obviously stupid, wasteful work in tight loops.

    • ... that's the exact opposite of what I took from this.

      The obviously stupid, wasteful work is at heart an algorithmic problem. And it cropped up even in the simplest of data structures. A constant amount of wasteful work often isn't a problem even in tight loops. A linear amount of wasted work, per loop, absolutely is.

      1 reply →

    • True that it's rare that you need to pull out obscure algorithms or data structures, but in many projects you'll be _constantly_ constructing, composing, and walking data structures, and it only takes one or two places that are accidentally quadratic to make something that should take milliseconds take minutes.

      The mindset of constantly considering the big-O category of the code you're writing and reviewing pays off big. And neglecting it costs big as well.

      3 replies →

    • And trying to optimize them gets you stink eye at code review time. Someone quotes Knuth, they replace your fast 200 lines with slow-as-molasses 10 lines and head to the bar.

      1 reply →

    • And here what matters is not your programming skills, it’s your profiling skills. Every dev writes code that isn’t the most optimized from the start; hell, we even say “don’t optimise prematurely”. But good devs know how to profile and flamegraph their app, not leetcode their app.
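
      Even without a full profiler, a throwaway scoped timer is often enough to spot a minutes-long hotspot. A minimal sketch (the load steps are made-up placeholders; a real investigation would follow up with a sampling profiler or a flamegraph):

          #include <chrono>
          #include <cstdio>

          // Prints how long the enclosing scope took when it exits.
          struct ScopedTimer {
              const char* label;
              std::chrono::steady_clock::time_point start;
              explicit ScopedTimer(const char* l)
                  : label(l), start(std::chrono::steady_clock::now()) {}
              ~ScopedTimer() {
                  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                                std::chrono::steady_clock::now() - start).count();
                  std::printf("%s: %lld ms\n", label, static_cast<long long>(ms));
              }
          };

          void load_online_session() {
              { ScopedTimer t("parse catalog"); /* parse_catalog();   (hypothetical) */ }
              { ScopedTimer t("dedupe items");  /* dedupe_items();    (hypothetical) */ }
              { ScopedTimer t("connect");       /* connect_session(); (hypothetical) */ }
          }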

      7 replies →

    • Leetcode-style thinking will allow you to spot obviously stupid, wasteful work in tight loops.

    • Exactly - though to add a little nuance to your post, it’s about having a million loops in a 10M line code base and exactly one of them is operating maximally slowly. So preventing the loop from entering the code base is tough - finding it is key.

  • I always tell a story about an application we ran; it generated its own interface based on whatever was in inventory. Someone did something really stupid and duplicated each inventory item for each main unit we sold... so you had a recursive mess: assigning 100,000 items when previously it was 100-ish.

    Anyway, everyone just rolled their eyes and blamed the fact that the app was written in Java.

    It ended up generating an XML file during that minute-long startup... so we just saved the file to the network and loaded it on startup. If inventory changed, we’d re-generate the file once and be done with it.

  • > Even if you don't have the source, you can make a change if you are annoyed enough.

    Well, until you get flagged by the anti-cheat and get your account and motherboard banned...

  • This was probably a compiler bug. I don't think the programmers coding the business logic were using 'strlen' and 'sscanf' directly.

Honestly, while this horrible code is mildly offensive to me, I'm pretty impressed by this person's persistence. It's one thing to identify a bug in a compiled program, but it's another to patch it without fully understanding what's going on. Caching strlen was a particularly clever trick that sidestepped a bunch of more complicated solutions.

I played through GTA V, enjoyed it, and tried out the online mode afterward.

I've logged in exactly twice. Load times like that may be worth it to a hardcore gamer, but I have no patience for it. There's no shortage of entertainment available to someone with a PC, a reasonable internet connection, and a modicum of disposable income. Waste my time and I'll go elsewhere for my entertainment.

Wow, many people argue about how optimized GTA is, and then there's this. I wonder how much money they lost because of it. I often stopped playing because it just took too long to load.

  • GTA, at least the core gameplay and the single player mode, is quite well optimised. The game ran well even on the cheaper side of gaming PC hardware.

    This... this is GTA Online. It's a cash cow designed to suck cash out of your pocket. Ads for things you can spend your money on are shown while "connecting", so if this delay wasn't introduced intentionally, it sure isn't a high-priority fix. The code isn't part of the optimised, streamlined, interactive part of the game; it's part of the menu and loader system.

    Most of these online games/services have so-called "whales" that contribute most if not all of the income the platform makes. If these whales are willing to throw wads of cash at the platform, they won't care about another five minutes of ads. The amounts some of these people spend are obscene; the millions Take-Two profits from GTA every year are generally generated by only a tiny fraction (usually a single-digit percentage) of the total player base.

    In the end, I doubt they've lost much money on this. They might've even made some from the extra ads.

    • > GTA, at least the core gameplay and the single player mode, is quite well optimised. The game ran well even on the cheaper side of gaming PC hardware.

      It's easy to forget that GTA5/GTA:O was originally a 360/PS3 game; getting a game of that scope running at all on a system with just 256MB of RAM and VRAM was an incredible achievement.

      The A-team developers who made that happen were probably moved over to Red Dead Redemption 2, though, with GTA5's long tail being handled by the B-team.

  • GTA V (the single player game) is quite well optimized and needs a frame rate limiter on most newer systems because it will run at over ~180 fps, at which point the engine starts to barf all over itself.

    GTA Online is a huge, enormously buggy and slow mess that will generally struggle to run at 80 fps on a top-of-the-line 2020 system (think 10900K at over 5 GHz with a 3090) and will almost never cross the 120 fps threshold no matter how fast your system is and how low the settings are.

    • > at which point the engine starts to barf all over itself.

      I’m really confused as to why games are determining anything whatsoever based on the refresh rate of the screen.

      Skyrim has this same problem and not being able to play over 60fps is the reason I haven’t touched the game in years.

      4 replies →

  • Yep, sometimes I feel like having a drive around, and then I remember how long it takes to load and go play something else instead. If you end up in a lobby with undesirables and are forced to switch, you've got another long wait.

  • It always fascinates me how sometimes people defend things just because they're a fan, even if the particular aspect they're defending doesn't make sense!

    I've seen this happen with some other games which are not the best optimised for PC, but the fans will still defend the developers, just because they like the brand.

    • It's part of social validation. You inherently want other people to like the things you like, because it validates your choices. This in turn means you'll defend things you like.

      Even "rebels", who supposedly relish having fringe tastes, want other rebels to approve of their tastes.

      The more strongly you stake your identity as "fan of X", the more social disapproval of X hurts.

  • I wouldn't be surprised if the long wait increases profits -- as you wait, Rockstar shows you ads for on-sale items.

  • I'm willing to bet the developers who wrote the in-game store are not the same developers who optimized the rendering pipeline.

  • I disagree. There is no contradiction. JSON can be a different beast for C, C++, and Java backend coders. You can implement complex 3D graphics while struggling with JSON.

    For example, my backend Java guys struggled heavily with JSON mappers. It took them forever to apply changes safely. My department consumes a lot of data from backend systems and we have to aggregate and transform them. Unfortunately the consumed structure changes often.

    While a JSON mapper in our case in Java was exceptionally hard to handle, a simple NodeJS layer in JavaScript did the job exceptionally easily and quickly. So we used a small NodeJS layer to handle the mapping instead of doing it in Java.

    Moral of the story: sometimes there are better tools outside your view, and this often seems to be the case for JSON. JSON means JavaScript Object Notation; it is still tough for OO languages to handle.

    This is my observation.

Also note that, the way he fixed it, the patched strlen only caches the last call and returns quickly on an immediate second call with the same string.

Another reason why C-style null-terminated strings suck. Use a class or structure and store both the string pointer and its length.

I have seen other programs where strlen was gobbling up 95% of execution time.
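
For reference, the caching idea amounts to something like the following (a minimal sketch of the concept only; the author's actual patch hooks the function inside the game binary rather than recompiling anything):

    #include <cstring>

    // Wrapper around strlen that remembers the last string it measured.
    // If the same pointer is passed again right away, as happens when a parser
    // calls it once per token on the same huge buffer, it returns the cached
    // length instead of rescanning megabytes of text. Not thread-safe.
    std::size_t cached_strlen(const char* s) {
        static const char* last_ptr = nullptr;
        static std::size_t last_len = 0;
        if (s != last_ptr) {
            last_ptr = s;
            last_len = std::strlen(s);
        }
        return last_len;
    }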

  • Not that C strings don't suck, but with Pascal strings we could be discussing in this thread how implicitly copying a slowly shrinking tail of a 10 MB string on every tokenizer iteration escaped a developer's attention. It's not about C strings at all; it's about bytefucking by hand. Once you throw a task like "write a low-level, edge-case-riddled primitive-value shuffling machine" into a typical mid-level corporate production pipeline, you're screwed by default.

  • I'm with you. I hate null terminated strings. In my first experiences with C, I specifically hated them because of strlen needing to scan them fully. When C++ introduced string_view, my hate grew when I realized that I could have zero-copy slices all the way up until I needed to interface with a C API. At that point you're forced to copy, even if your string_view came from something that was null terminated!

  • Could this be worked into a compiler/stdlib from the back-end? Could a compiler/stdlib quietly treat all strings as a struct of {length,string} and redefine strlen to just fetch the length field? Perhaps setting a hook to transparently update "length" when "string" is updated is not trivial.

    Edit: hah, I'm decades late to the party, here we go:

    Most modern libraries replace C strings with a structure containing a 32-bit or larger length value (far more than were ever considered for length-prefixed strings), and often add another pointer, a reference count, and even a NUL to speed up conversion back to a C string. Memory is far larger now, such that if the addition of 3 (or 16, or more) bytes to each string is a real problem the software will have to be dealing with so many small strings that some other storage method will save even more memory (for instance there may be so many duplicates that a hash table will use less memory). Examples include the C++ Standard Template Library std::string...

    https://en.wikipedia.org/wiki/Null-terminated_string
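
    A minimal sketch of the {length, data} idea, which is essentially what std::string and std::string_view already give you: the length is computed once and stored, so asking for it is O(1), and a trailing NUL is kept for C interop.

        #include <cstddef>
        #include <cstring>
        #include <memory>

        // Length-prefixed string: the length lives next to the bytes, and a
        // trailing NUL is maintained so c_str() can be handed to C APIs.
        class LenString {
        public:
            explicit LenString(const char* s)
                : len_(std::strlen(s)),                  // the only strlen ever done
                  data_(new char[len_ + 1]) {
                std::memcpy(data_.get(), s, len_ + 1);   // copy the bytes plus the NUL
            }
            std::size_t length() const { return len_; }  // O(1), no rescans
            const char* c_str() const { return data_.get(); }
        private:
            std::size_t len_;
            std::unique_ptr<char[]> data_;
        };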

    • I don't think you could do it transparently, because it's common to pass the tail of a character array by doing &s[100] or s + 100, etc. It wouldn't be easy to catch all of those and turn them into a string-fragment reference.

      On the C++ side, std::string was easy enough to use everywhere, with foo.c_str() when you needed to send it to a C library. But that drags in a lot of assumptions about memory allocation and whatnot. Clearly, we don't want to allocate when taking 6 minutes to parse 10 megs of JSON! :)

Maybe even more surprising to me is that sscanf() relies on strlen().

I would have expected libc to take that use case into consideration and use a different algorithm when the string exceeds a certain size. Even if the GTA parser is not optimal, I would blame libc here. The worst part is that some machines may have an optimized libc and others don't, making the problem apparent only in some configurations.

I believe standard libraries should always have a reasonable worst case by default. It doesn't have to be perfectly optimized, but I think it is important to have the best reasonable complexity class, to avoid these kinds of problems. The best implementations usually combine several algorithms for different cases. For example, a sort function may do insertion sort (n^2, good for small n) -> quicksort (average n log n, worst case n^2, good overall) -> heapsort (guaranteed n log n, slower than quicksort except in edge cases). This way you never hit n^2, but not at the cost of a slow algorithm for the most common cases.
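
A minimal sketch of that kind of dispatch (essentially what introsort, used by many std::sort implementations, already does; the thresholds and pivot choice here are illustrative only):

    #include <algorithm>
    #include <cmath>
    #include <iterator>
    #include <vector>

    // Insertion sort: quadratic in general, but unbeatable on tiny ranges.
    template <typename It>
    void insertion_sort(It first, It last) {
        for (It i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, std::next(i));
    }

    // Quicksort while the recursion behaves, heapsort if it degenerates,
    // insertion sort to finish off small partitions.
    template <typename It>
    void hybrid_sort_impl(It first, It last, int depth_budget) {
        using T = typename std::iterator_traits<It>::value_type;
        while (last - first > 16) {
            if (depth_budget-- == 0) {           // pathological input: cap at n log n
                std::make_heap(first, last);
                std::sort_heap(first, last);
                return;
            }
            T pivot = *(first + (last - first) / 2);
            It lo = std::partition(first, last, [&](const T& x) { return x < pivot; });
            It hi = std::partition(lo, last, [&](const T& x) { return !(pivot < x); });
            hybrid_sort_impl(first, lo, depth_budget);   // recurse on one side,
            first = hi;                                  // loop on the other
        }
        insertion_sort(first, last);
    }

    template <typename T>
    void hybrid_sort(std::vector<T>& v) {
        if (v.empty()) return;
        int depth = 2 * static_cast<int>(std::log2(static_cast<double>(v.size())));
        hybrid_sort_impl(v.begin(), v.end(), depth);
    }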

The pseudo hash table is all GTA devs fault though.

  • > For example, a sort function may do insertion […]

    That’s generally called “adaptive”. A famous example of that is timsort.

    Your version has issues though: insertion sort is stable while quicksort and heapsort are not, so the hybrid looks stable on small inputs but not in general, which can dangerously mislead users.

I gave up on playing GTA:O because everything took so long to load, having never spent a dime. I have to imagine there is so much lost revenue because of this bug; I hate to be harsh but it is truly an embarrassment that someone external to R* debugged and fixed this before they did (given they had 6 years!).

  • Load times are absolutely the primary reason I quit playing.

    • Same here; being slow as molasses really spoiled the game for me. I wonder if the same software quality caused other unnecessary in-game loading times. I felt that time spent in GTA:O was about 60% waiting in lobbies/loading and the remaining 40% actual play time. Same with RDR2, by the way.

      1 reply →

    • I can stomach a long load time (but that's not to excuse how godawful this load time is), just as long as it's worth it. I need some kind of resolution/ultimate goal, but in GTA:O there really isn't one; it's just grind-grind-grind so that you can get yet another car, or another house, or another thing that you already have plenty of. Lather, rinse, repeat in perpetuity.

      I just felt like every time I logged in I went, "So what's the point here?".

GTA:O shows advertisements for in-game purchases on the loading screen. How many advertisements you see is a function of how long the loading screen takes.

Something tells me this "problem" was discovered long ago at Rockstar HQ, and quietly deemed not to be a problem worth solving.

  • I was going to say surely this has extremely diminishing or even negative returns past 30-60 seconds, but then I remembered lots of people are willing to sit through 10 minutes of commercials to watch 20-30 minutes of TV. So I guess for the right type of customer it works?

    • But on network TV, 8 minutes of commercials are interspersed among 22 minutes of content.

      Virtually nobody's willing to sit through 10 minutes of commercials straight.

      5 replies →

  • They either:

    - found that they get more purchases BECAUSE of the long loading time, despite bouncing other players (the ad theory, plus the happy coincidence of getting ad placement slots out of shitty engineering),

    - had their engagement team fed bullshit by the engineering team about how impossible that issue is to fix,

    - or are just making enough not to care.

  • Could be, but I can imagine people giving up on GTA: Online altogether because it takes too much time to load.

    • > Could be, but I can imagine people giving up on GTA: Online altogether because it takes too much time to load.

      Luckily for them, churn is usually a different problem solved by a different part of the team/org with different priorities and insights.

    • Yep this was me about 4 years ago. I distinctly remember sitting through the loading screen and being absolutely astonished at how long it took. I never fired it up again because I just couldn’t be bothered to wait that long.

    • I'm one of those who had the pre-order bonus for GTA Online but gave up in part due to the loading times. If I have one hour to play I don't want to spend 10 minutes of it on loading screens.

      It's funny to think that I would probably pay some amount to have GTA Online without these absurd loading times and without modders/hackers :D

    • But for the ones that don't, it equals a virtual commute, so it might be a good filter and the audience beyond that watershed is more serious about spending money there. ;)

Incredible work. Hopefully R* acknowledge this and compensate you in some way. I won’t be holding my breath.

Maybe set up a donation page? I’d be more than happy to send some beer money your way for your time spent on this!

  • Agree, this is up there in the top tier of amazing stories I've read here on HN. I admire T0ST's technical and writing skills; first rate combination. Massive kudos, would like to shout a cup of coffee.

    (I also really like the design and presentation of the article; I'm running out of superlatives here.)

  • Thanks for the suggestion, probably missed most of the traffic but just added it :)

    https://buymeacoffee.com/t0st

    • Hi,

      I loved the article. Are you planning to write any guides/tutorials about reverse engineering games? Seems like you have a lot of practical experience. I (and probably many other people) would be really excited if you started writing about how you do all these in detail with practical examples. I would even be glad to pay for such content.

This is really cool - how did you develop the background knowledge to solve this? I'm trying to learn more about low-level stuff and I would have no idea how to approach solving a problem like this

"They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries."

Ahh. Modern software rocks.

  • Parsing 63k items in a 10 MB JSON string is pretty much a breeze on any modern system, including a Raspberry Pi. I wouldn't even consider JSON an anti-pattern for storing that much data if it's going over the wire (compressed with gzip).

    A little further down in the article you'll see one of the real issues:

    > But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless.
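
    As a rough sanity check, a sketch like this (assuming the third-party single-header nlohmann/json library and a hypothetical local 10 MB catalog.json) typically parses a file of that size in a fraction of a second on a desktop machine, which is what makes the in-game minutes so jarring:

        #include <chrono>
        #include <cstdio>
        #include <fstream>
        #include <sstream>
        #include <string>

        #include <nlohmann/json.hpp>   // third-party, header-only JSON parser

        int main() {
            std::ifstream in("catalog.json");   // hypothetical 10 MB test file
            std::stringstream ss;
            ss << in.rdbuf();
            std::string text = ss.str();

            auto t0 = std::chrono::steady_clock::now();
            nlohmann::json doc = nlohmann::json::parse(text);
            auto t1 = std::chrono::steady_clock::now();

            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
            std::printf("parsed %zu top-level entries in %lld ms\n",
                        doc.size(), static_cast<long long>(ms));
        }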

    • The JSON patch took out more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here. 63k x X checks for colons, balanced quotes/braces and so on just isn't needed.

        Time with only duplication check patch: 4m 30s
        Time with only JSON parser patch:       2m 50s

      1 reply →

Excellent investigation, and an elegant solution.

There's a "but" though: you might end up getting banned from GTA Online altogether if this DLL injection is detected by some anti-cheat routine. The developers should really fix it on their end.

  • It's highly unlikely you are going to get banned in GTA even with cheats. The anti-cheat is a joke. The game is filled to the brim with cheaters. If my friends and I play, we play with cheats just to protect ourselves from other cheaters.

    • Game cheat dev here: just to provide some context, the GTA Online client is woefully bad at doing client-side validation on the packets it receives from other peers (there isn't an authoritative server).

      This means that anyone in your session can send you a weirdly-formed packet to crash your game. Most cheats have protections against this by just doing Rockstar's job and adding better validation around packet interpretation routines.

      Using "cheats just to protect [your]selves" actually makes a lot of sense.

I am absolutely shocked by this finding. The amount of money in microtransactions Rockstar lost because of this single issue must be gigantic. The number of people that got turned off by the loading times over the years is massive. It's mind-boggling.

Well, that's embarrassing. I can't even imagine the level of shame I would feel if I had written the offending code.

But, you know, premature optimization yadda yadda.

  • Probably this was written under a very strict release deadline and it worked OK at the time (fewer items for microtransactions). The problem lies with the management that never picked up the issue once it became a big problem. I'm pretty sure any developer at R* is capable of optimizing this parser.

  • It's the kind of thing that's very easy to accidentally write, it's not that shameful. What's shameful is not investigating the load times at all, since the problem is so easy to see when any measurement is done.

    • If 63k entries with 10MB of actual data takes minutes to process on a current computer I'd consider that shameful.

      10MB is less than the cache in modern CPUs. How can this take minutes(!)?

      1 reply →

  • I imagine there is a senior programmer working for another game company. They are currently kicking themselves about the poorly performing and rushed loading code they wrote while still working at R*. But there is nothing they can do about it now, since they have moved on.

  • As they say, a lot of classified stuff and closed-source code remains classified and closed not because it contains important secrets, but because those who hold the lock feel too ashamed and embarrassed to show the contents.

  • It's not a premature optimisation to use a hashset instead of a list though!

    • The bug is more devious than that. The code looks linear at a glance, and the culprit is that sscanf is actually O(N) in the length of the string. How many people would expect that?

Yep, O(n^2) has the problem that no matter how fast you upgrade your hardware, it will still lag.

Another pet peeve of mine: Civ 6's loading time for a saved game is atrocious. I'm sure there's an O(n^2) loop in there somewhere.

  • My personal pet peeve is Windows Update (and their products' installation routines in general). I bet that it's n^3 somewhere deep and they have carefully curbed that n for decades.

    • Good call. I'd love to read a post-mortem on why it was even possible for Windows XP's update check to be as slow as it was. I've definitely waited around 2 hours one time just for the check to complete after finishing an installation.

I loved reading this so much. I was thinking that if someone were to write a fictional, Sherlock Holmes-like story where our Sherlock takes some (maybe fictional) widely used piece of software each episode, investigates it like this, and reveals some (fictional) grand bug at the end, I'd totally read it.

Yeah, I know it sounds stupid, but I suspect the real Sherlock Holmes was inspired by true stories like this one too, and at least some contemporary detectives came to enjoy reading them.

  • There’s no need for the examples to be fictional, there are more than enough real world cases to share. Sadly, many of my personal ones end in “I filed a bug with an annotated screenshot of decompiled code indicating where they should fix it but nothing happened”.

Reading things like these is bittersweet. On one hand, I am glad to see that the art of figuring out “why is this thing slow” is still alive and well, even in the face of pushback from multiple fronts. On the other hand, it’s clear that the bar is continually rising for people who are looking to do this as a result of that pushback. Software has always had a bottleneck of the portion of the code written by the person with the least knowledge or worst priorities, but the ability to actually work around this as an end user has gotten harder and harder.

The first hurdle is the technical skill required: there has always been closed source software, but these days the software is so much more complex, often even obfuscated, that the level of knowledge necessary to diagnose an issue and fix it has gone up significantly. It used to be that you could hold an entire program in your head and fix it by patching a couple bytes, but these days things have many, many libraries and you may have to do patching at much stranger boundaries (“function level but when the arguments are these values and the call stack is this”). And that’s not to say anything of increasing codesigning/DRM schemes that raise the bar for this kind of thing anyways.

The other thing I’ve been seeing is that the separation between the perceived skills of software authors and software users has increased, which I think has discouraged people from trying to make sense of the systems they use. Software authors are generally large, well funded teams, and together they can make “formidable” programs, which leads many to forget that code is written by individuals who write bugs like individuals. So even when you put in the work to find the bug there will be people who refuse to believe you know what you are doing on account of “how could you possibly know better than $GIANT_CORPORATION”.

If you’re looking for ways to improve this, as a user you should strive to understand how the things you use work and how you might respond to it not working–in my experience this is a perpetually undervalued skill. As a software author you should look to make your software introspectable, be that providing debug symbols or instructions on how users can diagnose issues. And from both sides, more debugging tools and stories like this one :)

Doesn't surprise me at all. It's an O(n^2) algorithm (strlen called in a loop) in a part of the code where N is likely much smaller in the test environment (in-app purchases).

Overwatch is another incredibly popular game with obvious bugs (the matchmaking time) front and center. And gamers are quick to excuse it as some sort of incredibly sophisticated matchmaking - just like the gamers mentioned in OP.

It's easy to say it's something about gamers / gaming / fandom - but I have a locked-down laptop issued by my bigcorp which is unbelievably slow. I'd bet a dollar there's a bug in the enterprise management software that spins the CPU. A lot of software just sucks and people use it anyway.

  • I am not sure Overwatch's matchmaking time is a bug per se. The time estimates are bad for sure. But the matchmaker can really only be sure of one state -- if you queue for a match, a match will be made. The rest is predicting who will show up, along with some time-based scale for making a suboptimal match in the interest of time. Players absolutely hate these suboptimal matches, so the time threshold ends up being pretty high. The rest seems to just be luck; will the right combination of 11 other people be in the right place at the right time?

    I think it could be improved, but it doesn't strike me as being buggy.

    (Overwatch itself, lots of bugs. Tons of bugs. If they have any automated tests for game mechanics I would be pretty surprised.)

    • No, it doesn't add up. I can see dozens of groups filling up and queueing at my group's level as I wait for matches. Worse, many of my matches just aren't that evenly balanced. Even if you believe the game is dead now, things were just as bad (10+ minute queues for full 6-stacks) at the peak. They don't do tight geographic binding - I live on the US west coast and regularly get Brits and Aussies in my games.

      I guess what they are probably doing is batching groups of games and then matching for the entire batch, to ensure nobody gets a "bad" match. What they've missed is that well - 5% of matches are bad baseline because somebody is being a jerk or quits or has an internet disconnect or smurfs or whatever other reasons. They could have picked an algorithm that gave fast matches 99% of the time at the cost of having bad matches 1% of the time and nobody would have noticed, because their baseline bad match rate is so high. Optimization from too narrow a perspective.

      Honestly, the OW matches I get aren't any more balanced than the COD matches I used to get, and I got those in a minute, not 15.

      3 replies →

  • It's certainly not unique to game consumers. People in general just blame every fault on "it's physically impossible to solve". It's one big reason why corporations get away with creating non-stop worse and worse products.

  • The surprising part is that sscanf calls strlen behind the scenes.

Not surprising - the game industry is absolutely notorious for cutting corners. I didn't know they cut corners this much, though.

Will R* fix it? Maybe, especially since some person literally did half of the work for them. But given that R* is a large company, this probably won't be fixed for a long time, and GTA:O is probably being maintained by the lowest-bid contractor group.

They probably have all of their full time assets working on the next iteration of GTA.

  • >But given R* is a large company, this probably wont be fixed for a long time, and GTAO is probably being maintained by the lowest bid contractor group.

    They've also made just an absolute assload of money from GTA:O in spite of the godawful load times. Why bother spending the money to fix it when people are happy to deal with it and keep giving you their own cash?

  • > especially since some person literally did half of the work for them

    All of the work.

Even after cutting loading by 70%, it still takes over a minute? I haven't played any AAA titles for a long time, but even 30s is way too long, especially since I used to play from an HDD. And modern SSDs can be 30x faster in sequential reads and up to 200x in random reads.

Is a one-minute loading time even normal? Why does it take so long? I never played GTA Online, so could someone explain?

Thank god. I always suspected that those loading times were caused by some braindead implementation detail. GTA5 is not so complex as to justify that kind of loading time. Hardware has scaled massively since its launch and it doesn't even matter.

  • It so often is. This aspect of modern computing annoys me so much - modern computers, networks and cdns are so fast nowadays that most actions should be instant. My OS should boot basically instantly. Applications should launch instantly. Websites should load almost instantly. I’ll give a pass for 3D modelling, video editing, AAA video games and maybe optimized release builds of code. But everything else should happen faster than I can blink.

    But most programs are somehow still really slow! And when you look into why, the reason is always something like this. The code was either written by juniors and never optimized because they don’t know how, or written by mids at the limit of their intelligence. And full of enough complex abstractions that nobody on the team can reason holistically about how the whole program works. Then things get slow at a macro level because fixing it feels hard.

    Either way it’s all avoidable. The only thing that makes your old computer feel sluggish for everyday computing is that programmers got faster computers, and then got lazy, and then shipped you crappy software.

    • The most infuriating response to this is “the programs are doing so much these days!” Well, yes, a chat app might do emoji and stuff now. But it’s certainly not doing 1000x the number of things…

      4 replies →

Wow. I always assumed that profiling would be part of the pre-release test processes for AAA games...

  • When it was released the game didn't have all the microtransactions so it probably took no time at all to process the JSON even with this issue.

    Then over time they slowly add data to the JSON, and this O(n^2) stuff starts to creep up and up; but the farther away from release you are, the less likely it is that the kind of engineers who do optimisation are paying any attention.

    They are all off working on the next game.

  • I had heard about this giant JSON from friends in the GTA V modding community. OP's idea of what it is used for is right. My guess is that this JSON was quite smaller when the game released and has been increasing in size as they add more and more items to sell in-game. Additionally, I speculate that most of the people with the knowledge to do this sort of profiling moved on to work on other Rockstar titles, and the "secondary team(s)" maintaining GTA Online throughout most of its lifespan either didn't notice the problem, since it's something that has become worse slowly over the years, or don't have enough bandwidth to focus on it and fix it.

    It's also possible they are very aware of it and are saving up this improvement for the next iteration of GTA Online, running on a newer version of their game engine :)

      > are aware of it and are saving up

      Still much better than e.g. Ubisoft, which repainted safety borders from red to blue and removed shadows in TM2 Canyon a few years after release, also breaking a lot of user tracks. (If you're not sure why, it was before a new iteration of the game.)

  • More importantly, how do you release a game that takes 6 minutes to load? This is why mobile gaming has the advantage: in those 6 minutes I could have played a quite satisfying round of a game and put it down already. This just seems sloppy.

    • I've actually opened a game with long loading times and alt-tabbed out, because I knew it would take a while. I booted up another game to play a little bit until the first game finished loading. Three hours later I was done playing and realized that I was supposed to play game #1.

    • It’s quite possible that there were far fewer items in the store at launch, or during testing. Then it could easily be overlooked. Of course, that's no excuse not to fix it later.

      1 reply →

    • It was only one minute at first. As the game had content added, it got longer and longer and nobody cared.

  • Could be that at release the JSON was just 200kb with 1000 entries or something and this quadratic "algorithm" wasn't the slowest part of the process

  • Could it be that MMO players are just more accustomed to long load times? (Lack of loading patience is one of the reasons I don't play SWTOR.)

    • When I used to play World of Warcraft, it never took more than 30 seconds or so to load. It got much faster over the years - when I was playing a few years ago it was more like 5 seconds from character selection to being in the world.

      Nothing like the 6 minutes people are talking about for GTA. That’s ridiculous.

  • Same. I wonder if the dev didn’t bother to fix it because they assumed profiling would identify it as a non-issue.

    • I wonder if the list was much shorter at release and wasn’t super slow on development systems.

I played the game on PS3 and PC. The loading time at launch for PS3 (and later for PC, albeit on SSD) wasn't great, but it also wasn't nearly this terrible.

From a game programming perspective, I'm sure at launch there were very few extras to obtain, so this method ran fast and didn't raise any red flags.

But as time has worn on they've added a ton of extra items and it's become a real problem. What it does show is that probably most of their team are working on creating new things they can sell, vs testing and maintaining the codebase for the last N years.

  • It's never been "good". I played since 2013 on Xbox 360 and later repurchased the game for PS4, and the online load times were not just annoying, they outright broke the game for me: to be having fun and then hit a many-minute delay while being pinned to the screen (because after loading from a mission you're never in a safe place).

    Looking down through the clouds at San Andreas's streets, with that wispy air sound, waiting for those loud thud noises that could come at random, will forever be etched into my memory as the thing that completely broke the fun I was having, especially when playing with friends and trying to do heists later in the product's life.

    And because of this, getting people to play was really difficult: the combination of huge updates which took hours to download (the PS4 has slow access to its drive even if you upgrade to an SSD) and the insanely long loading times once you had the patch culminated in many hours of lost gameplay.

    I remember a quote from Steve Jobs which fits here: "Let's say you can shave 10 seconds off of the boot time. Multiply that by five million users and thats 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives. That's really worth it, don't you think?"[0]

    [0]: https://www.folklore.org/StoryView.py?story=Saving_Lives.txt

The part that puzzles me the most was this comment about sscanf:

> To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this.

Is this true? Is sscanf really O(N) on the size of the string? Why does it need to call strlen in the first place?

  • I think that the author hasn't checked them all. Even this isn't checking them all.

    The MUSL C library's sscanf() does not do this, but it does call memchr() on limited substrings of the input string as it refills its input buffer, so it's not entirely free of this behaviour.

    * https://git.musl-libc.org/cgit/musl/tree/src/stdio/vsscanf.c

    The sscanf() in Microsoft's C library does this because it all passes through a __stdio_common_vsscanf() function which uses length-counted rather than NUL-terminated strings internally.

    * https://github.com/tpn/winsdk-10/blob/master/Include/10.0.16...

    * https://github.com/huangqinjin/ucrt/blob/master/inc/corecrt_...

    The GNU C library does something similar, using a FILE structure alongside a special "operations" table, with a _rawmemchr() in the initialization.

    * https://github.com/bminor/glibc/blob/master/libio/strops.c#L...

    * https://github.com/bminor/glibc/blob/master/libio/strfile.h#...

    The FreeBSD C library does not use a separate "operations" table.

    * https://github.com/freebsd/freebsd-src/blob/main/lib/libc/st...

    A glib summary is that sscanf() in these implementations has to set up state on every call that fscanf() has the luxury of keeping around over multiple calls in the FILE structure. They're setting up special nonce FILE objects for each sscanf() call, and that involves finding out how long the input string is every time.

    It is food for thought. How much could life be improved if these implementations exported the way to set up these nonce FILE structures from a string, and callers used fscanf() instead of sscanf()? How many applications are scanning long strings with lots of calls to sscanf()?
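
    A minimal sketch of that idea on POSIX systems (fmemopen needs the length, so you pay for exactly one strlen up front instead of one per sscanf call; error handling is mostly omitted):

        #include <cstdio>
        #include <cstring>

        // Scan every integer out of a huge NUL-terminated buffer. One strlen and
        // one FILE setup, then fscanf keeps its own position in the stream instead
        // of sscanf rebuilding that state (and often re-running strlen) per call.
        void scan_numbers(const char* buf) {
            FILE* f = fmemopen(const_cast<char*>(buf), std::strlen(buf), "r");
            if (!f) return;

            long value = 0;
            int rc;
            while ((rc = std::fscanf(f, "%ld", &value)) != EOF) {
                if (rc == 1) {
                    // use value...
                } else if (std::fgetc(f) == EOF) {   // no number here: skip a character
                    break;
                }
            }
            std::fclose(f);
        }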

    • Addendum: There are C library implementations that definitely do not work this way. It is possible to implement a C library sscanf() that doesn't call strlen() first thing every time or memchr() over and over on the same block of memory.

      Neither P.J. Plauger's nor my Standard C library (which I wrote in the 1990s and used for my 32-bit OS/2 programs) work this way. We both use simple callback functions that use "void*"s that are opaque to the common internals of *scanf() but that are cast to "FILE*" or "const char*" in the various callback functions.

      OpenWatcom's C library does the same. Things don't get marshalled into nonce FILE objects on every call. Rather, the callback functions simply look at the next character to see whether it is NUL. They aren't even using memchr() calls to find a NUL in the first position of a string. (-:

      * http://perforce.openwatcom.org:4000/@md=d&cd=//depot/V2/src/...

      1 reply →

    • Wow. Thanks for looking.

      > limited substrings of the input string as it refills its input buffer,

      As far as I can tell, that copying helper function set to the read member of the FILE* never actually gets called in this path. I see no references to f->read() or anything that would call it. All of the access goes through shgetc and shunget, shlim, and shcnt, which directly reference the buf, with no copying. The called functions __intscan() and __floatscan() do the same. __toread() is called but just ensures it is readable, and possibly resets some pointers.

      Even if it did, that pretty much does make it entirely free of this behavior, though not of added overhead. That operations structure stuffed into the file buffer doesn't scan the entire string, only copying an at most fixed amount more than asked for (stopping if the string terminates earlier than that). That leaves it linear, just with some unfortunate overhead.

      I do find the exceedingly common choice of funneling all the scanf variants through fscanf to be weird. But I guess if they already have one structure for indirecting input, it's easy to overload that. (And somehow _not_ have a general "string as a FILE" facility, and building on top of that. (Posix 2008 does have fmemopen(), but it's unsuitable, as it is buffer with specified size (which would need to be calculated, as in the MS case), rather than not worried about until a NUL byte is reached.))

      4 replies →

    • > How much could life be improved if these implementations exported the way to set up these nonce FILE structures from a string

      That's fmemopen. Not widespread, but at least part of POSIX these days.

    • OpenBSD is also doing the same thing. It seems almost universal, unless the libc author has specifically gone out of their way to do something different!

This is why people should use commonly-available packages instead of rolling their own version of whatever dumb algorithm they think they can write. This happens all the time. Bugs have been fixed by others, but everyone is too smart to use someone else’s code.

  • Sometimes those commonly used packages end up being whatever dumb algorithm the author came up with, and nobody spends the time to verify if the package is worth its popularity.

    • Doubt that. Popularity comes with heaps of PRs, some useful some less.

      If anyone used a JSON parser that took 4 minutes to parse a file, you can bet the author would know by the time the 100th user came around.

      I had a tiny barely-used detection library that didn’t detect it correctly in a new Edge browser update. Someone complained about it within the first month. The library has 20 stars and Edge has 500 users.

      Edit: Correction, it was 9 days after the browser release.

  • Seems more like a bad copy-and-paste situation to me. It's sort of what you get when you go with the lowest bid for contractors.

This is some first rate debugging, and great writing to boot. I hope Rockstar sees this, fixes the bug and then pays this fella something. Great post, thanks for sharing!

How many bets they'll leave it like this on the current version and fix it for the PS5 version, to show how dramatic of a change it is between consoles?

Red Dead Redemption 2 introduced a bug where the exe reduces the volume to 19% on start.

It's now at least 4 months old.

So what's the problem? When you alt+tab out of fullscreen to change your sound volume every time you start the game, you have to redo the graphics configuration as well.

I solved it with some .NET code I found on GitHub that runs for 5 minutes and puts the volume back up as soon as it finds the RDR2 process...

Spicy hot take: the root cause here is the awful c++ library ecosystem.

  • Yeah, no. While the C++ library ecosystem is painful to use, it still doesn't justify hand-rolling a JSON parser, and there are certainly high-quality hash-based container implementations available too, but even the standard one should beat the one used here.

    • But there is no "standard one"... reaffirming the point that you disagreed with. The C++ standard library is blatantly missing key pieces given how long the language has been around.

      1 reply →

Parsing text is always expensive on the CPU; that's why it's often better to prefer binary formats when possible.

It's also the reason the DOM is so slow, in my view.

I remember spotting a lot of fast JSON parsers around, but again, there doesn't seem to be any popular, open, flexible, binary file format out there.

Meanwhile, games are always larger and larger, machine learning requires more and more data.

There is ALWAYS money, battery and silicon to be saved by improving performance.

It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

I do not agree with the sibling comment saying that this problem only looks simple and that we are missing context.

This online game mode made $1 billion in 2017 alone.

Tweaking two functions to go from a load time of 6 minutes to less than two minutes is something any developer worth their salt should be able to do in a codebase like this equipped with a good profiler.

Instead, someone with no source code managed to do this to an obfuscated executable loaded with anti-cheat measures.

The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck are all available microtransaction items) is the cherry on top.

(And yes, I might also still be salty because their parent company unjustly DMCA'd re3 (https://github.com/GTAmodding/re3), the reverse engineered version of GTA III and Vice City. A twenty-year-old game. Which wasn't even playable without purchasing the original game.)

  • > The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck are all available microtransaction items) is the cherry on top.

    For what it's worth, 10MB of JSON is not much. Duplicating the example entry from the article 63000 times (replacing `key` with a uuid4 for uniqueness) yields 11.5MB of JSON.

    Deserialising that JSON then inserting each entry in a dict (indexed by key) takes 450ms in Python.

    But as Bruce Dawson oft notes, quadratic behaviour is the sweet spot because it's "fast enough to go into production, and slow enough to fall over once it gets there". Here odds are there were only dozens or hundreds of items during dev so nobody noticed it would become slow as balls beyond a few thousand items.

    Plus load times are usually the one thing you start ignoring early on, just start the session, go take a coffee or a piss, and by the time you're back it's loaded. Especially after QA has notified of slow load times half a dozen times, the devs (with fast machines and possibly smaller development dataset) go "works fine", and QA just gives up.

    • > Plus load times are usually the one thing you start ignoring early on, just start the session, go take a coffee or a piss, and by the time you're back it's loaded.

      In GTA V, when I tried to enjoy multiplayer with my friends the abysmal load times were what killed it for me.

      You actually have to load into the game world - which takes forever - before having a friend invite you to their multiplayer world - which takes forever, again.

      So both a coffee, and a piss. Maybe they fixed that now?

      25 replies →

    • I was the new guy at a startup. I soon noticed that Chuck Norris was in our compiled JavaScript. It turned out someone had included the entire test suite in the production deploy.

      It had been like that for nearly a year. A few minutes of work brought our client JS file from 12 MB down to less than 1 MB.

      12 replies →

    • You mention quadratic behaviours and there's probably some truth to that, but it seems to me that it's partly a C++ problem. In any other language nobody would even consider hacking up JSON parsing using a string function. They'd use the stdlib functionality if available or import a library, and this problem wouldn't exist.

      6 replies →

    • > Here odds are there were only dozens or hundreds of items during dev so nobody noticed it would become slow as balls beyond a few thousand items.

      Might be, but this particular issue has been raised by thousands of players and ignored for *years*.

      2 replies →

    • For the online games I worked on (a few of the recent NFS games) the items database was similar to the final set quite early in production and we kept an ongoing discussion about load times.

      I really liked this article, but I am a bit surprised that this made it into production. I have seen a few instances of this type of slowdowns live for very long, but they tend to be in compile times or development workflow, not in the product itself.

    • And because a merchandising push in many games may be another 10-50 items, the first couple times the % increase is high but the magnitude is low (.5s to 1s) and by the time you're up to 1000, the % increase is too small to notice. Oh it took 30 seconds last week and now it's 33.

      Boiling the frog, as it were. This class of problems is why I want way more charts on the projects I work on, especially after we hit production. I may not notice an extra 500ms a week, but I'm for damn sure going to notice the slope of a line on a 6 month chart.

    • I heard that the Chrome team has had this KPI from very early on - how much time it takes for Chrome to load - and it has stayed the same to date, i.e. they can't make any changes that would increase this metric. Very clever if you ask me.

      7 replies →

    • It would be interesting to see what JSON library they used that uses sscanf for parsing numbers. Nothing like a painter's algorithm type scenario to really slow things down, but also JSON numbers are super simple and don't need all that work. That is hundreds of MBs of unneeded scanning for terminating zero bytes. (A rough sketch of what that looks like follows below.)

      5 replies →
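
      Not the game's actual code, but a minimal sketch (in C++, with made-up function names, both operating on a buffer of whitespace-separated numbers to keep it short) of the failure mode described in the article: many C library sscanf implementations call strlen on the whole remaining input on every call, so pulling n numbers out of one big buffer costs O(n * buffer size), while strtod only looks at the characters of each number.

        #include <cstdio>
        #include <cstdlib>
        #include <string>
        #include <vector>

        // Quadratic in practice: on common C libraries every sscanf call runs
        // strlen over the entire remaining buffer before it starts matching.
        std::vector<double> parse_numbers_slow(const std::string& buf) {
            std::vector<double> out;
            const char* p = buf.c_str();
            double value = 0.0;
            int consumed = 0;
            while (std::sscanf(p, " %lf%n", &value, &consumed) == 1) {
                out.push_back(value);
                p += consumed;  // we advance a few bytes, but the scan was full-length
            }
            return out;
        }

        // Linear: strtod stops right after the number it just parsed.
        std::vector<double> parse_numbers_fast(const std::string& buf) {
            std::vector<double> out;
            const char* p = buf.c_str();
            while (true) {
                char* next = nullptr;
                double value = std::strtod(p, &next);
                if (next == p) break;  // no further number found
                out.push_back(value);
                p = next;
            }
            return out;
        }

      (The article's patch, as I read it, sidesteps the same cost from the outside by caching the length of the big buffer instead of recomputing it on every call.)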

    • hmm.. the entire pricing table for Google Cloud (nearly 100k SKUs and piles of weirdness) was only ~2MB... 10MB seems pretty big.

    • But is quadratic the real issue? Isn't that a developer answer?

      The best algorithms for small, medium and large sizes are not the same, and each generally behaves poorly in the other cases. And what is small? Medium? Large?

      The truth is that there is no one-size-fits-all, and assumptions need to be reviewed periodically and adapted accordingly. And they never are... Ask a DBA.

      8 replies →

  • The popular view is that companies who write software know how to prioritise, so if a problem like this isn't fixed, it's because they've done the calculations and decided it's not worthwhile.

    I disagree. If there are no internal incentives for the people who know how to fix this to fix it, or if there's no path from them thinking fixing it could improve revenues to being assigned the ticket, things like this won't get fixed. I can fully believe the load times will result in fewer users and lower expenditure.

    I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month. I think we'll start to see engagement numbers dropping because of this.

    • You have just described why I laugh anytime someone complains that government is inefficient. ANY organization of sufficient size is "inefficient" because what a large organization optimizes for (for reasons I cannot explain) cannot align with what that organization's customers want optimized.

      21 replies →

    • That the process breaks down in some cases doesn't mean they don't know how to prioritize. They clearly know how to prioritize well enough to make a wildly successful and enjoyable game. That doesn't mean no bad decisions were made over the last decade or so of development.

      Like anything else, things will be as bad as the market allows. So I'd expect monopolies to do a worse and worse job of making good decisions and companies in competitive fields to do a better and better job over time. Thus the difference between TakeTwo and Facebook, and the need for lower cost of entry and greater competition in all economic endeavors where efficiency or good decision making is important.

      3 replies →

    • > I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month.

      The messenger website has been atrocious for me lately. On my high-powered desktop, it often lags a bit, and on my fairly high-end laptop, it's virtually unusable. I thought it must be something I changed in my setup, but it's oddly comforting to hear that I'm not the only one with such issues.

      6 replies →

    • > if a problem like this isn't fixed, it's because they've done the calculations and decided it's not worthwhile.

      If it ever reached the point where it had to be an item in a priority list, it's already a failure. Some developer should have seen the quadratic behavior and fixed it. It's not the type of thing that should ever even be part of a prioritized backlog. It's a showstopper bug and it's visible for every developer.

    • > I can fully believe the load times will result in fewer users and lower expenditure.

      Does GTA Online still attract new users in droves? I doubt it.

      If the old users have lived with the loading time for years, they are likely to continue living with it. It would be nice if Rockstar fixed it, but I doubt it would be anything except a PR win.

      2 replies →

    • 1. people experiencing this issue have already bought the game, so there's little incentive here.

      2. we can be reasonably sure people will buy newer GTA installments regardless of whether this bug is fixed or not.

      but:

      3. if there's still money to be made from microtransactions this is a huge issue and would absolutely be worthwhile, imo.

    • > I think we'll see this happen with Facebook Messenger. Both the apps and the website have become slow and painful to use and get worse every month. I think we'll start to see engagement numbers dropping because of this.

      In fact, I think the iOS app for FB Messenger did get a redesign due to these problems and was rewritten from scratch? I remember being pleasantly surprised after the big update… It became lightweight, integrates well with iOS and supports platform features.

      On the other hand, the desktop app or the website is a shitshow :-(

      1 reply →

    • I would think this would be one of the biggest revenue / developer_time changes in company history, considering how incredibly profitable online is.

  • I imagine the conversation between the programmer(s) and management went exactly like this:

    Management: So, what can we do about the loading times?

    Programmer(s): That's just how long it takes to load JSON. After all, the algorithm/function couldn't be more straightforward. Most of the complaints are probably coming from older hardware. And with new PC's and next-gen consoles it probably won't be noticeable at all.

    Management: OK, guess that's that then. Sucks but nothing we can do.

    Management had no way of knowing whether this was true or not -- they have to trust what their devs tell them. And every time over the years someone asked "hey, why is loading so slow?" they got told "yeah, they looked into it when it was built, turns out there was no way to speed it up, so not worth looking into again."

    And I'm guessing that while Rockstar's best devs are put on the really complex in-game performance stuff... their least experienced ones are put on stuff like... loading a game's JSON config from servers.

    I've seen it personally in the past where the supposedly "easy" dev tasks are given to a separate team entirely, accountable to management directly, instead of accountable to the highly capable tech lead in charge of all the rest. I've got to assume that was basically the root cause here.

    But I agree, this is incredibly embarrassing and unforgivable. Whatever chain of accountability allowed this to happen... goddamn there's got to be one hell of an internal postmortem on this one.

    • I can pretty much guarantee that there was no discussion with management like that. From experience, live ops games are essentially a perpetually broken code base that was rushed into production, then a breakneck release schedule for new features and monetization. I've personally had this conversation a few times:

      Programmer: Loading times are really slow, I want to look into it next sprint.

      Management: Feature X is higher priority, put it in the backlog and we'll get to it.

      16 replies →

    • Alternatively (from my experience):

      Programmer(s): Can we set aside some time to fix the long loading times?

      Management: No, that won't earn us any money, focus on adding features

      1 reply →

  • The old maxim of "Premature optimization is the root of all evil" has over time evolved to "If you care one iota about performance, you are not a good programmer".

    • That belief is getting a bit outdated now that computing efficiency is hitting walls. Even when compute is cheaper than development, you're still making a morally suspect choice to pollute the environment over doing useful work if you spend $100k/yr on servers instead of $120k/yr on coding. When time and energy saved are insignificant compared to development expense is of course when you shouldn't be fussing with performance.

      I don't think the anti-code-optimization dogma will go away, but good devs already know optimality is multi-dimensional and problem-specific, and performance implications are always worth considering. Picking your battles is important; never fighting them, and never learning how to, is not the trick.

      1 reply →

    • And fwiw, the full quote is:

      We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

      Yet we should not pass up our opportunities in that critical 3%.

      20 replies →

    • That doesn't really apply here. I don't even play GTA V, but the #1 complaint I've always heard for the past 6 years is that the load times are the worst thing about the game. Once something is known to be the biggest bottleneck in the enjoyment of your game, it's no longer "premature optimization". The whole point of that saying is that you should first make things, then optimize the things that bring the most value. The load time is one of the highest-value things you can cut down on. And the fact that these two low-hanging fruit made such a big difference tells me they never gave it a single try in the past 6 years.

      6 replies →

    • The problem here isn't a lack of optimization, it's a lack of profiling. Premature optimization is a problem because you will waste time and create more complex code optimizing in places that don't actually need it, since it's not always intuitive what your biggest contributors to inefficiency are. Instead of optimizing right away, you should profile your code and figure out where you need to optimize. The problem is that they didn't do that.

      1 reply →

    • I think this part of the Knuth's quote is central:

      > Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.

      And he is explicitly advocating for optimizing the critical part:

      > Yet we should not pass up our opportunities in that critical 3%.

      And somehow, people have latched onto the catchphrase about "early optimization" that was taken out of context.

    • I went back and had a look at that maxim a few years ago and found that it actually doesn't say what many people claim it says. And definitely not as the blanket excuse for slow code that it has always, to some degree, been used as.

      The reason for the misunderstanding is that the kinds of practices it actually talks about are uncommon today. People would often take stuff written in higher level languages and reimplement them in assembler or machine code. Which makes them more time-consuming to change/evolve.

      It also isn't like it is hard to figure out which part of a piece of software is taking up your runtime these days. All worthwhile languages have profilers, these are mostly free, so there is zero excuse for not knowing what to optimize. Heck, it isn't all that uncommon for people to run profiling in production.

      Also, it isn't like you can't know ahead of time which bits need to be fast. Usually you have some idea, so you will know what to benchmark. Long startup times probably won't kill you, but when they are so long that they become a UX issue, it wouldn't have killed them to have a look.

    • Back in the day, when people talked about premature optimization it was about trivial things like people asking on Stack Overflow whether a++, ++a or a += 1 is faster. It obviously is a net loss since ultimately it literally doesn't matter. If it matters to you, you are already an expert in the subject and should just benchmark your code.

  • It's not just believable, it's normal. I have spent quite a bit of my career maintaining software, and I don't recall one employer where low-hanging fruit like this wasn't available everywhere.

    The problem is not that developers can't optimize things: you will find some developers capable of figuring this problem out anywhere. What makes this low hanging fruit so popular is the fact that we aren't measuring enough, and even when we do, we aren't necessarily prioritizing looking into things that are suspiciously slow.

    In the case of this example, the issue is also client-side, so it's not as if it's costing Rockstar CPU time, and it's unlikely you'll have someone who can claim their job description includes wondering whether the load times are worth optimizing. When problems like this one get solved, it's because someone who is very annoyed by the problem convinces a developer to even look into it. Most of the time, the people that suffer, the people that get to decide how to allocate the time, and the people that are in a position to evaluate the root cause of the problem never even get to talk to each other. It's the price we often pay for specialization and organizations with poor communication.

    Organizations where other people decide what has to be done next, and where the company culture dictates that the way forward is to either complete more tickets faster or find ways to be liked by your manager, are not going to foster the kind of thinking that solves a problem like this one, but that's a lot of what you find in many places. A developer with a full plate who is just working on the next feature isn't going to spend their time wondering about load times.

    But instead we end up blaming the developers themselves, instead of the culture that they swim in.

    • Hear hear, we should make a "punch a dummy manager/BA/code standards 'lead'" day...

      This code looks like someone with almost no experience hacked it together but because they were an intern and likely Rockstar is a toxic place to work, it never gets prioritized to be fixed.

      I think if managers prioritized cycle-time metrics more, they'd find that they are encouraging a lot of practices which lead to horrible inefficiencies - "measure twice, cut once" is a positive mantra which leads to more solid designs with fewer bugs.

      Agile sort of addressed this problem, but unfortunately only at small scales. Iteration and story capacity got prioritized over quality, customer engagement, and self-leading teams.

      Plus, things such as scaled agile suffer from the oxymoron of planning fast iteration - if you have a master plan you lose the ability to respond to change and iterate, or you allow iteration and must accept that if any team iterates the whole plan gets discarded... which at some point means you either accept high cycle times or you decouple functionality to the extent that the planning becomes the standard fallacy of waterfall - wasting meeting time going over a plan that isn't based on anything.

  • I suspect that the core engine programmers moved onto other projects long ago, leaving GTA:O running with mostly artists and scenario designers to produce more DLC.

    This bug wouldn't present in the first couple years with the limited amount of DLC, so by the time it got ridiculous there wasn't anyone left with the confidence to profile the game and optimize it. A junior dev could fix this, but would probably assume that slow loads are a deep complex engine problem that they won't be able to fix.

    Alternatively, management would declare that there's too much risk doing more technical engine work, and not sign off on any proposed "minor" optimizations because it's too risky.

    • > This bug wouldn't present in the first couple years with the limited amount of DLC

      GTA Online loading times have been infamous for a very long time. They were already unjustifiably bad when they released the game for PC, and at that point engine programmers would surely be involved.

      1 reply →

    • This is very much the likely scenario. The money is in further DLC. The existing GTAO engine is "done" from their perspective.

      I'd guess also that the next version of the base engine is in RDR2 or later and doesn't have these issues. But at the same time they likely wouldn't backport the changes for fear of cost overruns.

  • Something I've noticed in highly successful companies is that problems never get fixed because the sound of the money printer from the core business is deafening.

    Our customer portal loads 2 versions of React, Angular, Knockout and jQuery on the same page? Doesn't matter, it's printing billions of dollars.

    Rockstar's money printer is so loud that they don't care about problems.

    Same thing for Valve, their money printer is so loud that they barely bother to make games anymore and let the Steam client languish for years (how did they let Discord/Twitch happen?).

    • >Same thing for Valve, their money printer is so loud that they barely bother to make games anymore and let the Steam client languish for years (how did they let Discord/Twitch happen?).

      Not sure that's a fair criticism.

      Alyx was widely praised. Artifact... wasn't. I don't know about Dota Underlords. And that's just the last couple of years.

      They've also developed hardware like SteamLink, Steam Controller, some high-end VR gear...

      They develop a LOT. They just don't release a whole lot.

      I agree there should be a lot more work and effort in the client. And they constantly fuck up handling their esports.

      But I don't think "barely bother to make games anymore" is one of them.

      2 replies →

  • is it really unbelievable? companies this big tend to prioritize hiring a shit ton of middlemen (VPs, project managers, developer managers, offshore managers) in order to avoid paying out for talent to build and constantly maintain the project. I guess paying a shit ton of money to 1 person to manage 10+ poorly paid contractors works out for them, accounting wise.

    If one really examined the accounting for GTAO, I would bet that most of the billions of dollars that were earned in microtransactions went to marketing, product research, and to middle management in the form of bonuses.

    • Even if you view this as a business decision rather than a technical one, any smart project manager would realise a 6-minute loading time literally costs the company millions per year in lost revenue. (How many times have you felt like firing up GTA Online only to reconsider due to the agonising load time?) I would guess this was simply a case of business folk failing to understand that such a technical issue could be so easily solved, plus developers never being allowed the opportunity to understand and fix the issue in their spare time.

      13 replies →

    • It's kind of hard to believe. GTA5's online mode is their cash cow, and 6 minute load times are common?! It's kind of amazing people even play it with those load times. It's such a simple problem that one dev could have found and fixed it within a day.

      13 replies →

    • I am always amused by comments like this. You have no idea what development practices they follow (neither do I) but it's hilarious to read your tone.

      GTA has achieved tremendous success both as an entertaining game and as a business. It's enjoyed by millions of people and generates billions in revenue. As per this article, it has startup problems (which don't seem to actually really hurt the overall product but I agree sound annoying) but the bigger picture is: it's a huge success.

      So - Rockstar has nailed it. What exactly is your platform for analyzing/criticizing their processes, or even having a shot at understanding what they are? What have you built that anyone uses? (not saying you haven't, but.. have you been involved with anything remotely close to that scale?)

      And if not, whence the high horse?

      8 replies →

  • Tell that to Valve. The Source engine and all its games (Half Life 1, 2, Portal, Alyx) have horrible load times. They might not be as bad as the GTA example but they're long and extremely frustrating.

    And yet, no one cares. Those games (and GTA5) all sold millions of copies.

    The only way this stuff gets fixed is if (a) some programmer takes pride in load times or (b) customers stop buying games with slow load times.

    (b) never happens. If the game itself is good then people put up with the load times. If the game is bad and it has bad load times they'll point to the load times as a reason it's bad but the truth is it's the game itself that's bad because plenty of popular games have bad load times

    Also, any game programmer loading and parsing text at runtime by definition, doesn't care about load times. If you want fast load times you setup your data so you can load it directly into memory, fix a few pointers and then use it where it is. If you have to parse text or even parse binary and move things around then you've already failed.
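
    For what that can look like in practice, here is a minimal sketch (hypothetical record layout and names, nothing from the actual game) of the "load it and use it in place" approach: fixed-size records plus offsets instead of pointers, so the whole catalog is one read and zero parsing.

      #include <cstdint>
      #include <cstdio>
      #include <vector>

      #pragma pack(push, 1)
      struct BlobHeader {
          uint32_t magic;               // format tag, e.g. 'ITM1'
          uint32_t item_count;
          uint32_t string_pool_offset;  // offset of the string pool from file start
      };
      struct ItemRecord {
          uint64_t key_hash;            // pre-hashed at build time
          uint32_t name_offset;         // offset into the string pool
          uint32_t price;
      };
      #pragma pack(pop)

      struct Catalog {
          std::vector<char> blob;       // the whole file, loaded with a single read
          const BlobHeader* header = nullptr;
          const ItemRecord* items = nullptr;

          bool load(const char* path) {
              std::FILE* f = std::fopen(path, "rb");
              if (!f) return false;
              std::fseek(f, 0, SEEK_END);
              long size = std::ftell(f);
              std::fseek(f, 0, SEEK_SET);
              blob.resize(static_cast<size_t>(size));
              size_t got = std::fread(blob.data(), 1, blob.size(), f);
              std::fclose(f);
              if (got != blob.size() || blob.size() < sizeof(BlobHeader)) return false;
              // "Fixing up pointers" is just computing two base addresses;
              // everything else is reached by offset, in place.
              header = reinterpret_cast<const BlobHeader*>(blob.data());
              items  = reinterpret_cast<const ItemRecord*>(blob.data() + sizeof(BlobHeader));
              return true;
          }

          const char* item_name(const ItemRecord& r) const {
              return blob.data() + header->string_pool_offset + r.name_offset;
          }
      };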

    • I think there may sort of be another thing going on: Basically, that the length of load time is an indicator, "This is a really serious program." I've sort of noticed the same thing with test machines: the more expensive the machine, the longer it takes to actually get to the grub prompt.

      Six minutes is probably excessive, but having GTA take 1-2 minutes to load almost certainly makes people feel better about the money they spent on the game than if it loaded up in 5 seconds like some low-production 2D adventure game.

      1 reply →

  • It was probably fast 10 years ago when the store had a couple of items; the dev back then never thought that it would grow to 60k items. Classic programming right there.

    As for profiling, Windows Performance Toolkit is the best available no?

    • Meh. It's OK to assume a low number of items and code accordingly. What is not OK is for the company to ignore such a problem for years instead of detecting and fixing it.

  • I've worked a number of places where the engineering culture discourages any sort of fishing expeditions at all, and if I weren't so stubborn everything would take 2-4x as long to run as it does. I say engineering culture, because at 2 of these places everyone was frustrated with engineering because the customers wanted something better, but the engineers would point at flat flame charts, shrug, and say there's nothing that can be done.

    Bull. Shit.

    There's plenty that can be done, because there are parts of a process that don't deserve 15% of the overall budget. The fact that they each take 1/6 of the time, like the 5 other things do, is a failure, not a hallmark of success. Finding 30% worth of improvements with this perspective is easy. 50% often just takes work, but post-discovery much of it is straightforward, if tedious.

    My peers are lying with charts to get out of doing "grunt work" when there's a new feature they could be implementing. But performance is a feature.

  • >It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

    Having played the game, it's not surprising to me in the least.

    I have never yet encountered another such 'wild-west' online experience.

    It's the only game I've ever played that is so unmoderated that the common reaction to meeting a hacker who is interested in griefing you is to call your white-knight hacker friend and ask him to boot the griefer-hacker from the lobby.

    Reports do next to nothing -- and 'modders' have some very real power in-game, with most fights between 'modders' ending in one of them being booted to desktop by the other exploiting a CTD bug (which are usually chat text-parser based..)

    On top of all this, Rockstar attempts to have an in-game economy, even selling money outright to players in the form of 'Shark Cards' for real-life currency, while 'modders' (hackers) easily dupe money for anyone who asks in a public lobby.

    This isn't just all coincidence; the game lacks any kind of realistic checks/balances with the server for the sake of latency and interoperability -- but this results in every 13 year old passing around the Cheat Engine structs on game-cheating forums and acting like virtual gods while tossing legitimate players around lobbies like ragdolls -- meanwhile Rockstar continues releasing GTA Online content while ignoring playerbase pleas for supervision.

    It's truly unique though -- an online battlefield where one can literally watch battles between the metaphorical white hat and black hat hackers; but it's a definite indicator of a poorly run business when vigilante customers need to replace customer service.

    Also, an aside, most 'mod-menus' -- the small applets put together using publicly available memory structs for game exploit -- most all have a 'quick connect' feature that allows hackers to join lobbies much faster than the GTA V client usually allows for. This feature has existed for years and years, and I believe it performs tricks similar to those listed in the article.

    • >Reports do next to nothing -- and 'modders' have some very real power in-game, with most fights between 'modders' ending in one of them being booted to desktop by the other exploiting a CTD bug (which are usually chat text-parser based..)

      Interesting, back in the day the coolest CTD I did was to simply crank up the combo multiplier in S4 League to crash the game client on every player in the room except mine since that game was peer to peer and thus any form of hacking (teleportation, infinite melee range, instant kill, immortality, etc) was possible. The combo multiplier was set to 256 and thus every single particle was duplicated 256 times and this caused the game to crash.

  • > This online gamemode alone made $1 billion in 2017 alone.

    There's the answer right there. They figure it's making $1B/yr, leave it alone. Maintenance? That cuts into the billion. Everyone moved onto the next project.

  • I stopped playing GTAV online a few years back because of the crazy load times, not only that but you have to go through 6+ minute load screens multiple times in many sessions.

    This oversight has cost them millions of dollars easy.

    • If it made over $1b in a year previously, and had such insane load times, it's very plausible this bad coding has cost them north of another $1b.

      Probably ranks pretty highly up there in terms of damage to company financials, due to a lack of care.

  • In my experience most engineers have never used a profiler even once. They write the code, and if you're lucky they get it working correctly.

    • Let's call them "code technicians" instead of engineers, ok? (that's a euphemism for "code monkeys")

  • > the reverse engineered version of GTA III and Vice City

    Ohhh. Thank you for telling me about this. I just found a mirror and successfully built it for macOS. Runs so much better than the wine version. But I guess I'll never finish that RC helicopter mission anyway lol

  • “worth their salt” is doing a lot of work here. No true Scotsman fallacy?

    I think you might be surprised by how few programmers even know what a profiler is, let alone how to run one.

    • That seems like a misapplication of the fallacy. If we assume 'worth their salt' is a synonym for 'good', then saying any good developer can operate a profiler is entirely reasonable.

  • I used to play this game a lot on PS4. I actually dropped it due to the ridiculous loading times... I still assumed it was doing something useful though. I can't believe they wasted so much of my time and electricity because of this. Even cheaply-made mobile games don't have bugs like this.

    > their parent company unjustly DMCA'd re3

    Wow, this is EA games level scumbaggery... I don't think I'm gonna buy games from them again.

  • > salty because their parent company unjustly DMCA'd re3

    Unjustly, but legally. The people you should be salty at are the lawmakers.

  • > obfuscated executable loaded with anti-cheat measures

    I'm impressed that gamecopyworld.com is still online, updated, and has the same UI that it did in 2003

  • Not very surprising. Twitter doesn't work properly on my desktop, google freezes when showing the cookiewall, github freezes my phone. These are all important projects of billion dollar companies.

  • > It is absolutely unbelievable [...] that a cash cow [...] has a problem like this

    Likely it wasn't fixed precisely because it's such a cash cow. "It's making money, don't fuck with it".

  • Maybe long load times are advantageous? Because it creates longer user sessions on average? If you devote 10 minutes to loading the game you will probably want to play for at least 30 minutes.

  • Does anyone have a link to a copy of re3? Iirc, there was a gitrepo that kept a copy of all DMCA'd repos

    • try the hacker news search (bottom of the page) and you'll find stories on the takedown where there are links to backups posted in the comments.

  • I am not saying this is the case, nor do I understand the details of the solution in enough depth to comment on it, but by analogy this reads to me like yelling at the person who figured out the steps for solving a Rubik's cube, because once the steps are known the solution is simple.

    • No, other people have pointed this out, this should have been very easy to recognize as inefficient in the source code. More likely the code was written hastily and only tested against very small inputs, and then nobody ever actually tried to improve the famously long load times.

      1 reply →

  • > This online gamemode alone made $1 billion in 2017 alone.

    which of course goes to show that at least from a business side, this issue is completely inconsequential and all resources should be used to push for more monetization (and thus adding to the problem by adding more items to the JSON file) rather than fixing this issue, because, clearly, people don't seem to mind 6 minutes loading time.

    I'm being snarky here, yes, but honestly: once you make $1 billion per year with that issue present, do you really think this issue matters at all in reality? Do you think they could make $1+n billion a year with this fixed?

    • The bigger the scale, the more a few percentage points of improvement would be worth. I would generally think that if you're at 1bn in revenue you should devote at least 1% of your workforce towards finding low-hanging fruit like this. If 1% of employees are deployed to find issues that, when fixed, yield a 2% improvement in revenue, that's likely a winning scenario.

    • This is losing them money. If they fixed the issue they absolutely would get $1+n billion instead of just $1 billion and that n alone is big enough to pay multiple years worth of 6 digit salaries just to fix this single bug.

  • I work at a large multi-billion-dollar company and we have people staring at a slow problem for a decade before a noob comes along with a profiler and finds they iterate over every key of a Map instead of calling get, and such. Or do 1 million DB queries on GUI startup...

    Not surprised they didn't bother for 6 minutes when it takes us 10 years to fix a 30-minute locked startup.

  • I find it absolutely believable that a for-profit company does not prioritize fixing a game that is already a cash cow anyway.

  • > It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years

    Agree. I found the slow behavior of sscanf while writing one of my first C programs during an internship^^ You literally just have to google "scanf slow" and find lots of information.

  • It could be that at the time GTA Online first shipped, using a list where a hashmap belongs wasn't too much of an issue due to the limited catalog, but it got progressively worse as the inventory grew.

    Ofc this is just a hypothesis, but I see the hesitation to change legacy code if it ain't broken as a widespread mentality.

    • > I see the hesitation to change legacy code if it ain't broken as a wide spread mentality.

      Load times measured in double digit minutes on a significant number of machines meets absolutely every reasonable definition of "broken".

  • Maybe it was outsourced. I don't understand how a team could make such an excellent game and fail to resolve such a simple bottleneck.

  • Focus is on the money, not on the technicals, on the production side. It is a game, to entertain and ultimately “waste time”, on the consumer side. Also, on the topic of parsing, 1-11 GB/sec is possible if you go binary. Point is: this isn’t a technical problem, it’s a choice.

  • Well this sprint we didn't release any new features but we reduced the load.... Dammit hackernews!

  • What is complicated about it is that a modern online 3D game is huge, and there are 95,000 places where a dumb mistake could hurt performance a lot for some customers. You catch 94,999 of them and then it's "unforgivable".

    • If it was that way for a few months and then fixed... still pretty shoddy but sure. However, it has been that way for YEARS, and is one of the most common complaints among players. I wonder how much of the remaining loading time could actually be shaved off if someone with the source code took a crack at it.

  • > It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has a problem like this present for over 6 years and it turns out to be something so absolutely simple.

    It is both believable and - by virtue of the fact that, as you said, the series continues to be a cash cow - is apparently forgivable.

    Here's the thing: the company has zero reasons to fix this, or other ostensibly egregious affronts like DRM, because gamers keep buying the product. There is literally no economic incentive to 'fix' it.

  • Attitudes like yours are why gamedevs keep to themselves.

    "Unbelievable" and "unforgivable" eh? It's a greedy attitude. Instead of viewing GTA5 as a success that's brought a lot of people happiness, you view it as a money cow designed to extract every last bit of profit – and time, since this bug caused 70% longer loading times.

    Perhaps it's both. But you, sitting here behind a keyboard with (correct me if I'm wrong) no gamedev experience, have no idea what it's like on a triple-A gamedev team with various priorities. The fact that the game works at all is a minor miracle, given the sheer complexity of the entire codebase.

    The fact that someone was able to optimize the obfuscated executable is a wonderful thing. But they weren't a part of the team that shipped GTA 5. If they were, they certainly wouldn't have been able to spend their time on this.

    • This kind of excuse-making is one of the reasons I got out of software development. It’s not just gamedev. Priorities are way out of whack when you have time to put in binary obfuscation, but no time to fix such a huge performance bottleneck. The idea that “it’s a miracle software works at all” demonstrates the chronic prioritization and project-management competence problem in the industry.

      It’s ok to recognize a thing as a business success but a technical failure. In fact many software projects are business successes despite awful and unforgivable quality compromises. You don’t get to whitewash it just because the thing prints money.

      16 replies →

    • GTA 5 broke even within 48 hours of its release. Nearly a decade later, it still costs $60 for a digital copy with (practically) zero distribution costs. It has made over $6Bn in revenue, and is said to be the most profitable entertainment product of all time.

      How much would it have cost to fix this issue?

      Is anyone saying that it is a game developer's fault? I mean, what is it that you think would prevent a game developer from fixing this?

      Because I think anyone even vaguely familiar with the software industry in general is going to come up with answers like:

      1. It would not cost very much.

      2. No, it isn't a developer's fault, because it's clear that even an intern could fix this.

      3. Management isn't interested, or is too disorganized, or is focused on cynical extraction of every last bit of profit.

      And from that perspective, it certainly does make it seem like a cynical cash cow.

      I don't know many game developers, but I do know people in other parts of the software industry and professionals in general. And I think that they keep to themselves because they have first-hand experience of how the industry works and understand it better than anyone. They probably sympathise with the public's right to feel ripped off.

      That said, I still paid for the game, I think it's fun. Apparently there is "no alternative" to this state of affairs.

      2 replies →

    • I don’t think the OP was specifically calling out any game devs. Any engineer who has worked on any software project knows that you usually can only do as well as your deadlines and management's priorities allow for... unless the stars line up and you have the resources and autonomy to fix the issue yourself, and love working for free on your own time.

      2 replies →

    • I did mean to reply to you (72 days ago, plus in another thread); I just can't make a Twitter account. Like, I've tried and it doesn't like my IP or email addresses or something, not sure.

      1 reply →

Imagine all the time that people are wasting if they are sitting there waiting for it to load. I wonder how many lifespans it adds up to?

  • I wonder about the environmental impact of CPU cores churning away uselessly on so many machines so many times. Probably enough to power a home for a few years?

There’s one thing that made me curious. The author said the load times when the game came out were awful as well.

So has that JSON always been 10mb or has it grown a lot over time?

Has the load time crept up since launch or has it gone down?

As others have pointed out, this is a good illustration of why you need accurate data during development. I bet that the data set used during development was dirty with duplicates and way too small. It was faster and more convenient to just code defensive garbage than to be the annoying one nagging another team about the state of the data. So this was written, probably along with a TODO comment, and then forgotten about, and ultimately shipped. I've done this same thing. Not with any real consequences, but still. It's what happens when time is of the essence.

How it remained undetected for so long is really weird though. Surely they must've had a massive amount of complaints about the loading times. I completely stopped playing because of them. How could they not have investigated the root cause, at least once, in six years?

  • It seems that TODO comments just don't cut it. Either you create a bug/task for it, or you just forget about it.

Super impressive. I am generally not impressed with community hacks because, as every comment thread discusses, they are generally hacks that do not support all players and thus can’t be shipped.

But this seems pretty damn clean. And it’s egregious that no one at R* took a week to investigate and fix “correctly”.

Hey T0ST, thanks a mill for releasing this! I've reduced my online loading time from 120s to 40s thanks to this! Since I have the epic version, I used BigBaseV2-fix where datlimabean04 has adapted (and dare I say slightly improved?) your code. If you like, you could also link to the repo where he has integrated that: https://bitbucket.org/gir489/bigbasev2-fix/commits/branch/lo...

Also the post where he mentions adding the branch with your fix is here: https://www.unknowncheats.me/forum/3078353-post937.html

Why in the world would you roll your own JSON parser?

For reference, I just ran a 10MB file through the JSON parser I use in Python and it finished in 0.047s (vs 190s)

  • I wonder if there's some corporate policy against using external libs. You'd think most of them would have solved this.

Very cool analysis. It's unbelievable that this is still unfixed. I don't play GTA anymore, mostly because of the load times for every single action.

A lot of comments here kind of brush off the fact that this has been going on for _years_! I would expect quadratic behaviour to slip in here and there in non-real-time code paths. We make mistakes, after all.

I've had to revisit production code from time to time to discover my or my colleague assumptions about library functions were not actually true.

But for someone in possession of the source code this looks like _minutes_ worth of profiling time! Given the stellar technical accomplishment that GTA5 truly is, you would expect tons of really talented devs who could have tracked this one down on their lunch break.

The fact that they didn't speaks volumes about organisational structure within R*. I think some higher up managers _really_ need some stern talking to!

It's a bit disheartening to think about the person-centuries of time wasted on what wouldn't be unfair to characterize as sloppy code.

I always cringe when I come across any kind of loop with unnecessary polynomial or even exponential complexity, even when it is found in a non-critical part of the code.

It offends my taste as a software engineer and I always fix these.

I haven’t played GTAO in a couple years, but I vaguely remember using the suspend process option of windows task manager to jump into online much faster.

Googling gave me this result, which sounds about right for what I remember. https://www.gfinityesports.com/grand-theft-auto/gta-online-h...

I’m not certain it’s still a valid workaround, and it’s not nearly as sophisticated as the OP method, but at least everyone can do it :)

  • I had to use this trick to kick GTA Online back into action when it got stuck forever on entering buildings (mid-2020). Didn't know it could be used to cut the load times in general.

I have absolute respect for people who are able to do that kind of thing. A few years back I tried to get into reverse engineering, but I never managed to build up the skills.

Pretty strange that nobody at Rockstar bothered to run the game's loading phase in a profiler; this problem would have been super easy to observe if they had.

Please don't compute speed-up like that.

Old time: 6 minutes (360 seconds). New time: 1 minute and 50 seconds (110 seconds). Speed-up calculated by the article author: 1 - 110/360 = 0.694 (69.4% saved). Speed-up calculated by me: 360/110 = 3.27 (about 3.3 times faster).

Please calculate it the other way around. It makes great difference when you say you made something 10× faster than when you say you saved 90% of work even if both mean exactly the same thing.

Bruce Dawson has great article about this: https://randomascii.wordpress.com/2018/02/04/what-we-talk-ab...

  • I don't think this remark is warranted here. Your sentence ("3.27 times faster") is clear, but conveys the same meaning. His sentence ("loading times are cut down by 70%") does not refer to "speed" or "fast" (Ctrl+F the blog post to check for yourself) and is technically correct and clear. So the criticism raised in the WordPress article (about the usage of the words "speed" and "fast") does not apply. The mistake would have been to talk about "70% faster" or a "70% speed-up", because then there is ambiguity for the reader.

  • That's how you know he's a pure engineer - he used math not marketing

    Agreed however that this is 3x faster.

It doesn't explain some things to me. Maybe I read the article inattentively, but which stage of the loading hangs for a very long time? Loading screen, or city overview when joining a session? In my experience, on modern hardware, the loading screen goes very quickly (a minute or two).

What's worth noting is that the city overview can hang for 20-30 minutes while connecting. At the same time, suspending GTAV.exe for 20-30 seconds allows you to immediately throw the player into an empty public session. That is unlikely to be caused by slow parsing of JSON; more likely the suspend causes a UDP timeout when connecting to the slow session hosts.

One of the most entertaining posts I've read lately. Congratulations on the job, and on the clarity of the explanations! Reading the story, everything looks simple where it surely is not. Lots to learn inside; thanks to the author for that.

I once came across a game that rendered progress bar frames at fixed points during the loading process. With VSync enabled this would significantly slow things down compared to what a fast computer could manage otherwise.

Really great work. Here's to hoping someone at Rockstar notices. You didn't even need access to the original sourcecode to debug it! Quite impressive my friend. Thanks for the great write-up!

Let me guess, when they started this JSON was small and this parsing code worked fine for that small input but over time the JSON got bigger and bigger and here we are.

I'm going to make the valid assumption that someone, at some point, was assigned to fix this. The issue here, IMHO, is why they didn't. It comes down to how complicated it was to fix this from the organizational perspective, and not from the technical perspective. I'll explain.

First, a disclaimer: I have no idea how much Rockstar employees are paid, nor what their workdays look like. I don't know what their team sizes are, where they worked before, or who manages whom and in what fashion. I actually don't know anything about the Rockstar engineering organization at all.

I am also not a GTA player (or, more accurately, haven't been since San Andreas came out many moons ago). This is my perspective as someone who has worked for various organizations in his life (SWE-centered ones, but also other, more "traditional" ones too).

We're all familiar with technical debt - it's a well established concept by now. Reducing it is part of the normal workload for well-functioning organizations, and (good) tech leads think about it often.

What isn't talked about as often is the "organizational" debt. Some things are "verboten" - you don't touch them, you don't talk about them, and you run like hell if you're ever assigned to deal with them.

Every large enough company has (at least) one of those. It might be a critical service written in a language nobody in the team knows anymore. Maybe it's a central piece of infra that somebody wrote and then left, and it's seen patch after patch ever since. These are things that end your career at the company if the BLAME points to you after you attempted to fix them.

I have a gut feeling - not based on anything concrete, as I mentioned - that the loading part for GTA Online might be one of those things. If someone breaks the process that loads the game - that's a big deal. No one would be able to play. Money would be lost, and business-folk will come knocking.

So sure, there might be some mitigations in place - if some part fails to load they allow the game to load anyway, and then attempt to fix it on the fly. It's not black and white. But it feels like one of those things, and so people might have just been running like hell from it for years and years. Teams change. Projects change hands. People leave, more people join. It's life in the industry.

I would be REALLY interested in learning how software orgs deal with these types of behemoths in their projects. I have yet to find someone who knows how to - methodically and repetitively - break these apart when they appear.

  • > I would be REALLY interested in learning how software orgs deal with these types of behemoths in their projects. I have yet to find someone who knows how to - methodically and repetitively - break these apart when they appear.

    I ignore them until a big, big customer complains and threatens revenue, then I scream and scream until it gets fixed.

This should be tweeted to their official channel, or to someone inside the company (on Twitter) who has the ability to let the relevant Rockstar devs know about it...

Really nice hacking; well done.

I wonder who t0st is. He lives in Moscow according to Twitter. This type of work is usually done to find vulnerabilities in code, e.g. buffer overflows. Injecting a piece of code into a program's machine code is a technique used in system hacking or software piracy. I can bet t0st works or has worked for the FSB. Anyway, not that it matters... it is still fun to think about.

This article is great. As a developer, I've always been curious about what the heck GTAO was doing that could take so long to load

The author's frustration with GTA Online reminds me of my own experience with Borderlands 2 taking over 1 minute for its "searching for downloadable content" loading phase. I hate DLC, so having to wait for it was especially aggravating.

I did not ever consider inspecting obfuscated assembly and hooking functions to fix the problem myself. Very impressive work!

BTC mining is using a significant amount of global energy production every year now. The more cynical might say that ultimately it's a waste of human effort, but I have to say that this bug (and it's most definitely a bug) should be patched (and should've been patched) to save the kilowatts of power wasted parsing mostly the same data.

I can't believe what Rockstar did here. Three friends and I stopped playing GTA V because of the unbelievably stupid loading times.

Micro transactions were probably added in late stages of development by someone pressured to get the job done quickly and the developers were savvy enough to believe that the added overhead to slow initial loading was standard fare that users already accepted. Not enough time to optimize, forced to ship, and good enough.

I remember when someone here eloquently and intricately explained why Twitter constantly breaks in various ways, such as saying I'm not allowed to see a tweet, but then I merely hit reload and it works... an easy fix was posted at least a year ago but still isn't implemented. Don't hold your breath.

There's a decent chance that this is also just an issue within a JSON parsing library that someone at R* decided to throw in and use. The parsing library was probably never intended to handle such large blobs, and R* didn't want to mess with the OSS code.

I wonder if it might even be an incompatible OSS license?

This is such an awesome article, and I love it. Thanks to the author for the great digging and explanation.

> Normally Luke would group the same functions together but since I don’t have debugging symbols I had to eyeball nearby addresses to guess if it’s the same place.

I really enjoyed this article, but I do have a question about this part. Why would a single function be listed at multiple addresses?

  • Well it's not actually the function's start address, it's the instruction pointer's address for the top function (so it moves around while executing).

    And going down the call tree, it's also not the start address, but the return address - so the place in the previous function where it called this one.

    Without debug symbols there's no way to tell if we're inside the same function block or not - it's all just addresses somewhere in the machine code.

  • I'd guess it's just using the value of the instruction pointer at each point it samples, and the way to resolve the function from that is to look backwards to find the symbol of the function it's in. As there are no symbols, Luke has no (easy) way of knowing where functions start, so it can't do this lookup.

He updated his article and added an option to donate to him. Check the bottom of his website! ;)

Maybe someone already asked (but the comments are huge): is there any way to also trim down the JSON file? Either on disk, or when it is fetched from the server? If you don't care about the transactions then you could remove it completely.

I think the developers knew it was slow, but they assumed that it was because 'parsing JSON is slow and we can't do anything about it'. Sometimes you're just too sure what's wrong.

I hate to be the guy but I have to. I don't know how to accomplish this but if the part in question was open source someone would have noticed.

I bet this feature, relating to in game purchasing options, was retro-fitted by some non-core dev team who delivered it to spec and fixed cost

Developers probably knew about this, but executives didn't care when getting into the monetary calculations.

I'm a complete noob at all this; is there an easy way to try this out at all? This is an awesome find!

This is quite impressive though it is important to remember that this came out for machines like the Xbox360 and the PlayStation 3. I am not convinced these modifications would “fit” on those machines without significant thrashing/paging and significant memory usage to the point where other negative effects come into play. Great read and effort regardless.

I bought a PlayStation a few years ago to play GTA..

Never tried consoles before, but figured the point was that it would be fast and hassle free.

GTA was fun, but after completing it I never load it up just to play for fun, because PS load times + GTA load times are too much.

Now I just play 20 year old games under wine on Linux.. that's less of a hassle, and I don't have to touch windows.

I find it interesting that, with ~63k entries, they didn't just use a bloom filter to check whether the entry they're working with has already been seen. Granted, they still need to store the data, but I think a bloom filter would be a more effective way to test if the item already exists.
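
A caveat: a Bloom filter only answers "definitely not seen" or "probably seen", so you still need an exact structure (or a rescan) for the rare false positive; with only ~63k entries a plain hash set of the item keys already makes the membership test O(1) on average. Still, for illustration, a minimal Bloom filter could look something like this (sizes and hash mixing are arbitrary choices for the sketch, nothing here is from the game):

  #include <bitset>
  #include <cstdint>
  #include <functional>
  #include <string>

  class BloomFilter {
      // ~1M bits (128KB, so consider heap-allocating the filter). With two
      // probes and ~63k keys this gives a false-positive rate on the order of 1%.
      static constexpr std::size_t kBits = 1u << 20;
      std::bitset<kBits> bits_;

      static uint64_t mix(uint64_t h) {  // splitmix64-style finalizer
          h += 0x9E3779B97F4A7C15ull;
          h = (h ^ (h >> 30)) * 0xBF58476D1CE4E5B9ull;
          h = (h ^ (h >> 27)) * 0x94D049BB133111EBull;
          return h ^ (h >> 31);
      }

  public:
      void add(const std::string& key) {
          uint64_t h = mix(std::hash<std::string>{}(key));
          bits_.set(h % kBits);
          bits_.set((h >> 32) % kBits);
      }

      // false -> the key was definitely never added
      // true  -> the key was probably added (small chance of a false positive)
      bool maybe_contains(const std::string& key) const {
          uint64_t h = mix(std::hash<std::string>{}(key));
          return bits_.test(h % kBits) && bits_.test((h >> 32) % kBits);
      }
  };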

Hopefully they completely ignored this and other issues in GTA5 online because they have been working on GTA6 all these years /end sarcasm

But honestly, on a large enough team with a large enough backlog much worse things slip through the cracks. Doesn't excuse this at all though.

Also kudos on the write up! Enjoyed reading it

I have zero to no knowledge of this in terms of implementation; is there an easy way to execute it at all?

Red Dead Redemption 2 online also does this exact same thing. The first time it did it, I shut the game off and then charged back my credit card. I dealt with it for GTA and I'm just not going to deal with it anymore.

think about it, how much electricity wasted

millions of players, billions of minutes wasted

when you scale things, every bit of optimization IS MANDATORY!

#saveOurPlanet

tl;dr

* There’s a single thread CPU bottleneck while starting up GTA Online

* It turns out GTA struggles to parse a 10MB JSON file

* The JSON parser itself is poorly built / naive and

* After parsing there’s a slow item de-duplication routine

(Copied from the END of the article)
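
On that last point, a guess at the shape of the problem (not the actual routine, just the pattern): de-duplicating by scanning everything inserted so far is quadratic, while keying a hash set on the item's unique identifier keeps the whole pass linear on average.

  #include <cstdint>
  #include <unordered_set>
  #include <vector>

  struct Item { uint64_t key_hash; /* ...other fields... */ };

  // Quadratic overall: each insert re-scans the whole array for a duplicate.
  void insert_unique_slow(std::vector<Item>& items, const Item& item) {
      for (const Item& existing : items)
          if (existing.key_hash == item.key_hash)
              return;  // already have it
      items.push_back(item);
  }

  // Linear overall: the set answers "seen before?" in O(1) on average.
  void insert_unique_fast(std::vector<Item>& items,
                          std::unordered_set<uint64_t>& seen,
                          const Item& item) {
      if (seen.insert(item.key_hash).second)  // .second is false for a duplicate
          items.push_back(item);
  }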

> the problems shouldn’t take more than a day for a single dev to solve

It bothers me that so many of us developers are annoyed by managers saying stupid stuff like "shouldn't be too hard" about things they don't understand, but then other developers walk into the same trap.

Yes, it looks simple at the surface. It most likely isn't. The problem is not that it looks simple, the problem is that you assume it is simple, probably because you don't have the full context.

  • The person posted a PoC that works. Surely they have context now?

    • That's very true, and the PoC works for that person. It's not that easy in real-life development though, you can't just switch out the JSON parser and call it a day. Lots of testing has to be done and you have to go through all the previous changes to make sure you're not missing what the previous maintainers did to fix some one-in-a-million issues.

      I'm not saying it's impossible for this to be as easy as the author claims it to be. I'm just saying that it might not actually be that easy in reality, if you're on the inside.

      7 replies →

  • I agree "shouldn't be too hard" should be avoided, but in this case the developers should fix it even if it is hard.

    • Back of the envelope (and if the same problem applies to consoles in addition to PC), the fix would save 30+ human waking lifetimes per year.
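
      One way those numbers could work out (every figure here is an assumption for illustration, not a measurement):

        - a waking lifetime: ~75 years x 16 hours/day ≈ 440,000 hours ≈ 26 million minutes
        - 30 lifetimes ≈ 790 million minutes
        - if the fix saves roughly 4 minutes per load (≈6 minutes down to ≈2), that needs ≈200 million loads per year
        - that's ≈550,000 loads per day, i.e. a few hundred thousand players loading in once or twice a day, which doesn't seem far-fetched for a game this size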