← Back to context

Comment by dan-robertson

5 years ago

I don’t think the lesson here is “be careful when parsing json” so much as it’s “stop writing quadratic code.” The json quadratic algorithm was subtle. I think most people’s mental model of sscanf is that it would be linear in the number of bytes it scans, not that it would be linear in the length of the input. With smaller test data this may have been harder to catch. The linear search was also an example of bad quadratic code that works fine for small inputs.

Some useful lessons might be:

- try to make test more like prod.

- actually measure performance and try to improve it

- it’s very easy to write accidentally quadratic code and the canonical example is this sort of triangular computation where you do some linear amount of work processing all the finished or remaining items on each item you process.

As I read the article, my guess was that it was some terrible synchronisation bug (eg download a bit of data -> hand off to two sub tasks in parallel -> each tries to take out the same lock on something (eg some shared data or worse, a hash bucket but your hash function is really bad so collisions are frequent) -> one process takes a while doing something, the other doesn’t take long but more data can’t be downloaded until it’s done -> the slow process consistently wins the race on some machines -> downloads get blocked and only 1 cpu is being used)

> actually measure performance and try to improve it

This really rings truest to me: I find it hard to believe nobody ever plays their own game but I’d easily believe that the internal culture doesn’t encourage anyone to do something about it. It’s pretty easy to imagine a hostile dev-QA relationship or management keeping everyone busy enough that it’s been in the backlog since it’s not causing crashes. After all, if you cut “overhead” enough you might turn a $1B game into a $1.5B one, right?

  • Lots of possibilities. Another one I imagined is that "only the senior devs know how to use a profiler, and they're stuck in meetings all the time."

    • I could easily imagine variations of that where this is in maintenance mode with a couple of junior programmers because the senior ones either burnt out or moved on to another project. I’ve definitely gotten the impression that most games studios have roughly the same attitude towards their employees as a strip-miner has towards an Appalachian hilltop.

      2 replies →

- do not implement your own JSON parser (I mean, really?).

- if you do write a parser, do not use scanf (which is complex and subtle) for parsing, write a plain loop that dispatches on characters in a switch. But really, don't.

  • I think sscanf is subtle because when you think about what it does (for a given format string), it’s reasonably straightforward. The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear. The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

    I don’t use my own json parser but I nearly do. If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.

    • > The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear.

      I think part of the problem is that scanf has a very broad API and many features via its format string argument. I assume that's where the slowdown comes from here - scanf needs to implement a ton of features, some of which need the input length, and the implementor expected it to be run on short strings.

      > The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

      I don't know, at face value it seems reasonable to expect programmers to carefully check whether the library function they use does what they want it to do? How would you otherwise ever be sure what your program does?

      There might be an issue that scanf doesn't document it's performance well. But using a more appropriate and tighter function (atoi?) would have avoided the issue as well.

      Or, you know, don't implement your own parser. JSON is deceptively simple, but there's still enough subtlety to screw things up, qed.

      4 replies →

    • > If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.

      What's the point of using standard formats if you're not taking advantage of off-the-shelf software for handling it?

  • This is probably good advice but not even relevant. It's down one level from the real problem: when your game spends 6 minutes on a loading screen, *profile* the process first. You can't optimize what you haven't measured. Now, if you've identified that JSON parsing is slow, you can start worrying about how to fix that (which, obviously, should be "find and use a performant and well-tested library".)

Is there some reason sscanf can not be implemented without calling strlen?

> actually measure performance and try to improve it

This reminds me that I used to do that all the time when programming with Matlab. I have stopped investigating performance bottlenecks after switching to Python. It is as if I traded performance profiling with unit testing in my switch from Matlab to Python.

I wonder if there are performance profilers which I could easily plug into PyCharm to do what I used to do with Matlab's default IDE (with a built-in profiler) and catch up with good programming practices. Or maybe PyCharm does that already and I was not curious enough to investigate.

The JSON parsing is forgivable (I actually didn't know that scanf computed the length of the string for every call) but the deduplication code is a lot less so, especially in C++ where maps are available in the STL.

It also comforts me into my decision of never using scanf, instead preferring manual parsing with strtok_r and strtol and friends. It's just not robust and flexible enough.

I thought the lesson is "listen to your customers and fix the issues they complain about".