Comment by jerf

14 hours ago

You can draw out a sort of performance hierarchy, from fastest to slowest:

    * Optimized GPU code
    * CPU vectorized code
    * Static CPU unvectorized code
    * Dynamic CPU code

where the last one refers to the fact that a language like Python, in order to add two numbers together in its native, pure-Python mode, does a lot of boxing, unboxing, resolving of class types and checking for overrides, etc.
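To see that gap directly, here's a minimal sketch (exact numbers will vary by machine) of a million additions done both ways:

    import timeit
    import numpy as np

    n = 1_000_000
    a, b = list(range(n)), list(range(n))
    a_np, b_np = np.arange(n), np.arange(n)

    # Pure Python: every addition boxes/unboxes ints and dispatches __add__.
    t_py = timeit.timeit(lambda: [x + y for x, y in zip(a, b)], number=10)

    # NumPy: one dispatch, then a tight C loop over unboxed machine integers.
    t_np = timeit.timeit(lambda: a_np + b_np, number=10)

    print(f"pure Python: {t_py:.3f}s  NumPy: {t_np:.3f}s")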

Each of those is at least an order of magnitude slower than the next one up the hierarchy, and most of the gaps are appreciably more than one. As a back-of-the-envelope understanding, you're closer if you think of each step as more like 1.5 orders of magnitude; three such gaps from top to bottom compound to roughly 10^4.5, a factor of tens of thousands.

Using NumPy incorrectly can accidentally take you from the top one, all the way to the bottom one, in one fell swoop. That can be a big deal, real quick. Or real slow, as the case may be.

In more complicated scenarios, it matters how much computation is going how far down that hierarchy. If by "processing a video frame by frame" you mean something like "I wrote a for loop on the frames but all the math is still in NumPy", you've taken "iterating on frames" from the top to the bottom, but who cares; Python can iterate on even a million things plenty quickly, especially with everything else that is going on. If, by contrast, you mean that at some point you're iterating over each pixel in pure Python, you just fell all the way down that hierarchy for each pixel, and you're in bigger trouble.
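A sketch of the distinction, with a made-up array standing in for real video frames (the shapes and the brighten operation are just for illustration):

    import numpy as np

    frames = np.random.rand(100, 120, 160)  # hypothetical video: 100 small frames

    # Fine: Python only iterates over the 100 frames; NumPy does the
    # per-pixel math in vectorized C.
    def brighten_per_frame(frames):
        out = np.empty_like(frames)
        for i, frame in enumerate(frames):
            out[i] = np.clip(frame * 1.2, 0.0, 1.0)
        return out

    # Trouble: pure Python touches every pixel, ~2 million times here,
    # paying the full dynamic-dispatch cost each time.
    def brighten_per_pixel(frames):
        out = np.empty_like(frames)
        for i in range(frames.shape[0]):
            for y in range(frames.shape[1]):
                for x in range(frames.shape[2]):
                    out[i, y, x] = min(frames[i, y, x] * 1.2, 1.0)
        return out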

In my opinionated opinion, the trouble isn't so much that it's possible to fall down that stack. That is arguably a feature, after all; surely we should have the capability of doing that sort of thing if we want. The problem is how easy it is to do without realizing it, just by using Python in what looks like perfectly sensible ways. If you aren't a systems engineer it can be hard to tell you've fallen, and even if you are, honestly, the docs don't make it particularly easy to figure out.

Plus, this isn't a checkbox on a UI, where Electron being 1000 times slower (1 ms instead of 1 µs) wouldn't even be noticeable.

It could be a 12-hour run vs. a 12,000,000-hour run.

As a simple example: once upon a time we needed to generate a sort of heat map. Doing it in pure Python took a few seconds at the desired size (a few thousand cells, where each cell needs a small formula). Dropping to NumPy brought that down to hundreds of milliseconds. Pushing it to pure C got us to tens of milliseconds.
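Roughly the shape of it, as a sketch (the actual per-cell formula was different; a simple distance kernel stands in here):

    import numpy as np

    W, H = 80, 60  # a few thousand cells

    # Pure Python: the formula is re-interpreted for every single cell.
    def heatmap_python(w, h):
        return [[((x - w / 2) ** 2 + (y - h / 2) ** 2) ** 0.5
                 for x in range(w)] for y in range(h)]

    # NumPy: build the coordinate grids once, then run the formula
    # a single time over whole arrays in C.
    def heatmap_numpy(w, h):
        y, x = np.mgrid[0:h, 0:w]
        return np.hypot(x - w / 2, y - h / 2)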

  • Yeah, one of the other beauties of NumPy is that you can pass data to/from native shared libraries compiled from C code with little overhead. This was more kludgy in Matlab last I checked.
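For example, with ctypes (a minimal sketch; libscale.so and its scale() function are hypothetical stand-ins for your own compiled C code):

    import ctypes
    import numpy as np

    # Hypothetical C function, compiled into libscale.so:
    #   void scale(double *data, size_t n, double factor)
    lib = ctypes.CDLL("./libscale.so")
    lib.scale.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
        ctypes.c_size_t,
        ctypes.c_double,
    ]
    lib.scale.restype = None

    data = np.arange(10, dtype=np.float64)
    lib.scale(data, data.size, 2.0)  # C mutates the NumPy buffer in place, no copy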

that's a great hierarchy!

though what does "static cpu" vs "dynamic cpu" mean? it's one thing to be pointer chasing and missing the cache like OCaml can, it's another to be running a full interpreter loop to add two numbers like python does