Comment by ncruces
5 days ago
> Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run Numpy on it for 3 hours, write it back out for 3 seconds.
> You could rewrite that in Rust and it wouldn't be any faster.
I was asked to rewrite some NumPy image processing in C++, because NumPy worked fine for 1024px test images but balked when given 40 Mpx photos.
I cut the runtime by an order of magnitude for those large images, even before I added a bit of SIMD (just to handle one RGBX-float pixel at a time, nothing even remotely fancy).
The “NumPy has uber fast kernels that you can't beat” mentality leads people to use algorithms that do N passes over N intermediate buffers, that can all easily be replaced by a single C/C++/Rust (even Go!) loop over pixels.
Also reinforced by “you can never loop over pixels in Python - that's horribly slow!”
Same with opencv and even sometimes optimized matrix libraries in pure C++. These are all highly optimized. But often when you want to achieve something you have to chain stuff which quickly eats up a lot of cycles, just by copying stuff around and having multiple passes that the compiler is unable to fuse. You can often pretty easily beat that even if you are not an optimization god by manual loop fusion.
Fused expressions are possible using other libraries (numexpr is pretty good), but I agree that there's a reluctance to use things outside of NumPy.
Personally though I find it easier to just drop into C extensions at the point that NumPy becomes a limiting factor. They're so easy to do and it lets me keep the Python usability.