
Comment by kccqzy

5 days ago

Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.

This might have been your experience, but mine has been very different. In my experience a typical Python workload is 50% importing Python libraries, 45% slow Python wrapper logic, and 5% fast native code. I spend a lot of time rewriting the Python logic in C++, which makes it roughly 100x faster, so the resulting profile approaches "10% fast native logic, 90% useless Python imports".
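
For what it's worth, you can put rough numbers on that split with nothing but the stdlib (CPython's -X importtime flag also prints a per-module breakdown to stderr). A minimal sketch, where json and urllib.request are just stand-ins for whatever your workload actually imports:

    import time

    t0 = time.perf_counter()
    import json             # stand-ins for whatever heavy
    import urllib.request   # libraries your workload pulls in
    import_secs = time.perf_counter() - t0

    t0 = time.perf_counter()
    payload = json.dumps({"n": sum(range(100_000))})  # stand-in for the real work
    work_secs = time.perf_counter() - t0

    print(f"imports: {import_secs:.4f}s  work: {work_secs:.4f}s")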

  • There is more than one PEP aimed at making imports faster, such as PEP 690 or PEP 810, so it's definitely a well-known problem. The solution is probably right around the corner. (The usual workaround in the meantime is sketched just after this list.)

  • Imports being slow is annoying, but it only matters for short-running code.

    • Many simple scripts at my work that do little more than parse arguments and fire off an HTTP request spend half a minute importing things they never use, thanks to spurious dependencies and uncommon code paths. For some unit tests it's 45 seconds, substantially longer than the test logic itself takes to run.

      In dev cycles most code is short-running.


  • If imports are slow, you shouldn't be writing Python in the first place, because you are either on limited hardware or writing a very performance-sensitive app.
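
On the slow-imports subthread above: none of those PEPs is needed for the usual workaround, which is to push heavy imports down into the only functions that use them, so an argparse-and-one-HTTP-request script stops paying for code paths it never takes. A minimal sketch, with urllib.request standing in for the heavy dependency:

    import argparse

    def fetch(url):
        # Deferred import: only paid for on the code path that needs it.
        import urllib.request
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("url")
        args = parser.parse_args()
        print(len(fetch(args.url)))

    if __name__ == "__main__":
        main()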

I do a bit of performance work and find most often that things are mixed: there's enough CPU work between syscalls that the hardware isn't fully utilized, but enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python looks really awful compared to rewrite-it-in-Rust.
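
To make "no obvious hotspot" concrete: a stdlib cProfile run like the sketch below, sorted by cumulative time, is how you find out whether one fat frame dominates or the time is smeared across dozens of them. The workload function is a hypothetical stand-in:

    import cProfile
    import pstats

    def workload():
        # Hypothetical stand-in for a real mixed CPU/I-O job.
        return sum(i * i for i in range(1_000_000))

    cProfile.run("workload()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)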

  • This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.

When PyPy is a drop-in replacement, as it is for most of my code (and it's dead simple to check: just see whether pypy ./main.py runs), I don't see why you'd run the code 5-50% slower for no reason.

IRL you will have CPU-bottlenecked pure-Python code too, but that alone isn't enough to justify taking on the unknown risk of switching to a less-supported interpreter. Worst case, you put in the effort to convert the hot parts to multiprocessing.
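
That multiprocessing fallback is all stdlib; a minimal sketch, assuming the hot part is an embarrassingly parallel pure-Python loop (hot_chunk is a hypothetical stand-in for it):

    from multiprocessing import Pool

    def hot_chunk(chunk):
        # Stand-in for the CPU-bottlenecked pure-Python hot spot.
        return sum(i * i for i in chunk)

    if __name__ == "__main__":
        chunks = [range(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
        with Pool() as pool:  # defaults to one worker per core
            print(sum(pool.map(hot_chunk, chunks)))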

Also, the engineer time you would spend optimizing for performance often costs more than just throwing more hardware at the problem.

  • For cloud jobs that can be true, but for single-threaded, dev-in-the-loop work you can't just buy a processor 100x faster than the one in the developer's machine, and the latency is expensive workflow friction.

  • Not if you have certain types of scientific data: you can't rent enough hardware to make the slow code fast enough.