Comment by ModernMech

5 days ago

I'm thankful they included a compiled language for comparison, because most of the time when I see Python benchmarks, they measure against other versions of Python. But "fast python" is an oxymoron and 3.14 doesn't seem to really change that, which I feel most people expected given the language hasn't fundamentally changed.

This isn't a bad thing; I don't think Python has to be or should be the fastest language in the world. But it's interesting to me seeing Python getting adopted for a purpose it wasn't suited for (high performance AI computing). Given how slow it is, people seem to think there's a lot of room for performance improvements. Take this line for instance:

> The free-threading interpreter disables the global interpreter lock (GIL), a change that promises to unlock great speed gains in multi-threaded applications.

No, not really. I mean, yeah, you might get some speed gains, but the chart shows that if you want "great" speed gains you have two options: 1) JIT compilation, which gets you an order of magnitude, or 2) switching to a statically compiled language, which gets you two orders of magnitude.

But there doesn't seem to be a world where they can tinker with the GIL or optimize python such that you'll approach JIT or compiled perf. If perf is a top priority, Python is not the language for you. And this is important because if they change Python to be a language that's faster to execute, they'll probably have to shift it away from what people like about it -- that it's a dynamic, interpreted language good for prototyping and gluing systems together.

I've been writing Python professionally for a couple of decades, and there've only been 2-3 times where its performance actually mattered. When writing a Flask API, the timing usually looks like: process the request for .1ms, make a DB call for 300ms, generate a response for .1ms. Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run Numpy on it for 3 hours, write it back out for 3 seconds.

You could rewrite that in Rust and it wouldn't be any faster. In fact, a huge chunk of the common CPU-expensive stuff is already a thin wrapper around C or Rust, etc. Yeah, it'd be really cool if Python itself were faster. I'd enjoy that! It'd be nice to unlock even more things that were practical to run directly in Python code instead of swapping in a native code backend to do the heavy lifting! And yet, in practice, its speed has almost never been an issue for me or my employers.
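To put rough numbers on the Flask case above, here is a minimal Amdahl's law sketch. The timings are the ones quoted in the comment; the 100x speedup for the non-database part is an assumption for illustration (borrowed from the "two orders of magnitude" figure), and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction_sped_up: float, factor: float) -> float:
    """Amdahl's law: total speedup when only part of the runtime gets faster."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)


# Timings from the Flask example, in milliseconds.
python_ms = 0.1 + 0.1   # request processing + response generation
db_ms = 300.0           # database call, untouched by a language rewrite
python_fraction = python_ms / (python_ms + db_ms)

# Assumption for illustration: the rewrite makes the Python part 100x faster.
speedup = overall_speedup(python_fraction, 100.0)

print(f"Python share of request time: {python_fraction:.4%}")
print(f"End-to-end speedup from a 100x faster language: {speedup:.5f}x")
```

Under those assumptions the request gets well under 0.1% faster end to end, which is the sense in which a rewrite "wouldn't be any faster".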

BTW, I usually do the Advent of Code in Python. Sometimes I've rewritten my solution in Rust or whatever just for comparison's sake. In almost all cases, choice of algorithm is vastly more important than choice of language, where you might have:

* Naive Python algorithm: 43 quadrillion years

* Optimal Python algorithm: 8 seconds

* Rust equivalent: 2 seconds

Faster's better, but the code pattern is a lot more important than the specific implementation.

  • > Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run Numpy on it for 3 hours, write it back out for 3 seconds.

    > You could rewrite that in Rust and it wouldn't be any faster.

    I was asked to rewrite some NumPy image processing in C++, because NumPy worked fine for 1024px test images but balked when given 40 Mpx photos.

    I cut the runtime by an order of magnitude for those large images, even before I added a bit of SIMD (just to handle one RGBX-float pixel at a time, nothing even remotely fancy).

    The “NumPy has uber fast kernels that you can't beat” mentality leads people to use algorithms that do N passes over N intermediate buffers, that can all easily be replaced by a single C/C++/Rust (even Go!) loop over pixels.

    Also reinforced by “you can never loop over pixels in Python - that's horribly slow!”
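    A rough sketch of the shape of the problem, in pure Python so it runs anywhere. It is a stand-in for the array case, not the actual image pipeline, and the three processing steps are invented for illustration. The first version makes three full passes and allocates three intermediate buffers; the second does the same work in one pass over the pixels.

```python
# Toy stand-in for an image: a flat list of float pixel values.
pixels = [float(i % 256) for i in range(10_000)]


def multi_pass(src):
    """Three whole-buffer passes, each allocating a full-size intermediate."""
    scaled = [p / 255.0 for p in src]        # pass 1: normalize to [0, 1]
    curved = [p ** 2.2 for p in scaled]      # pass 2: apply a gamma curve
    clipped = [min(p, 0.8) for p in curved]  # pass 3: clamp highlights
    return clipped


def fused(src):
    """One pass: each pixel goes through all three steps before moving on."""
    out = [0.0] * len(src)
    for i, p in enumerate(src):
        v = (p / 255.0) ** 2.2
        out[i] = v if v < 0.8 else 0.8
    return out


# Both versions compute the same result; only the memory traffic differs.
assert multi_pass(pixels) == fused(pixels)
```

    In Python the fused loop pays interpreter overhead per pixel, which is exactly why people avoid it. In C, C++, or Rust the same loop is cheap and touches each pixel once instead of streaming the whole image through memory for every step.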

    • Same with OpenCV and sometimes even optimized matrix libraries in pure C++. These are all highly optimized, but to achieve something useful you often have to chain operations, which quickly eats up a lot of cycles just by copying data around and making multiple passes that the compiler is unable to fuse. Even if you are not an optimization god, you can often beat that pretty easily with manual loop fusion.

    • Fused expressions are possible using other libraries (numexpr is pretty good), but I agree that there's a reluctance to use things outside of NumPy.

      Personally though I find it easier to just drop into C extensions at the point that NumPy becomes a limiting factor. They're so easy to do and it lets me keep the Python usability.

  • That's because you're doing web stuff (I/O limited). So much of our computing experience has been degraded by this mindset being applied more broadly. Despite steady improvements in hardware, my computing experience has stagnated or degraded in terms of latency, responsiveness, etc.

    I'm not going to even go into the comp chem simulations I've been running, or that about 1/3 the stuff I do is embedded.

    I do still use python for web dev, partly because as you say, it's not CPU-bound, and partly because Python's Django framework is amazing. But I have switched to rust for everything else.

    • As a Java backend dev mainly working on web services, I wanted to like Python, but I have found it really hard to work on a large Python project because autocomplete just does not work as well as it does in something like Java.

      Maybe it is just due to not being as familiar with how to properly set up a Python project, but every time I have had to do something in a Django or FastAPI project it is a mess of missing types.

      How do you handle that with modern Python? Or is it just a limitation of the language itself?

    • I won’t completely argue against that, and I’ve also adopted Rust for smaller or faster work. Still, I contend that a freaking enormous portion of computing workloads are IO bound to the point that even Python’s speed is Good Enough in an Amdahl’s Law kind of way.

    • > That's because you're doing web stuff.

      I guess you didn't notice where he talked about running numpy?

    • And 300ms for a DB call is slow, in any case. We really shouldn't accept that as a normal cost of doing business. 300ms is only acceptable if we are doing scrypt-type things.

  • Sure, then you get a developer who decides to go with Flask for an embedded product and it's an eye-watering slog.

    • People will always make bad decisions. For example, I'd also squint at a developer who wanted to write a new non-performance-critical network service in C. Or a performance-critical one, for that matter, unless there was some overwhelming reason they couldn't use Rust or even C++.

  • Advent of Code is deliberately set up to be doable in Python. You can also imagine a useful problem that takes Rust 2 weeks to run; how long would it take in Python?

  • And my experience is this: you start using ORMs, and maybe you need to format a large table once in a while. Then your Python just dies. Bonus points if you're using async to service multiple clients with the same interpreter.

    And you're now forced to spend time hunting down places for micro-optimizations. Or worse, you end up with a weird mix of Cython and Python that can only be compiled on the developer's machine.

  • LOL, Python is plenty fast if you make sure it calls C or Rust behind the scenes. Typical of 'professional' Python people. Something too slow? Just drop into C. It surely sounds weird to everyone who complains about Python being slow when the response is along these lines.

    • But that’s the whole point of it. You have the option to get that speed when it really matters, but can use the easier dynamic features for the very, very many use cases where that’s appropriate.

      This is an eternal conversation. Years ago, it was assembler programmers laughing at inefficient C code, and C programmers replying that sometimes they don’t need that level of speed and control.

    • People really misconstrue the relationship between Python and C/C++ in these discussions.

      Those libraries didn't spring out of thin air, nor have they always existed.

      People badly wanted to write and interface in Python. That's why you have all these libraries with substantial code in another language, yet research and development didn't just shift to that language.

      TensorFlow is a C++ library with a Python wrapper. PyTorch has supported a C++ interface for some time now, yet virtually nobody actually uses TensorFlow or PyTorch in C++ for ML R&D.

      If python was fast enough, most would be fine, probably even happy to ditch the C++ backends and have everything in python, but the reverse isn't true. The C++ interface exists, and no-one is using it. C++ is the replaceable part of this equation. Nobody would really care if Rust was used instead.

    • Even as a Fortran programmer, the majority of my flops come from BLAS, LAPACK, and those sort of libraries… putting me in the exact same boat as the Python programmers, really. The “professional” programmers in general don’t worry too much about tying their identities to language choices, I think.

    • This is a very common pattern in high-level languages and has been a thing ever since Perl first came onto the scene. The whole point was that you use more ergonomic, easier-to-iterate languages like Perl or Python for most of your logic and you drop down into C, C++, Zig, or Rust to write the performance-sensitive portions of your code.

      When compiled languages became popular again in the 2010s there was a renewed push toward ergonomic compiled languages to buck this trend (Scala, Kotlin, Go, Rust, and Zig all gained their popularity in this timeframe), but there's still a lot of code written with the two-language pattern.

    • This assumes the boundary between Python and the native code is clean and rarely crossed.

  • Exactly, most Python devs neither need nor care about perf. Most applications don't even need perf, because whether it's .1 second or .001 seconds, the user is not going to notice.

    But this current quest to make Python faster is precisely because the sluggishness is noticeable for the task it's being used for most at the moment. That 6 second difference you note between the Optimal Python and the optimal Rust is money on the table if it translates to higher hardware requirements or more server time. When everything is optimal and you could still be 4x faster, that's a tough pill to swallow if it means spending more $$$.

    • > most Python devs neither need nor care about perf.

      You do understand that's a different but equivalent way of saying, "If you care about performance, then Python is not the language for you.", don't you?

It's pretty simple. Nobody wants to do ML R&D in C++.

TensorFlow is a C++ library with Python bindings. PyTorch has supported a C++ interface for some time now, yet virtually nobody uses C++ for ML R&D.

The relationship between Python and C/C++ is the inverse of the usual backend/wrapper cases. C++ is the replaceable part of the equation. It's a means to an end. It's just there because python isn't fast enough. Nobody would really care if some other high perf language took its place.

Speed is important, but C++ is even less suited for ML R&D.

  • I think readability is what made Python a winner. I can quickly write down my idea like pseudocode, and I can easily skim through other people’s algos. In C++, even a simple algo with 100 lines of pseudocode will balloon to thousands of lines.

I agree. Unless they make it like 10x faster it doesn't really change anything. It's still a language you only use if you absolutely don't care whatsoever about performance and can guarantee that you never will.

  • Well, that's not true at all. Scientists care about performance, but it turns out that Python is really good for number crunching since it is really good for using very fast C libraries. I know people who use pandas to manipulate huge datasets from radar astronomy. Also, of course, it's used in machine learning. If Python were "only" used in situations where you don't care about performance, it would not be used in so many scenarios that definitely need high performance. Sure, it is not pure Python, but it's still Python being used, just used to orchestrate C libraries.

  • If you’re actually building and shipping software as a business, Python is great. The advantages of Python for a startup are many. Large pool of talent that can pick up the codebase on essentially day 1. Fairly easy to reason about, mature, good code velocity, typically one and only one way to do things as opposed to JavaScript. There is way more to the story than raw performance.

    • It's not that great when you see that the majority of the Python code in businesses is a totally unmaintainable mess because it has incorrect, partial, or no type annotations, and is littered with serious errors that even the most basic type checker would flag.
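      A small sketch of the kind of error meant here, with made-up names (`find_owner`, `owner_initial`). With the annotations in place, a checker such as mypy refuses to let you index the result without handling `None` first; without annotations, the same mistake only shows up at runtime.

```python
from typing import Optional


def find_owner(accounts: dict[str, str], account_id: str) -> Optional[str]:
    """Return the owner's name, or None when the account is unknown."""
    return accounts.get(account_id)


def owner_initial(accounts: dict[str, str], account_id: str) -> str:
    owner = find_owner(accounts, account_id)
    # Without this guard, `owner[0]` is a type error under mypy because
    # `owner` may be None; at runtime it would raise TypeError instead.
    if owner is None:
        return "?"
    return owner[0].upper()


print(owner_initial({"a1": "dana"}, "a1"))
print(owner_initial({"a1": "dana"}, "missing"))
```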

    • > The advantages of Python for a startup are many. Large pool of talent that can pick up the codebase on essentially day 1.

      Large pool of mediocre Python developers that can barely string a function together in my experience.

  • The counterargument used to be, the heavy lifting will be offloaded to python modules written in C, like numpy.

    Which was true, but maybe not the strongest argument. Why not use a faster language in the first place?

    But it's different now. There are huge classes of problems where PyTorch, JAX & co. are the only options that don't suck.

    Good luck competing with python code that uses them on performance.

    • > Why not use a faster language in the first place?

      Well for the obvious reason that there isn't really anything like a Jupyter notebook for C. I can interactively manipulate and display huge datasets in Python, and without having to buy a Matlab license. That's why Python took off in this area, really

    • >Which was true, but maybe not the strongest argument. Why not use a faster language in the first place?

      Because most faster languages suck donkey balls when it comes to using them quickly and without ceremony. Never mind trying to teach them to non-programmers (e.g. physics or statistics people)...

  • > you absolutely don't care whatsoever about performance and can guarantee that you never will.

    Those are actually pretty good bets, better than most other technological and business assumptions made during projects. After all, a high percentage of projects, perhaps 95%, are either short term or fail outright.

    And in my own case, anything I write that is in the 5% is certain to be rewritten from scratch by the coding team, in their preferred language.

    • Sure but you're still screwing yourself over on that 5% and for no real reason - there are plenty of languages that are just as good as Python (or better!) but aren't as hilariously slow.

      And in my experience rewrites are astonishingly rare. That's why Dropbox uses Python and Facebook uses PHP.

  • Obtuse statement. There are many ways of speeding up a python project if requirements change.

    • A painful rewrite in another language is usually the only option in my experience.

      If you're really lucky you have a small hot part of the code and can move just that to another language (a la Pandas, Pytorch, etc.). But that's usually only the case for numerical computing. Most Python code has its slowness distributed over the entire codebase.

  • People were probably making the same arguments about ASM and C at some point. How many people write ASM these days, though? I'm not arguing that it isn't a relevant point for now; obviously Rust and C are way faster.

    • I doubt it. C is well within 2x of what you can achieve with hand written assembly in almost every case.

      Furthermore, writing large programs in pure assembly is not really feasible, but writing large programs in C++, Go, Rust, Java, C#, TypeScript, etc. is totally feasible.
