
Comment by rossdavidh

8 years ago

I have to say that my first reaction was: "maybe you shouldn't use python for this, then". If you are using a language in a way that it gets worse in subsequent versions, that's a good sign that they're optimizing for something other than what you care about.

The programming language R does not, as I understand it, optimize for speed, because they are optimizing for ease of exploratory data analysis. R is growing quite rapidly. So is python, actually. It doesn't mean that either one is good at everything, and it's probably the case that both are growing because they don't try to be good at everything. A good toolbox is better than a multi-tool.

(I authored the linked post)

While the "maybe you shouldn't use Python" comment could be construed by some as trolling, there is definite truth to your line of reasoning, and I agree with the comment.

I absolutely love Python as a programming language for the space it is in. But as someone who needs to think long term about maintaining large projects with lifetimes measured in potentially decades, Python has a few key weaknesses that make it really difficult for me to continue justifying its use for such projects. Startup time is one. The GIL is the other large one (not being able to achieve linear speedups on CPU-bound code in 2018, with Moore's Law dead, is unacceptable). General performance disadvantages can be adequately addressed with PyPy, JITs, Cython, etc. Problems scaling large code bases using a dynamic language can be mitigated with typing and better tools.

Python can be very competitive against typed systems languages. But if it fails to address its shortcomings, I think more and more people will choose Rust, Go, Java, C/C++, etc for large scale, long time horizon projects. This will [further] relegate Python to be viewed as a "toy" language by more serious developers, which is obviously not good for the Python ecosystem. So I think "maybe you shouldn't use Python for this, then" is a very accurate statement/critique.

  • I would characterize Python's weaknesses differently.

    Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And certainly removing the GIL will not just enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL, it was the sorry state of multi-process concurrency.

    The speed issues that JITs solve for other languages may not be solvable in Python due to language design.

    • I'm totally OK with Python's threading choice of saying only 1 Python thread may execute Python code at any time. This is a totally reasonable choice and avoids a lot of complexity with multithreaded programming. If that's how they want to design the language, fine by me.

      But the GIL is more than that: the GIL also spans interpreters (that's why it's called the "global interpreter lock").

      It is possible to run multiple Python interpreters in a single process (when using the embedding/C API). However, the GIL must be acquired for each interpreter to run Python code. This means that I can only effectively use a single CPU core from a single process with the GIL held (ignoring C extensions that release the GIL). This effectively forces the concurrency model to be multiple process. That makes IPC (usually serialization/deserialization) the bottleneck for many workloads.

      If the GIL didn't exist, it would be possible to run multiple, independent Python interpreters in the same process. Processes would be able to fan out to multiple CPU cores. I imagine some enterprising people would then devise a way to transfer objects between interpreters (probably under very well-defined scenarios). This would allow a Python application to spawn a new Python interpreter from within Python, task it with running some CPU-expensive code, and return a result. This is how Python would likely achieve highly concurrent execution within processes. But the GIL stands in its way.

      The GIL is an implementation detail, not poor language design.
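      To make that concrete, here is a small sketch of my own (not from the thread) of why the GIL forces the multi-process model: the same CPU-bound function run across a thread pool versus a process pool. On a multi-core machine, the process pool typically finishes well ahead, since each worker process has its own interpreter and GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_primes(limit):
    # Pure-Python CPU-bound work; the thread running it holds the GIL.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed(pool_cls, jobs=4, limit=20_000):
    start = time.perf_counter()
    with pool_cls(max_workers=jobs) as pool:
        list(pool.map(count_primes, [limit] * jobs))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads share one GIL, so the four jobs run essentially serially;
    # processes each get their own interpreter (and GIL) and can use
    # four cores, at the cost of pickling arguments and results.
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")
```

      The pickling of arguments and results in the process version is exactly the serialization/IPC bottleneck mentioned above.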


  • Python's slowness can help improve performance by teaching you to use techniques that end up being faster no matter the language.

    Python is so slow that it forces you to be fast.

    Consider data analysis: on modern machines, you're almost always better off with a columnar approach: if you have a struct foo { int a, b, c; }, you want to store int foo_a[], foo_b[], foo_c[], not struct foo data[]. It's better for the cache, better for IO, and better for SIMD.

    numpy makes it much easier to use the columnar layout than the array of structs, whereas in C, you might be tempted by the array of structs and not even realize how much performance you were leaving on the table. Likewise for GPU compute offloading, reliance on various tuned libraries for computationally intensive tasks, and the use of structured storage.
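    As a rough illustration (mine, not the commenter's), here is the same columnar idea in numpy versus a record-per-object layout:

```python
import numpy as np

# Array-of-structs: one Python dict per record (boxed ints, pointer chasing,
# poor cache locality).
records = [{"a": i, "b": 2 * i, "c": 3 * i} for i in range(1000)]

# Struct-of-arrays (columnar): one contiguous array per field.
a = np.arange(1000)
b = 2 * a
c = 3 * a

# The columnar computation is a single vectorized pass over contiguous
# memory; the record version is a Python-level loop over objects.
total_aos = sum(r["a"] + r["b"] * r["c"] for r in records)
total_soa = int((a + b * c).sum())
assert total_aos == total_soa
```

    Both compute the same answer, but the columnar version is the one that stays fast as the data grows, and it is the layout numpy hands you by default.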

  • Sorry, I didn't mean it to be trolling, I just meant it more or less literally. If Rust (for example) gets used for things like Mercurial and Mozilla, is that bad? I'm not saying Python shouldn't care, if it could improve the startup time without sacrificing other things. But presumably the transition from py2 to py3 was not intending to make things slower, it was intending to solve other problems. There are almost always tradeoffs. Even the mercurial folks quoted in the article said that the things py3 solved were not what they needed. That's a good indicator that Python is not the right language (anymore) for what they're doing.

    I am primarily a Python programmer, but if Rust, Go, etc. take over as the language of choice in certain cases, I don't think that's a bad thing. Which doesn't mean one shouldn't write an article to highlight this cost of not having short startup time, just in case this cost wasn't understood by Guido, et al. But my guess (and it's only a guess), is that it was.

  • > While the "maybe you shouldn't use Python" comment could be construed as trolling to some, there is definite truth to your line of reasoning and I agree with comment.

    I wouldn't say I construed it as trolling. More like, "You might be right, but where does that get us?" Not trolling, but also not that constructive, because it's extremely easy to write something like "maybe you shouldn't use Python" but likely hard and time-consuming to make it so.

    There are a lot of questions when considering such a move. For example:

    - What's the opportunity cost of migrating $lots_of Python to Rust, or some other language?

    - Is that really where you can add (or want to add) the most value?

    - And what does having to do that do to your roadmap? Maybe it enables it, but surely it's also stealing time from other valuable work you could be doing?

    - Longer term, are we sacrificing maintainability for performance? (In your case it sounds like the opposite?)

    - How easily can we hire and onboard people using $new_tech? (Again, it sounds like you might reduce complexity.)

    Basically I suppose what I'm saying is I find it a little trite when people say, "well, maybe you should do X," without having weighed the costs and benefits of doing so. And in a professional environment, if that's allowed to become a pattern of behaviour, it can contribute to the demotivation of teams. Hence, I found myself a bit irritated by the grandparent post.

Python was always slow to start. Not as slow as the JVM, but somewhere around the 300th test case for hg, or the 100th Python script invocation in a build system, people should start to wonder about how to get all of that under one Python process.

      It's not like Python is so ugly it'd be messy to do. (It was possible with the JVM after all. It even works by simply forking the JVM, with all its GC threads and so on: https://github.com/spray/sbt-revolver )

      Make style DAGs are nice, but eventually the sheer number of syscalls for process setup (and module import and dynamic linking) are going to be a waste of time.

  • I think your characterization of the GIL is not accurate. Show me ANY real world program that can achieve linear speedups on multicore or multi-processor systems. Humans have not sufficiently mastered multithreading to be able to make such a claim. I am not aware of any "CPU-bound" use cases that would actually use Python like this instead of, say, C or Fortran. And anyway, I submit that it would benefit (both from a design and an execution standpoint) from being multi-process (in other words, using explicitly coded communication).

  • Regarding the GIL I've always wondered about Jython but never gotten around to trying it. What are the drawbacks of running it on a JVM to get true multithreading? Having to properly sync the threads like in other environments without global locks?

    • Nothing, it's just not maintained. People realized that, yeah, Python is nice, but why spend years reimplementing it on the JVM when there's Kotlin? (And Java itself is quite a breeze to program in nowadays. And of course Scala, if you dare go beyond the Pythonic simplicity.)


The supposed attitude of the python developers about startup time works against the popular niches Python is supposed to be such a great fit for: little scripts, glue, short-run applications.

That’s a problem if that’s an area python wants to compete in.

  • I might be biased because I'm from the hordes that are moving from Stata and Matlab to Python (but then there are the hordes attracted to data analysis now), but that was never really Python's strong suit, nor its target market.

    I mean, I was always into little scripts, but I used Tcl and then Perl.

    • Back in the 1990s, Python was promoted as a web programming language. This was when everyone used CGIs. Python came with a cgi module, while in Perl you had to download cgi-lib.pl. I even helped maintain a Python web application that was all CGI-based.

      So I can assure you that at one point Python was trying to be in the "short run applications" space. They may have given up since then, but that's a different issue.

      As for me, I do write little scripts in Python. I don't like how most of my run time is spent waiting for Python to get ready.

      What I really don't like is using NumPy. I tend to re-implement features I want rather than reach for NumPy because that 0.2s import time irks me so much. And it's because the NumPy developers want people to do "import numpy; numpy.reach.into.a.deep.package", so they import most of its submodules.

      They used to also eval() some code at import, causing even more overhead. I don't know if that's gone away.
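      A quick way to see that cost (a sketch of my own; it assumes numpy is installed) is to time fresh interpreter launches with and without the import:

```python
import subprocess
import sys
import time

def startup_seconds(code="pass"):
    # Wall-clock time for a fresh interpreter process to run `code` and exit.
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code])
    return time.perf_counter() - start

bare = startup_seconds()
with_numpy = startup_seconds("import numpy")  # assumes numpy is installed
print(f"bare interpreter: {bare:.3f}s  plus numpy import: {with_numpy:.3f}s")
```

      On CPython 3.7+, `python -X importtime -c "import numpy"` breaks that import cost down per submodule, which makes the eager submodule imports easy to spot.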


    • Both Tcl and Perl are dead languages walking these days, and it's Python that's displaced them. It absolutely competes in that market.


  • The linked post is about Python startup being a problem with thousands of invocations. Is Python startup really a problem for the niches you mention, or is it a problem in some extreme edge cases? I would argue this is the latter and perhaps signals that an architecture change for the build or tests would be best.

    I have been using Python for small scripts for 20+ years and haven't had this issue. The JVM on the other hand was historically slow to start.

    • If you need to run thousands of scripts, do you need to (re-)start Python for each script? IMHO what needs to be done for this problem is not faster startup, but a way to avoid startup by implementing a feature where you can keep a single Python "machine" in memory that can make a "soft reset" to execute a fresh script.
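      One POSIX-only way to approximate that "soft reset" is a fork server: pay startup and imports once in a resident parent, then fork a throwaway child per script. A rough sketch (`run_forked` is a name I made up, not a real API; `os.waitstatus_to_exitcode` needs Python 3.9+):

```python
import os

def run_forked(func, *args):
    # Reuse the parent's warm interpreter (modules already imported) by
    # forking a short-lived child per task, instead of paying full
    # interpreter startup for each script. POSIX-only.
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            func(*args)
        except Exception:
            code = 1
        os._exit(code)  # skip normal interpreter teardown in the child
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    assert run_forked(print, "hello from a forked child") == 0
```

      Each child starts from a copy of the parent's state and discards all its changes on exit, which is essentially the "fresh script" semantics without the startup cost.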


  • Yep. Tried to use a Raspberry Pi as my main system for a while and one of the pain points was slooooow startup of Python. As a Python fan I was embarrassed.

  • I don't particularly agree that this is what "Python is supposed to be such a great fit for."

    I've been to quite a few PyCons and never heard anyone espousing this view, but I'm open to the possibility that I have missed it. Can you link me to a piece of media that you think persuasively makes the case that this is what Python is supposed to be for?

  • Python is not optimized for small glue code at all. The fact that it is the sanest language for use in that niche speaks much more about the ecosystem than about Python.

    Python seems to be mainly optimized for web servers, scientific computing and machine learning tasks. None of those care about startup time.

  • Python is really only the target for those because someone lied to all of the systems folk and told them that Ruby was too slow. (The previous wave of infrastructure management tools seemed to all be written in Ruby and nowadays it's Python or Go.) That and python is one of the "official" languages at Google and everyone wants to be Google, right?

    Meanwhile, Ruby is making great strides in performance and even has JIT coming in 2.6.