I've always been disappointed by how large software projects, both FOSS and commercial, lose their "can do" spirit with age. Long-time contributors become very quick with a "no". They dismiss longstanding problems as illegitimate use cases and reject patches with vague and impervious arguments about "maintainability" or "complexity". Maybe in some specific cases these concerns might be justified, but when everything garners this reaction, the overall effect is that progress stalls, crystallized at the moment the last bit of technical boldness flowed away.
You can see this attitude of "no" on this very HN thread. Read the comments! Instead of talking about ways we can make Python startup faster, we're seeing arguments that Python shouldn't be fast, we shouldn't try to make it faster, and that programs (and, by implication, programmers) who want Python startup to be fast are somehow illegitimate. It's a dismal perspective. We should be exercising our creativity as a way to solve problems, not finding creative ways to convince ourselves to accept mediocrity.
This isn't an attitude of "no" - it's an attitude of "yes" to other things. The arguments are that making Python startup fast makes other things worse, and we care about those other things.
- Spend resources on developing PyPy, which (being a JIT) has relatively slow startup but much faster performance in general, for people who want fast performance.
- Write compilers from well-typed Python to native code.
- Keep CPython easy to hack on, so that more people with a "can do" spirit can successfully contribute to CPython instead of it being a mess of special cases in Guido's head.
Will you join me in saying "yes" to these things and not convincing ourselves to accept mediocrity?
I have to note that none of the projects you suggested, all of which are good and useful, will do anything to address the CPython startup latency problem under discussion. Why shouldn't CPython be better?
There's also no reason to believe that startup improvements would make the interpreter incomprehensible; the unstated assumption that improvements in this area must hurt hackability is interesting. IME, optimizations frequently boost both simplicity and performance, usually by unifying disparate code paths and making logic orthogonal.
MicroPython put together a Python in 250kb. Why the hell can't we make an LLVM frontend for Python that can use type hints for optimization? Sure, you lose some dynamic features as you optimize for speed, but that's the dream. Quickly write a prototype without caring about types, then optimize later by adding types and removing dynamism.
I'm currently learning Racket and LLVM and I have about 70 more years to live. I'm gonna try to make Python fast on slow weekends 'til I die.
> PyPy, which (being a JIT) has relatively slow startup
> time pypy -c 'print "Hello World"'
Hello World
pypy -c 'print "Hello World"'  0.08s user 0.04s system 96% cpu 0.120 total
> time luajit -e 'io.write("Hello World!\n")'
Hello World!
luajit -e 'io.write("Hello World!\n")'  0.00s user 0.00s system 0% cpu 0.002 total
None of these address, for instance, the issue raised about the Firefox build invoking Python many times. This seems both an accepted use case of CPython and an area where CPython has traditionally had a huge edge on the JVM and PyPy. If scripts are not a priority, what is the expected use case of CPython?
I would like to note that CPython's ties to the PyObject C ABI seem to stymie rather than encourage “hacking”. CPython seems to have traditionally valued stability over all else... see the issues PyPy has had chasing compatibility with C while retaining speed.
So: normally I'm with you, and a language should lean into its strengths, but I've always listed startup time as a primary strength of Python!
Optimization is very, very rarely completely "free" - it's usually a conscious trade of one property for another that's deemed more important in a specific case.
Simplicity for performance. Code size for compilation speed. Startup time for architectural complexity. UX for security.
For a great product, you need to say "no" much more often than not. Do one thing and do it well. Be Redis, not JBoss.
It's funny when developers themselves treat effort as fungible - as if an hour spent on A would otherwise have yielded an hour of worthwhile progress on B, C, or D. Your post leans on exactly that fallacy.
I would think developers have the experience to realize this isn't true but I see it all the time on these forums.
FWIW no one who replied to this email thread said something even close to "no". Victor Stinner points out that startup time is something that comes up a lot and mentions some recent work in the area [1].
Python is a big ship; it may not be as nimble as a young FOSS project, but it is always improving, and investment in things like startup time pays dividends across a large ecosystem.
I get the impression that backwards-compatibility does weigh pretty heavily on the Python core developers these days. There are so many Python installations out there doing so much that the default answer to a change has to be "no". The fact that macOS and popular Linux distributions ship with copies of Python is great, but once something is effectively a component of operating systems, boldness is not a viable strategy. Arguably, one of the reasons why the transition to Python 3 has been so drawn out is that every time somebody installs macOS or one of many Linux distributions, a new Python 2 system is born. I've seen .NET Core developers explain that having .NET Framework shipped in Windows put them under massive constraints, and this was one of the motivations for a new runtime.
I'm not denying this phenomenon, but part of it is surely that widely used projects get more conservative because any change risks breaking something for someone somewhere. And the maintainers tend to feel a sense of responsibility to help people deal with these breakages.
I'll bring a slightly different perspective, as someone who's been using Python professionally for over a decade: there is no such thing as just saying "yes" or "no". Every "yes" to one group is at least an implicit "no" to some other group, and vice-versa.
The Python 2/3 transition is a great example of this. Python 2 continued an earlier tradition of saying "yes" to almost everything from one particular group of programmers: people working on Unix who wanted a high-level language they could use to write Unix utilities, administrative tools, daemons, etc. In doing that, Python said "no" to people in a lot of other domains.
Python 3 switched to saying "yes" to those other domains much more often. Which came with the inherent cost of saying "no" (or, more often, "not anymore") to the Unix-y crowd Python 2 had catered to. Life got harder for those programmers with Python 3. There's been work since then to mitigate some of the worst of it, but some of the changes that made Python nice to use for other domains are just always going to be messy for people doing the traditional Unix-type stuff.
Personally, I think it was the right choice, and not just because my own problem domain got some big improvements from Python 3. In order to keep growing, and really even to maintain what it already had, Python had to become more than just a language that was good for traditional Unix-y things. Not changing in that respect would have been a guaranteed dead end.
This doesn't mean it has to feel good to be someone from the traditional Unix programming domain who now feels like the language only ever says "no". But it does mean that it's worth having the perspective that this was how a lot of us felt in that golden age when you think Python said "yes" to everything, because really it was Python saying "yes" to you and "no" to me. And it's worth understanding that what feels like "no" doesn't mean the language is against you; it means the language is trying to balance the competing needs of a very large community.
"people working on Unix .... In doing that, Python said "no" to people in a lot of other domains."
Could you elaborate on this?
I thought Python was pretty good about supporting non-Unix OSes from early on. It was originally developed on SGI IRIX and MacOS. From the README for version 0.9:
> There are built-in modules that interface to the operating system and to various window systems: X11, the Mac window system (you need STDWIN for these two), and Silicon Graphics' GL library. It runs on most modern versions of UNIX, on the Mac, and I wouldn't be surprised if it ran on MS-DOS unchanged. I developed it mostly on an SGI IRIS workstation (using IRIX 3.1 and 3.2) and on the Mac, but have tested it also on SunOS (4.1) and BSD 4.3 (tahoe).
though it looks like there wasn't "painless" DOS support until 1994, with the comment "Many portability fixes should make it painless to build Python on several new platforms, e.g. NeXT, SEQUENT, WATCOM, DOS, and Windows."
I also thought that PythonWin had very good Windows support quite early on. The 1.5a3 release notes say:
> - Mark Hammond will release Python 1.5 versions of PythonWin and his other Windows specific code: the win32api extensions, COM/ActiveX support, and the MFC interface.
> - As always, the Macintosh port will be done by Jack Jansen. He will make a separate announcement for the Mac specific source code and the binary distribution(s) when these are ready.
I think part of what explains this attitude in people is "lack of imagination". In the sense that sometimes, especially when an existing project or organization or bureaucracy has become huge and daunting, people cannot imagine excellence anymore, so they believe it to be literally impossible.
To be fair, they are frequently saying no to things other people think they should do (rather than saying no to things like contributions of startup improvements).
I think your comment is well-intentioned (I upvoted) but I respectfully disagree. I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability. Engineering with restrictions is a good thing, we can do great systems in Haskell because it's a very neat language even though it lacks mutability. We also can do great systems in Python because it's a very neat language even though it's a bit slow. Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.
> Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.
Surely you're not implying that improving Python's performance would preclude finding interesting algorithms, nor that this is a suitable rationale for keeping Python slow? Anyway, algos can only get you so far when they're built on slow primitives (all data scattered haphazardly across the heap, every property access is a hash table lookup, every function call is a dozen C function calls, etc).
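The hash-table-lookup cost is easy to see directly. A rough micro-benchmark sketch (timings are machine-dependent, and the class here is made up purely for illustration):

```python
import timeit

class Point:
    def __init__(self):
        self.x = 1.0

p = Point()

# Each p.x below is a dynamic lookup in the instance's __dict__.
attr_time = timeit.timeit("p.x + p.x + p.x", globals={"p": p}, number=100_000)

# Hoisting the attribute into a local variable skips those lookups.
x = p.x
local_time = timeit.timeit("x + x + x", globals={"x": x}, number=100_000)

print(f"attribute: {attr_time:.4f}s  local: {local_time:.4f}s")
```

On CPython the hoisted version is typically noticeably faster; that gap is exactly the sort of cost a smarter runtime could eliminate.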
> I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability
I'm sorry, but that makes zero sense. Haskell is defined by immutability; people want to use Haskell because of that characteristic. I don't want to use Python because it is slow.
This is why large companies like Google often reinvent the wheel. Open source gives everyone the right to use the code, but not the power to control it. Sure, you can fork, but then your version will diverge from the official one, and the pain of maintaining compatibility may be greater than writing your own from scratch.
It's a byproduct of how many people you have to answer to. I was having a discussion with a coworker about an app whose many features made it seem cluttered but useful. Small projects can make bolder choices and add more options: their smaller userbase is less impacted by changes, and since they want to reach more people, adding a feature is generally a net benefit. A larger project cannot risk hurting the large userbase it has already established, so it has to be more cautious about the changes it makes.
I've always been disappointed at how quickly people make sweeping generalizations from a single anecdote. (I also think Python can do better here, but the generalization isn't justifiable.)
Python 3.6 probes an enormous number of potential filesystem paths when looking for code to import. Why is that fixed in stone? Couldn't Python 3.(n+1) change that, if it's slow and merely historical, cutting out a bunch of slow system calls?
As someone who makes use of Python to deploy software, it's entirely possible that could cause me a few issues... which I'd fix quite easily. It should be totally reasonable to expect the community using the software to cope with those sorts of changes after a major release; the alternative is ossification.
Django suffered from maintaining too much compatibility and releasing too slowly, and they fixed it. Three or four years ago everyone was talking about moving away from it; now they release often, deprecate things when they need to, and the project is as vibrant as it ever was. Time for CPython to learn the same lesson.
Everyone is focusing on Python, but where is this "can do" spirit from Mozilla? There are languages with better startup times - bash, Perl, Lua, and awk, to name a few - which could likely do whatever the Python scripts are doing.
Python 3 has the exact opposite problem: too many devs willing to say "yes" to features, and a small number of devs trying to keep things fast and maintainable.
This is true but python's relative slowness (along with the GIL) is an issue that is regularly blown out of all proportion.
Part of the reason for the language's success is because it made intelligent tradeoffs that often went against the grain of the opinions of the commentariat and focused on its strengths rather than pandering to the kinds of people who write language performance comparison blog posts.
If speed were of primary importance then PyPy would be a lot more popular.
You're conflating two kinds of "performance", startup latency and steady state throughput. We're talking about the former, and you're proposing improvements for the latter. In fact, moving to pypy is exactly what you shouldn't do to improve startup.
It's surprising but frequently true that startup latency has a greater effect on the perception of performance than actual throughput. Nobody likes to type a command and then be kept waiting, even if the started program could in principle demonstrate amazing feats of computation once warmed up.
The GIL is a pretty nasty problem once you try to scale things beyond one core.
Simply try something like unpickling a 10 GB data structure while keeping your GUI in the main thread responsive. You cannot do that because the GIL locks up everything while modifying data structures. Move the data to another process instead of another thread. Great, your GUI is responsive but you can't access the data from the main thread.
You can say that such a humongous data structure is wrong or that a GUI isn't meant to be responsive or programmed in Python or that I'm holding it wrong. Probably right.
Python derives a good chunk of its speed (if not all of it) from carefully tuned libraries written in other languages (or even for other architectures, in the case of many machine learning packages). As soon as you try to do a lot of heavy processing in Python itself, even the compiled versions quickly bog down. IMO the best way to use Python is as clever glue around highly optimized code. That way you spend the minimum amount of effort and get maximum performance.
I have to say that my first reaction was: "maybe you shouldn't use python for this, then". If you are using a language in a way that it gets worse in subsequent versions, that's a good sign that they're optimizing for something other than what you care about.
The programming language R does not, as I understand it, optimize for speed, because they are optimizing for ease of exploratory data analysis. R is growing quite rapidly. So is python, actually. It doesn't mean that either one is good at everything, and it's probably the case that both are growing because they don't try to be good at everything. A good toolbox is better than a multi-tool.
While the "maybe you shouldn't use Python" comment could be construed as trolling by some, there is definite truth to your line of reasoning, and I agree with the comment.
I absolutely love Python as a programming language for the space it is in. But as someone who needs to think long term about maintaining large projects with lifetimes measured in potentially decades, Python has a few key weaknesses that make it really difficult for me to continue justifying it for such projects. Startup time is one. The GIL is the other large one (not being able to achieve linear speedups on CPU-bound code in 2018, with Moore's Law dead, is unacceptable). General performance disadvantages can be adequately addressed with PyPy, JITs, Cython, etc. Problems scaling large code bases in a dynamic language can be mitigated with typing and better tools.
Python can be very competitive against typed systems languages. But if it fails to address its shortcomings, I think more and more people will choose Rust, Go, Java, C/C++, etc for large scale, long time horizon projects. This will [further] relegate Python to be viewed as a "toy" language by more serious developers, which is obviously not good for the Python ecosystem. So I think "maybe you shouldn't use Python for this, then" is a very accurate statement/critique.
I would characterize Python's weaknesses differently.
Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And certainly removing the GIL will not just enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL, it was the sorry state of multi-process concurrency.
The speed issues that JITs solve for other languages may not be solvable in Python due to language design.
Python's slowness can help improve performance by teaching you to use techniques that end up being faster no matter the language.
Python is so slow that it forces you to be fast.
Consider data analysis: on modern machines, you're almost always better off with a columnar approach: if you have a struct foo { int a, b, c; }, you want to store int foo_a[], foo_b[], foo_c[], not struct foo data[]. It's better for the cache, better for IO, and better for SIMD.
numpy makes the columnar layout much easier to use than the interleaved one, whereas in C, you might be tempted by the array-of-structs and not even realize how much performance you were leaving on the table. Likewise for GPU compute offloading, reliance on various tuned libraries for computationally intensive tasks, and the use of structured storage.
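A minimal sketch of the two layouts in numpy (zero-filled arrays, purely to show the memory shapes; real code would hold actual data):

```python
import numpy as np

N = 1_000_000

# Array-of-structs: fields interleaved in memory, 12 bytes per record.
aos = np.zeros(N, dtype=[("a", np.int32), ("b", np.int32), ("c", np.int32)])

# Struct-of-arrays (columnar): each field is its own contiguous array.
a = np.zeros(N, dtype=np.int32)
b = np.zeros(N, dtype=np.int32)
c = np.zeros(N, dtype=np.int32)

# Summing one field reads 4 bytes per element from the columnar array,
# but strides over 12 bytes per element in the interleaved layout.
total_columnar = int(a.sum())
total_interleaved = int(aos["a"].sum())
```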
Sorry, I didn't mean it to be trolling, I just meant it more or less literally. If Rust (for example) gets used for things like Mercurial and Mozilla, is that bad? I'm not saying Python shouldn't care, if it could improve the startup time without sacrificing other things. But presumably the transition from py2 to py3 was not intending to make things slower, it was intending to solve other problems. There are almost always tradeoffs. Even the mercurial folks quoted in the article said that the things py3 solved were not what they needed. That's a good indicator that Python is not the right language (anymore) for what they're doing.
I am primarily a Python programmer, but if Rust, Go, etc. take over as the language of choice in certain cases, I don't think that's a bad thing. Which doesn't mean one shouldn't write an article to highlight this cost of not having short startup time, just in case this cost wasn't understood by Guido, et al. But my guess (and it's only a guess), is that it was.
> While the "maybe you shouldn't use Python" comment could be construed as trolling to some, there is definite truth to your line of reasoning and I agree with comment.
I wouldn't say I construed it as trolling. More like, "You might be right, but where does that get us?" Not trolling, but also not that constructive, because it's extremely easy to write something like "maybe you shouldn't use Python" but likely hard and time-consuming to make it so.
There are a lot of questions when considering such a move. For example:
- What's the opportunity cost of migrating $lots_of Python to Rust, or some other language?
- Is that really where you can add (or want to add) the most value?
- And what does having to do that do to your roadmap? Maybe it enables it, but surely it's also stealing time from other valuable work you could be doing?
- Longer term, are we sacrificing maintainability for performance? (In your case it sounds like the opposite?)
- How easily can we hire and onboard people using $new_tech? (Again, it sounds like you might reduce complexity.)
Basically I suppose what I'm saying is I find it a little trite when people say, "well, maybe you should do X," without having weighed the costs and benefits of doing so. And in a professional environment, if that's allowed to become a pattern of behaviour, it can contribute to the demotivation of teams. Hence, I found myself a bit irritated by the grandparent post.
If one needs Rust, C/C++ level of performance I doubt there is much Python can do and one can wonder if Python was ever the right tool for such a project.
I think your characterization of the GIL is not accurate. Show me ANY real world program that can achieve linear speedups on multicore or multi-processor systems. Humans have not sufficiently mastered multithreading to be able to make such a claim. I am not aware of any "CPU-bound" use cases that would actually use Python like this instead of, say, C or Fortran. And anyway, I submit that it would benefit (both from a design and an execution standpoint) from being multi-process (in other words, using explicitly coded communication).
Regarding the GIL, I've always wondered about Jython but never gotten around to trying it. What are the drawbacks of running on a JVM to get true multithreading? Having to properly sync the threads, as in other environments without global locks?
The supposed attitude of the Python developers toward startup time works against the popular niches Python is supposed to be such a great fit for: little scripts, glue, short-running applications.
That's a problem if that's an area Python wants to compete in.
I might be biased because I'm from the hordes that are moving from Stata and Matlab to Python (but then there are the hordes attracted to data analysis now), but that was never really Python's strong suit, nor its target market.
I mean, I was always into little scripts, but I used Tcl and then Perl.
The linked post is about Python startup being a problem with thousands of invocations. Is Python startup really a problem for the niches you mention, or is it a problem in some extreme edge cases? I would argue this is the latter and perhaps signals that an architecture change for the build or tests would be best.
I have been using Python for small scripts for 20+ years and haven't had this issue. The JVM on the other hand was historically slow to start.
Yep. Tried to use a Raspberry Pi as my main system for a while and one of the pain points was slooooow startup of Python. As a Python fan I was embarrassed.
I don't particularly agree that this is what "Python is supposed to be such a great fit for."
I've been to quite a few PyCons and never heard anyone espousing this view, but I'm open to the possibility that I have missed it. Can you link me to a piece of media that you think persuasively makes the case that this is what Python is supposed to be for?
Python is not optimized for small glue code at all. The fact that it is the sanest language for use in that niche speaks much more about the ecosystem than about Python.
Python seems to be mainly optimized for web servers, scientific computing and machine learning tasks. None of those care about startup time.
Python is really only the target for those because someone lied to all the systems folks and told them Ruby was too slow. (The previous wave of infrastructure management tools all seemed to be written in Ruby; nowadays it's Python or Go.) That, and Python is one of the "official" languages at Google, and everyone wants to be Google, right?
Meanwhile, Ruby is making great strides in performance and even has JIT coming in 2.6.
I totally understand that milliseconds matter in the use case described in the article.
For me, personally, I use python to automate tasks - or to quickly parse through loads and loads of data. To me, startup speed is somewhat irrelevant.
I built a micro-framework that is completely unorthodox in nature, but very effective for what I needed - that being a suite of tools available from an 'internet' server, available to me (and my coworkers) over port 80 or 443.
My internet server, which runs Python on the backend (and uses Apache to actually serve the GET/POST requests), literally spits out pages in 0.012 seconds. Some of the 'tools' run processes on the system, reach out to other resources, and return results in under 0.03 seconds (much of that being network/internet RTT). To me, that's good enough - adding 30 or even 300 milliseconds to any of that just wouldn't matter.
I totally get that if Python wants to be a big (read bigger?) player then startup time matters more...but for my personal use cases, I'm not concerned with the current startup time one bit.
As expected, language start up time only matters to some people. Often in my case, Python is used to build command line tools (similar to the case of Mercurial).
In such an event, the start-up time of the program might dominate the total run time of the application. And on my laptop or desktop with a fast SSD with good caching and a reasonably fast CPU... that still ends up being 'okay'.
But once I put that on an ARM chip with a mediocre hard drive - some python scripts spend so long initializing that they are practically unusable. Whereas the comparable Perl/BASH script runs almost instantaneously.
Often to make Python even practically usable for such systems I have to implement my own lazily loaded module system. Having some language which allowed me to say...
Have you tried moving import statements into the functions where they are invoked? My understanding is this is effectively the same as lazy loading the module[1].
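A minimal sketch of that pattern (`json` here is just a stand-in for any expensive-to-import module):

```python
import sys

def parse_config(text):
    # Imported on first call rather than at module load, so programs
    # that never call parse_config never pay the import cost at startup.
    import json
    return json.loads(text)

result = parse_config('{"a": 1}')
loaded = "json" in sys.modules  # True only once the function has run
```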
I think you're telling us about how you're not affected by a problem that does affect other people. I feel like this doesn't add any substantial, interesting points to this discussion.
I have similar use cases. Startup time starts to matter once you either want to build test cases or put scripts in loops. If I have a script that parses one big data file, and I decide to parse 1000, it's often helpful if I can run that script a thousand times rather than refactor it to handle file lists. Or if you want to optimize some parameter.
But isn’t that the author’s point? It doesn’t seem like much time but because you’re paying it so often in so many little places it really does add up.
Sort of related story: we needed a scripting language able to run on an x86 RTOS-type architecture compiled with MSVC, and looked into CPython because, well, Python is after all quite a nice language. After spending a considerable amount of time getting it to compile (I don't recall all the issues, but the main one was that the source code assumed msvc == windows, which I know is true in 99% of cases but didn't expect a huge project like CPython to trip over), it would segfault at startup.

During step-by-step debugging it was astonishing how much code got executed before doing any actual interpreting/REPL work. I get that there might not be a way around some initialization, but it simply looked like too much to me, and perhaps not overly clean either. Moreover, it included a bunch of registry accesses (again, because it saw MSVC being used) which the RTOS didn't fully support, hence the segfault.

Anyway, we looked further and thankfully found MicroPython, which took less time to port than we had spent just getting CPython to compile. While not a complete Python implementation, it does the job for us, and it gets away with startup/init code of only about 100 LOC (including argument parsing etc). I know it's not a fair comparison, but the difference is big enough to indicate, at least to me, that CPython might just be doing too much at startup, and/or spending time on features few users need, and/or dragging along some old cruft. Not sure, just guessing.
Mercurial's startup time is the reason why, for fish, I've implemented code to figure out if something might be a hg repo myself.
Just calling `hg root` takes 200ms with a hot cache. The equivalent code in fish-script takes about 3ms, which enables us to turn on hg integration in the prompt by default.
The equivalent `git rev-parse` call takes about 8ms.
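The check presumably amounts to walking up from the working directory looking for a `.hg` (or `.git`) entry, one stat call per level, instead of paying for a subprocess launch. A minimal Python sketch of that idea (the actual fish implementation is in fish-script and may differ):

```python
import os

def find_repo_root(path, marker=".hg"):
    """Walk upward from `path` until a directory containing `marker`
    is found; return it, or None on reaching the filesystem root."""
    path = os.path.abspath(path)
    while True:
        if os.path.exists(os.path.join(path, marker)):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # hit the filesystem root
            return None
        path = parent
```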
But 8ms is still too slow for me. :) I implemented the Git recognition code myself in my own prompt using the minimal amount of FS operations [1], and it renders in 5 ms from start to finish, including a "git:branch-name/47d72fe825" display.
One of the reasons git rev-parse takes slightly longer than your implementation is that you just unconditionally truncate the SHA-1 to 10 characters. E.g. run this on linux.git:
You'll get 4 SHA-1s that are ambiguous at 10 characters, this problem will get a lot worse on bigger repositories.
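The underlying check is just counting collisions among truncated hashes. A sketch of the idea with made-up hashes (on a real repo you would feed it the output of `git log --all --format=%H`):

```python
from collections import Counter

def ambiguous_prefixes(full_hashes, length=10):
    """Return the truncated prefixes shared by more than one full hash."""
    counts = Counter(h[:length] for h in full_hashes)
    return sorted(prefix for prefix, n in counts.items() if n > 1)

# Made-up SHA-1s: the first two collide in their first 10 hex digits.
hashes = [
    "47d72fe825a1b2c3d4e5f60718293a4b5c6d7e8f",
    "47d72fe825ffffffffffffffffffffffffffffff",
    "0123456789abcdef0123456789abcdef01234567",
]
print(ambiguous_prefixes(hashes))  # → ['47d72fe825']
```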
Which is not to say there isn't a lot of room for improvement. Creep in initialization time is one of those things that tends to get worse without being noticed, but Git, unlike (apparently) Python, makes heavy use of re-invoking itself in its own test suite (tens of thousands of times), so it's naturally kept somewhat in check.
If you have this use-case I'd encourage you to start a thread on the Git mailing list about it.
I put similar code in Emacs's vc-hg to get revision information straight from Mercurial's on-disk data structures instead of firing up an hg subprocess.
You mean actually reading dirstate[0] or just the branch/bookmark files?
We also do the latter, but dirstate format isn't easily readable just with shell builtins (lots of fixed-length fields with NUL-byte padding, also we don't even have a `stat` builtin and the external program isn't a thing on macOS AFAIK), so we still fire up `hg status` for that - but only after we decide that there is a hg repo.
Somewhat tangentially, I noticed that fish performs quite badly in remote-mounted (sshfs) directories that are git repositories. I wonder if it would be possible to detect a remote-mounted filesystem and turn off, or tone down, some of the round-trip-heavy operations?
I've gone through your problem myself countless times, and concluded that hitting ctrl+c to interrupt the status line every time it tries to render the current repository state is not very productive.
My git status line uses timelimit (https://devel.ringlet.net/sysutils/timelimit/) to automatically stop if any of the git status parts (dirty/staged/new files) take > 0.1 seconds to finish:
Ironically, xonsh itself suffers from a long startup time due to its use of Python. This is my primary (negative) experience with the issue in the linked article, and the reason why I stopped using xonsh.
This is truly a problem, even more so if you host your application on a network directory: loading all the small files takes ages. I really wish there were a good way to compile a whole application, with all its modules, into one package once you're ready to release. I wish the creators of Python had given such use cases more consideration.
Edit: I'm aware that there are solutions that put everything a program touches into a kind of executable archive. A single file several hundred Megabytes in size. I've tested it. It doesn't really pre-compile the modules. The startup time was exactly the same.
First time I hear about this, and I've looked for alternatives to cxfreeze and its cousins in the past.
Any time I see something like this, I feel like I'm hearing about some homeopathic cancer cure. If Nuitka actually does what it says it does, it's solving a big recurrent problem for the Python community, so why is nobody talking about it?
I know this isn’t everyone’s favorite, but Cython has a way to convert your Python code into an executable with Python embedded, and I believe it also packs your imports.
Cython is a complicated beast but I feel like it just needs a more friendly wrapper for this to be more widespread.
I worked on one Python application that had a startup time problem because it was on a network filesystem with slow metadata/stat times. It took several seconds to start Python.
We were able to solve most of the problem by zipping up the Python standard library and our application.
That is, if you look at sys.path you'll see something like:
If you zip up the python3.6 directory into python36.zip then it will use that zip file as the source of the standard library, and use the zip directory structure instead of a bunch of stat calls to find the data.
This should also include getting access to the pre-compiled byte code.
You can also have Python byte-compile all of the .py files in a directory as part of your build/zip process.
Also note that calls to imp.load_source need to change to imp.load_compiled, and any .py files referenced directly in code need to be changed to .pyc (this is with 2.7, not sure about 3.x).
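A minimal sketch of the build step described above, assuming a plain directory of .py files (the `legacy=True` flag makes compileall write `foo.pyc` next to `foo.py`, which is the layout zipimport expects inside an archive):

```python
import compileall
import pathlib
import zipfile

def build_import_zip(src_dir, zip_path):
    # Byte-compile every .py under src_dir, then pack the .pyc files
    # into a zip that can go on sys.path / PYTHONPATH. Imports then
    # read the zip's central directory instead of issuing a stat()
    # per candidate file, and the pre-compiled byte code is used
    # directly.
    compileall.compile_dir(src_dir, quiet=1, legacy=True)
    src = pathlib.Path(src_dir)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for pyc in sorted(src.rglob("*.pyc")):
            zf.write(pyc, str(pyc.relative_to(src)))
```

After that, `sys.path.insert(0, "python36.zip")` (or the equivalent PYTHONPATH entry) makes the interpreter import from the archive.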
I think design choices made in Python simply don't allow for comprehensive ahead of time compilation. For what it's worth, they have recently landed snapshots in Dart that do what you want:
1. Don't do that. Either write the driving app in Python or write the subprocesses in an ahead-of-time compiled language. Python's a great language but it's not the right tool for everything.
2. Be parsimonious with the modules you import. During development, measure the performance after adding new imports. E.g., one graph library I tried had all its many graph algorithm implementations separated into modules, and it loaded every single one of them even if all you wanted to do was create a data structure and do some simple operations on it. We just wrote our own minimal class.
> Don't do that. Either write the driving app in Python
Even if you write the driver in Python, you don't necessarily want to call the program you're testing in the same process. You might want independent launches of a command-line tool, so that you test the same behavior people get when they run the tool. Otherwise, your test suite might trip over some internal state that gets preserved from run to run in ways that command-line invocation wouldn't.
Good point, but I didn't mean to sound specific to testing apps. I just meant, in general, write big apps using Python top-down and something precompiled if you must spawn lots of external processes.
I've definitely seen significant improvements with #2. Unfortunately, it's not very Pythonic to tuck your imports into functions (or under conditionals). It would be nice if imports were more lazily evaluated.
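In the meantime, deferring the import by hand is the usual workaround. A small sketch (the `lazy()` helper is hand-rolled for illustration, not a standard library API):

```python
import functools
import importlib

@functools.lru_cache(maxsize=None)
def lazy(name):
    # Defer an import until first use; the cache makes repeat
    # lookups cheap.
    return importlib.import_module(name)

def parse_config(text):
    # Startup no longer pays for importing `json`; only callers of
    # this function do.
    return lazy("json").loads(text)
```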
The slow startup, combined with the Python ecosystem's general lack of interest in finding a solution for distributing self-contained applications, was the biggest reason we ended up writing our CLI tool in something else, even though we are a Python shop.
I'm really curious why there hasn't been much of a desire to change this and it even got worse as time progressed which is odd.
Lots of libraries are terribly slow to import (e.g. requests), but right now there is little incentive to fix it as there is no toolchain to build good, self-contained python CLI apps.
I've written a whole bunch of CLI tools over the years and maintained some I didn't author originally, I always found it annoying how slow these are. A CLI tool for some larger project can easily take a second just to display --help, or a command line parse error. Tests running against it can be made to progress faster (no forking, no separate interpreter, in-process capture etc. which brings a lot of complexity and subtle differences and error persistence/"tainting" of the execution environment), but still you might only get a few CLI invocations per second per core.
These experiences are a major turn-off from Python for me.
It’s only a hard problem if there is no desire. The slowdowns for the interpreter startup did not happen because they are necessary but because site.py and friends just do more stuff now and a lot of important internal tooling became unnecessarily complex.
This is disappointing to me too, but I think there are some problems baked in to the language that make it hard.
- Imports can't be parsed statically.
- Startup time has two major components: crawling the file system for imports, and running all the init() functions of every module, which happens before you get to main(). The first is only fixable through breaking changes, and the second is hard to fix without drastically changing the language.
The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?
I recall a PyCon talk saying that as of 3.6, essentially everything about Python 3 is now faster than Python 2, EXCEPT startup time!
This is a shame, because I would have switched to Python 3 for startup time ALONE. (As of now, most of my code and that of my former employer is Python 2.) That would have been the perfect time to address startup time, because getting a 2x-10x improvement (which is what's needed) requires breaking changes.
I don't think there's a lack of interest in the broader Python community, but there might be a lack of interest/manpower in the core team, which leads to the situation wonderfully summarized in the recent xkcd:
FWIW I was the one who sent a patch to let Python run a .zip file back in 2007 or so, for Python 2.6 I think. This was roughly based on what we did at Google for self-contained applications. A core team member did a cleaner version of my patch, although this meant it was undocumented until Python 3.5 or so:
The .zip support at runtime was a start, but it's really the tooling that's a problem. And it's really the language that inhibits tooling.
Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
In other words, I have wondered about this "failure" for over a decade myself, and even tried to do something about it. I think the problem is that there are multiple parts to the solution, the responsibility for these parts is distributed. I hate to throw everything on the core team, but module systems and packaging are definitely a case where "distributed innovation" doesn't work. There has to be a central team setting standards that everyone else follows.
Also, it's not a trivial problem. Go is a static language and is doing better in this regard, but still people complain about packaging. (vgo is coming out after nearly a decade, etc.)
I should also add that while I think Python packaging is in the category of "barely works", I would say the same is true of Debian. And Debian is arguably the most popular Linux package manager. They're cases of "failure by success".
> The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?
AFAIK importlib is entirely written in Python and kinda portable across Python implementations, while previously most was C code. It's not surprising something gets slower when written in Python.
> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
PyQt applications on Windows typically take two or more seconds before they can do anything, including Enterprise's favourite start-up pastime, splashscreens. Except maybe if you rolled your own .exe wrapper that displayed the splash before invoking any of the Python loading.
That's really, really poor in the age of 4 GHz CPUs from the factory, RAM big enough to fit multiple copies of all binaries on a PC and SSDs with at the very least tens of thousands of IOPS.
EDIT: I should also add that the length of PYTHONPATH as constructed by many package managers is a huge problem. You're doing O(m*n) stat()s -- random disk access -- which is the slowest thing your computer can do.
m is the number of libraries you're importing, and n is the length of the PYTHONPATH.
So it gets really bad, and it's not just one person's "fault". It's a collusion between the Python interpreter's import logic and how package managers use it.
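You can watch the search this describes using importlib's own path finder:

```python
import sys
import importlib.machinery

# An import probes sys.path entries left to right, trying a handful
# of candidate names (pkg/__init__.py, pkg.py, extension suffixes...)
# in each directory until one hits. With m imports and n path
# entries, a miss-heavy search costs O(m*n) stat() calls.
spec = importlib.machinery.PathFinder.find_spec("json", sys.path)
print(len(sys.path), "entries on sys.path; json found at", spec.origin)
```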
You've characterized the problems well. And yes this is a core problem for python - startup time and import processing is limiting in a lot more cases than just CLI tools. And yes the design of the language makes it hard or possibly impossible to solve.
> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
Exactly. There is no silver bullet. The problem is how much code gets run on startup, and how Python's dynamic nature makes traditional startup speedup strategies impossible. Is this even fixable?
35ms for Python is ok. What we see in reality is that the imports a real application will use add a whole lot more time.
For example, if you want a snappy command line response for a Gtk-using Python program, you probably want to handle command line arguments before even importing Gtk. Maybe it is --help or an argument that you pass on to another running instance, and you want it to be absolutely snappy and fast.
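A sketch of that ordering (`gi.repository` is PyGObject's import path; any heavy module illustrates the same point, and the argument handling here is deliberately minimal):

```python
import sys

def main():
    # Answer cheap requests before paying for any heavy import.
    if "--help" in sys.argv[1:]:
        print("usage: myapp [--help] FILE...")
        return 0
    # Only an actual GUI launch reaches the expensive import.
    from gi.repository import Gtk  # noqa: F401
    return 1  # GUI setup would go here
```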
> C (GCC 7.3): 2ms (int main(void) { puts("hi"); })
Not really a fair comparison given the other 3/4 have to do all their parsing and compiling. Unless in those 2ms you include compilation time. Or use tcc -run.
Given how slow Python is known to be at starting up, I am puzzled why Mozilla continues to use it in build scripts. Perl is just as portable but starts up something like 10 times faster.
I wrote the linked post and maintain the Firefox build system. The reason is that in 2018 (and for the past 10 years, honestly) it is far easier to find people who know Python than Perl. Python is essentially the lingua franca in Firefox land for systems-level tasks that don't warrant a compiled language. As I said in the post, Rust will likely encroach on Python over time due to performance, lower defect rate, long-term maintenance advantages, etc.
Can't wait for a static, dependency-free hg, but sadly they aren't going to rewrite it entirely in Rust, just the speed-sensitive parts. I don't care if it's written in Rust, D, Go or even C. When we picked fossil over hg and git, the Python dependency made us shy away from hg, although the CLI and the overall experience were better than git's. Now we also use fossil as a deployment tool; it can be a statically compiled, drop-anywhere binary.
I imagine there are two aspects to this: they probably started in Python and have a lot of it already, and it's probably easier to get new folks involved, which I think is one of their goals.
Compile time is absolutely dominated by c++ (and now rust) compilation and linking. I doubt build system language choice will ever bubble up to relevance, so why optimize for it?
Perl is “just as portable” in the same way that a motorcycle can just as easily drive under a steamroller... it’s not gonna be pretty and there’s no easy way out if you do it.
I write Perl scripts for Windows and Linux, and I don't find portability to be especially onerous. Of course there are platform differences to keep in mind, but is that any different from any other cross-platform scripting language?
Did you read the link we're discussing? It's in large part about why the slow startup times for Firefox's build scripts are a problem.
> Changing gears, my day job is maintaining Firefox's build system. We use Python heavily in the build system. And again, Python startup overhead is problematic. I don't have numbers offhand, but we invoke likely a few hundred Python processes as part of building Firefox. It should be several thousand. But, we've had to "hack" parts of the build system to "batch" certain build actions in single process invocations in order to avoid Python startup overhead. This undermines the ability of some build tools to formulate a reasonable understanding of the DAG and it causes a bit of pain for build system developers and makes it difficult to achieve "no-op" and fast incremental builds because we're always invoking certain Python processes because we've had to move DAG awareness out of the build backend and into Python. At some point, we'll likely replace Python code with Rust so the build system is more "pure" and easier to maintain and reason about.
Naive question: If the startup time matters because you're imposing that startup time hundreds or thousands of times - why not remove the startup time?
I'm saying, use the emacs model. Start hg with a flag so it simply keeps running in the background while listening on a port. Run a bare-bones nc script to pipe commands to hg over a port and have it execute your commands.
This isn't a new problem, nor is it even a new solution. No complete re-write of the interpreter or the tool required.
There's a paragraph in the OP about how they've actually done this:
> Mercurial provides a `chg` program that essentially spins up a daemon `hg` process running a "command server" so the `chg` program [written in C - no startup overhead] can dispatch commands to an already-running Python/`hg` process and avoid paying the startup overhead cost. When you run Mercurial's test suite using `chg`, it completes minutes faster. `chg` exists mainly as a workaround for slow startup overhead.
Just like this isn't what the usual `emacs` command does (it's `emacsclient`), it isn't what the usual `hg` command does either. There are some disadvantages to this solution and some assumptions it makes, which have apparently led the Mercurial maintainers to conclude, like the Emacs maintainers, that it won't work as the default. Hence the desire for solutions that will.
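The model itself is easy to sketch, even though the real chg protocol is far more involved (terminal forwarding, environment sync, etc.). A toy version keeping one warm interpreter behind a localhost socket (uses `socket.create_server`, Python 3.8+; this is illustrative, not Mercurial's actual command server):

```python
import socket
import threading

def run_command(cmd):
    # Stand-in for dispatching into already-imported application code.
    return "ran: " + cmd

def command_server():
    # One warm interpreter stays resident; a thin client just ships
    # command lines over a socket, so per-invocation cost is a TCP
    # connect instead of a fresh interpreter startup.
    srv = socket.create_server(("127.0.0.1", 0))
    def loop():
        while True:
            conn, _ = srv.accept()
            with conn:
                cmd = conn.recv(4096).decode().strip()
                conn.sendall(run_command(cmd).encode())
    threading.Thread(target=loop, daemon=True).start()
    return srv.getsockname()[1]  # OS-assigned port

def send(port, cmd):
    # What a chg-like client (written in C in real life) would do.
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(cmd.encode())
        return c.recv(4096).decode()
```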
I feel the same way about Clojure. For a LISP, where interactive development via the REPL is supposed to be one of the value-add of the language, it falls completely short in that aspect. They even have entire libraries and design patterns (Component, etc.) to work around the issue, but I find it ridiculous that your entire program structure is dictated by the fact that the REPL boot up time is too damn slow.
It's the main reason I don't use Clojure. I was so excited to learn a modern Lisp. Got an my tools working and wrote my first cli app. Horrendous load time. I realised it's really only suitable for long running processes and I never do that sort of thing so can't use it.
Python is great for prototyping or even real apps if performance isn't so critical. However, more than once I've found myself in the situation where I wrote a bunch of Python code and then ended up starting that code from another app, just like the thread discusses, and I immediately feel like this is an anti-pattern.
What's even more annoying is that my Python code usually calls a whole lot of C libraries (OpenCV, numpy, etc.) So it's like this: app->OS process->python interpreter->my python code->C libraries. That just really feels wrong so I'd like two things:
1) better/easier path to embed python scripts into my app e.g. resident interpreter
2) some way of passing scripts to python without restarting a new process, this may exist and I'm unaware
Startup time has also been the biggest gripe I have with Julia so far. Otherwise it's a truly fantastic language to work in. I wasn't able to put the `__precompile__()` function to good use, it seems - the time it takes to execute my program didn't change at all for some reason. Or maybe it's not actually the startup time that caused the problem, but the time it took to perform file IO. Anyway, my program now takes even longer to start than the Python equivalent (though it runs much faster once started), which is a real disappointment.
precompile doesn't store native compiled code. Though I know from talking to the compiler developers that this is high on the 1.x list. It's an annoyance but at least it has a clear solution in sight.
Truly solving this problem is difficult, but you can hack around it with a zygote process to remove a substantial amount of overhead, in exchange for RAM. While this is generally more of win for server processes, you can see it applied to a CLI proof of concept:
I agree Python's startup time is too slow. But one trick you can use to improve it some is the "-S" flag, which skips site-specific customizations. On my Ubuntu system it brings Python 3.6 startup time down from 36ms to 18ms for me; still not great, but it helps.
The drawback is this may screw up your Python environment, not sure how easy it is to work around it if it does.
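A quick, machine-dependent way to measure the difference yourself; treat the ratio rather than the absolute numbers as the result:

```python
import subprocess
import sys
import time

def startup_time(extra_args=(), runs=3):
    # Best-of-N wall-clock time for the interpreter to start and
    # exit immediately.
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"],
                       check=True)
        best = min(best, time.perf_counter() - t0)
    return best

print("plain: %.1f ms" % (startup_time() * 1000))
print("-S:    %.1f ms" % (startup_time(["-S"]) * 1000))
```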
Perhaps it would be possible to read in the source files, compile them, and preserve an image of the state immediately before reading input or command line.
I was kind of amazed how penalized a script could be by collecting all its “import” statements at the top. Once somebody’s command couldn’t even print “--help” output in under 2 seconds, and after measuring the script I told them to move all their imports later and the docs appeared instantly.
I'm a long time python user, but never really peeked under the hood. However, I have a few ideas.
Optimized module loading: maybe loading a larger 'super' module would be faster than several smaller ones? For example, a Python program could be analyzed to find its dependent modules, and then all of these could be packed into a 'super' module.
Once the python program executes, it would load the single 'super' module and hopefully bypass all the dynamic code which each module runs when imported to load up.
As mentioned previously, this is just off the top of my head and would certainly warrant more investigation/profiling to confirm my hypothesis.
I'm pretty sure it's too late by now for Python, but I've had some success with compiling C-based interpreters [0] to C; that is, generating the actual C code that the interpreter would execute to run the program. That way you can reuse much of the interpreter, keep the dynamic behavior and still get nimble native executables.
Should be able to hot boot the VM with the right tooling. You can reuse HPC "checkpoint" code from supercomputing environments as a generic hammer for Python/Ruby/JVM. Some Russians figured out how to do it in userspace without a kernel mod: https://criu.org/Main_Page
People here comment about how Python is slow, but even fast/slow is ill-defined in my opinion. You don't see people (generally) rewriting TensorFlow in native languages to speed it up; they just enable CUDA. I imagine the definition of "fast" here is limited to massively parallel server workloads with IO.
Reminds me of buildout. It's an awful piece of software. We used it in a previous Flask project, and a simple `flask shell` took 3 minutes to start. If you typed `import` in the CPython shell it would literally freeze for a few seconds, because buildout injects one sys.path entry for each package specified!
Indeed, it seems like a perfectly good solution to me. I guess it's something about purity and not being the perfect solution. Wouldn't it be great if python was as fast as a C program that took many times longer to write? Yes, but that would probably be magic.
At a guess: They didn't hear about it (keeping your ears open is a cost not everyone wants to pay). They don't want to bother with setting it up. They don't want to bother with maintaining it (even if it's as simple as reinstall every time you get a new computer).
Would it be feasible to keep a set of Python interpreters around at all times and use a round robin approach to feed each already-on interpreter commands then perform an interpreter environment cleanup out-of-band after a task is complete?
The Java ecosystem had this with Drip, and I think it turned out to not be a great idea in practice: the magazine of VMs gets exhausted when you don't want it to, they get into odd states, and other things I can't quite remember.
I guess a fork()'ed process triggers copy-on-write behavior in the kernel once the process starts running. So that's latency (the copying) you could still optimize away.
What would be really nice is checkpoint and restart (i.e., unexec), but it turns out that it's extremely hard to implement and get right in a non-managed environment.
Slowness is the elephant in the room in Python land. It's like everybody has decided to cover their eyes in front of this massive pachyderm. A massive delusion.
Delusion? I don't think many cover their eyes. More likely they've come to accept that for their use cases the performance is good enough and the convenience gain well worth it.
It's weird to see someone make this pitch when C systems software development regularly requires us to try and shave off microseconds. Millisecond delays mean you've already fucked up.
Performance and startup performance are really separate things.
For instance, for many CLI tasks a Python script will be many times faster than a Java tool, just due to the JVM startup. It doesn't really matter if the Java code itself ran INSTANTLY... the JVM startup time just kills speed for small CLI invocations.
Shell is an interpreted language and its startup time is quite fast (5-7 ms on my machine, which is not a particularly fast machine).
In fact, large parts of git were written in shell until they realized that shell is only fast on UNIX because of co-evolution (you can fork without exec, and fork is quite fast), and on other platforms like Windows, existing shell implementations are much slower and there isn't a well-tuned production-ready shell that does things completely differently. Then they started rewriting everything in C.
performance != startup time for lots of applications.
I use a C curses application (dokia) to store some oft-used commands, but anything that won't be run 500x/day or runs longer than ~1/10th of a second I'll write in python for easier/more powerful development
Most of this email thread is comparing Mercurial (Python) to Git. I'm not familiar enough with Git's internals to know why and where the languages are split, but it uses a significant amount of shell scripting and Perl in its code base. You can put 'git-foo' anywhere on PATH and it'll get picked up. So in the comparison they're making, startup time doesn't seem to be an issue for a combination of those languages, but it is for Python. It doesn't sound like their problem is that Python is an interpreted language.
Git is actively rewriting many of their shell and Perl code in C. Performance and portability are given as reasons (having shell and Perl as a dependency on Windows is a bit of a kludge). And shell scripts are much slower on Windows because new process overhead on Windows is ~10x what it is on POSIX platforms. (New threads, however, are faster to create on Windows than on POSIX.)
Why shouldn't the interpreted program start faster? Bytecode is usually at least 2x denser than machine code, so all things being equal, when starting an interpreted program, you should be doing less IO, take fewer page faults, and so run faster, at least if you defer computationally-intensive work to specialized AOT-compiled helpers.
That interpreted programs frequently start slower than their compiled equivalents reflects badly on interpreter implementations, not the concept of interpretation itself.
"There are many great reasons to use Python, but execution speed is not one of them."
Um, I think there are lots of examples where using Python's internal data structures as they were designed results in code that is fast enough.
Even though the language implementation is interpreted, lots of common things can be optimized under the hood using data structure and data type specific execution paths and so on.
Maybe you didn't read the post, but in the use cases specified, performance is not important--the author is mostly speaking about deploy and test scripts. Whether these take 1 minute or 10 is not particularly interesting, but you would of course prefer faster if possible. That's the point here--of course a faster Python interpreter is better, and the Python maintainers should place a higher priority on it than the very low priority that they currently do.
Imagine you import tons of modules which often are only available in Python. This gets you going really quickly with your project and it runs very smoothly. Transferring this to C++ would probably take so long you won't even finish to find out before you run out of funding.
I have hopes that Rust or some descendant of Rust will get us there in maybe 10 years, but in the meantime it would be better to get Python up to speed as much as possible.
I could push fine to BitBucket if I used Python 2 version locally, and same for my VPS if I used the Python 2 version both locally and remotely.
But as soon as I touched the Python 3 version of Mercurial the pull/push problems began. I don't recall the exact error and maybe it's fixed now (this happened like 6 months ago), but I don't think I'll give it another try for some time.
My work is considering switching to Git mostly because we think adopting Bitbucket will force us to. Is that not true? I'd love some reasons to stay with hg...
I've always been disappointed by how large software projects, both FOSS and commercial, lose their "can do" spirit with age. Long-time contributors become very quick with a "no". They dismiss longstanding problems as illegitimate use cases and reject patches with vague and impervious arguments about "maintainability" or "complexity". Maybe in some specific cases these concerns might be justified, but when everything garners this reaction, the overall effect is that progress stalls, crystallized at the moment the last bit of technical boldness flowed away.
You can see this attitude of "no" on this very HN thread. Read the comments! Instead of talking about ways we can make Python startup faster, we're seeing arguments that Python shouldn't be fast, we shouldn't try to make it faster, and that programs (and, by implication, programmers) who want Python startup to be fast are somehow illegitimate. It's a dismal perspective. We should be exercising our creativity as a way to solve problems, not finding creative ways to convince ourselves to accept mediocrity.
This isn't an attitude of "no" - it's an attitude of "yes" to other things. The arguments are that making Python startup fast makes other things worse, and we care about those other things.
Here are some other things we can say "yes" to:
- Rewrite as much of Mercurial in Rust as possible, which will provide performance improvements well beyond what Python can possibly offer. https://www.mercurial-scm.org/wiki/OxidationPlan
- Spend resources on developing PyPy, which (being a JIT) has relatively slow startup but much faster performance in general, for people who want fast performance.
- Write compilers from well-typed Python to native code.
- Keep CPython easy to hack on, so that more people with a "can do" spirit can successfully contribute to CPython instead of it being a mess of special cases in Guido's head.
Will you join me in saying "yes" to these things and not convincing ourselves to accept mediocrity?
I have to note that none of the projects you suggested, all of which are good and useful, will do anything to address the CPython startup latency problem under discussion. Why shouldn't CPython be better?
There's also no reason to believe that startup improvements would make the interpreter incomprehensible; the unstated assumption that improvements in this area must hurt hackability is interesting. IME, optimizations frequently boost both simplicity and performance, usually by unifying disparate code paths and making logic orthogonal.
I'm so longing for a Python(like) compiler.
MicroPython put together a Python in 250kb. Why the hell can't we make an LLVM frontend for Python that can use type hints for optimization? Sure, you lose some dynamic features as you optimize for speed, but that's the dream. Quickly write a prototype, not caring about types; optimize later by adding types and removing dynamism.
I'm currently learning Racket and LLVM and I have about 70 more years to live. I'm gonna try to make Python fast on slow weekends 'til I die.
None of these address, for instance, the issue raised about the Firefox build invoking Python many times. This seems both an accepted use case of CPython and an area where traditionally CPython has a huge edge on the JVM and PyPy. If scripts are not a priority, what is the expected use case of CPython?
I would like to note that CPython's ties to the PyObject C ABI seem to stymie rather than encourage "hacking". CPython seems to have traditionally valued stability over all else... see the issues PyPy has had chasing compatibility with C while retaining speed.
So: normally I’m with you, and a language should lean into its strengths, but I’ve always listed startup time as a primary strength of Python!
This hits the nail on the head.
Optimization is very, very rarely completely "free" - it's usually a conscious trade of some property for another that's deemed more important in a specific case.
Simplicity for performance. Code size for compilation speed. Startup time for architectural complexity. UX for security.
For a great product, you need to say "no" much more often than not. Do one thing and do it well. Be Redis, not JBoss.
I love how this article gets down to the essence of it: https://blog.intercom.com/product-strategy-means-saying-no/
> Rewrite as much of Mercurial in Rust as possible, which will provide performance improvements well beyond what Python can possibly offer. https://www.mercurial-scm.org/wiki/OxidationPlan
I read that article and I'm still wondering: why Rust?
I agree with you. Given limited development resources, saying "no" is difficult but important.
I am slightly afraid to ask, but what is a "well typed python"?
Correct. Every time you say yes, you're saying no to something else. It's important to realize what you're saying no to before you say yes.
It's funny when developers themselves think effort is so fungible, as if spending 1 hour on A means you would've also made 1 hour of worthwhile progress on B, C, or D. Your post takes this to the point of fallacy.
I would think developers have the experience to realize this isn't true but I see it all the time on these forums.
> This isn't an attitude of "no" - it's an attitude of "yes" to other things.
You are literally bringing an attitude of "no" to the question of whether an attitude of "no" is being brought to the discussion....
FWIW no one who replied to this email thread said something even close to "no". Victor Stinner points out that startup time is something that comes up a lot and mentions some recent work in the area [1].
Python is a big ship; it may not be as nimble as a young FOSS project, but it is always improving, and investments in things like startup time pay dividends to a large ecosystem.
[1] https://mail.python.org/pipermail/python-dev/2018-May/153300...
I get the impression that backwards-compatibility does weigh pretty heavily on the Python core developers these days. There are so many Python installations out there doing so much that the default answer to a change has to be "no". The fact that macOS and popular Linux distributions ship with copies of Python is great, but once something is effectively a component of operating systems, boldness is not a viable strategy. Arguably, one of the reasons why the transition to Python 3 has been so drawn out is that every time somebody installs macOS or one of many Linux distributions, a new Python 2 system is born. I've seen .NET Core developers explain that having .NET Framework shipped in Windows put them under massive constraints, and this was one of the motivations for a new runtime.
I'm not denying this phenomenon, but part of it is surely that widely used projects get more conservative because any change risks breaking something for someone somewhere. And the maintainers tend to feel a sense of responsibility to help people deal with these breakages.
I'll bring a slightly different perspective, as someone who's been using Python professionally for over a decade: there is no such thing as just saying "yes" or "no". Every "yes" to one group is at least an implicit "no" to some other group, and vice-versa.
The Python 2/3 transition is a great example of this. Python 2 continued an earlier tradition of saying "yes" to almost everything from one particular group of programmers: people working on Unix who wanted a high-level language they could use to write Unix utilities, administrative tools, daemons, etc. In doing that, Python said "no" to people in a lot of other domains.
Python 3 switched to saying "yes" to those other domains much more often. Which came with the inherent cost of saying "no" (or, more often, "not anymore") to the Unix-y crowd Python 2 had catered to. Life got harder for those programmers with Python 3. There's been work since then to mitigate some of the worst of it, but some of the changes that made Python nice to use for other domains are just always going to be messy for people doing the traditional Unix-type stuff.
Personally, I think it was the right choice, and not just because my own problem domain got some big improvements from Python 3. In order to keep growing, and really even to maintain what it already had, Python had to become more than just a language that was good for traditional Unix-y things. Not changing in that respect would have been a guaranteed dead end.
This doesn't mean it has to feel good to be someone from the traditional Unix programming domain who now feels like the language only ever says "no". But it does mean that it's worth having the perspective that this was how a lot of us felt in that golden age when you think Python said "yes" to everything, because really it was Python saying "yes" to you and "no" to me. And it's worth understanding that what feels like "no" doesn't mean the language is against you; it means the language is trying to balance the competing needs of a very large community.
"people working on Unix .... In doing that, Python said "no" to people in a lot of other domains."
Could you elaborate on this?
I thought Python was pretty good about supporting non-Unix OSes from early on. It was originally developed on SGI IRIX and MacOS. From the README for version 0.9:
> There are built-in modules that interface to the operating system and to various window systems: X11, the Mac window system (you need STDWIN for these two), and Silicon Graphics' GL library. It runs on most modern versions of UNIX, on the Mac, and I wouldn't be surprised if it ran on MS-DOS unchanged. I developed it mostly on an SGI IRIS workstation (using IRIX 3.1 and 3.2) and on the Mac, but have tested it also on SunOS (4.1) and BSD 4.3 (tahoe).
though it looks like there wasn't "painless" DOS support until 1994, with the comment "Many portability fixes should make it painless to build Python on several new platforms, e.g. NeXT, SEQUENT, WATCOM, DOS, and Windows."
I also thought that PythonWin had very good Windows support quite early on. The 1.5a3 release notes say:
> - Mark Hammond will release Python 1.5 versions of PythonWin and his other Windows specific code: the win32api extensions, COM/ActiveX support, and the MFC interface.
> - As always, the Macintosh port will be done by Jack Jansen. He will make a separate announcement for the Mac specific source code and the binary distribution(s) when these are ready.
That's a nice-sounding comment, but... could you be a little more specific about what particular "traditional Unix-y things" Python 3 said "no" to?
...I can't really think of many, if any at all. Sometimes you just say "no" to "inertia".
I think it's because they've seen exactly where saying "yes" leads them and they don't like that place.
They hate fast code?
I think part of what explains this attitude in people is "lack of imagination". In the sense that sometimes, especially when an existing project or organization or bureaucracy has become huge and daunting, people cannot imagine excellence anymore, so they believe it to be literally impossible.
To be fair, they are frequently saying no to things other people think they should do (rather than saying no to things like contributions of startup improvements).
"We" is a very abstract term. I am sure that if you proposed a patch that addressed the issue without adverse side effects, it would get accepted.
I think your comment is well-intentioned (I upvoted) but I respectfully disagree. I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability. Engineering with restrictions is a good thing, we can do great systems in Haskell because it's a very neat language even though it lacks mutability. We also can do great systems in Python because it's a very neat language even though it's a bit slow. Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.
That's not a great analogy. Haskell is a neat language in part because it doesn't have mutability. Python is a neat language despite being slow.
I can't imagine anyone would object if Python could magically be 10x faster. I can't say the same thing for the Haskell thing.
> Sure, you can always optimize Python's performance, that's a legitimate problem and it takes a few engineers to solve it. But it's more interesting to work around Python's slowness by engineering tricks such as better algorithms etc.
Surely you're not implying that improving Python's performance would preclude finding interesting algorithms, nor that this is a suitable rationale for keeping Python slow? Anyway, algos can only get you so far when they're built on slow primitives (all data scattered haphazardly across the heap, every property access is a hash table lookup, every function call is a dozen C function calls, etc).
> I think wanting Python to be a bit faster is similar to wanting Haskell to have a little bit of mutability
I'm sorry but that makes zero sense. Haskell is defined by immutability. People want to use haskell because of that characteristic. I don't want to use python because it is slow.
This is why large companies like Google often reinvent the wheel. Open source gives everyone the right to use, but not the power of control. Sure, you can fork, but then your version will diverge from the official one, and the pain of maintaining compatibility may be greater than writing your own from scratch.
It's a byproduct of how many people you have to answer to. I was having a discussion with a coworker about an app whose many features made it seem cluttered but useful. Small projects can make bolder choices and enable more options: they have a smaller userbase that would be impacted by changes, and since they want to reach more people, adding a feature is generally a net benefit. A larger project cannot risk hurting the large userbase it has already established, so it has to be more cautious about the changes it makes.
I've always been disappointed at how quickly people make sweeping generalizations from a single anecdote. (I also think Python can do better here, but the generalization isn't justifiable.)
With major infrastructure like Python there's a tendency to over-emphasise compatibility between releases.
Look at this post in the same list thread: https://mail.python.org/pipermail/python-dev/2018-May/153300...
Python 3.6 is trying an enormous number of potential paths that code for imports might be found at. Why is that fixed in stone? Couldn't Python 3.(n+1) change that, if it's slow and historical, cutting out a bunch of slow system calls?
As someone who makes use of Python to deploy software, it's entirely possible that could cause me a few issues... which I'd fix quite easily. It should be totally reasonable to expect the community using the software to cope with those sorts of changes after a major release; the alternative is ossification.
Django suffered from maintaining too much compatibility and releasing too slowly, and they fixed it. Three or four years ago everyone was talking about moving away from it; now they release often, deprecate stuff when they need to, and the project is as vibrant as it ever was. Time for CPython to learn the same lesson.
It may also be that they simply don't have an attack on the startup problem.
Competition. Hiphop VM lit a fire under the PHP team.
Everyone is focusing on Python, but where is this "can do" spirit from Mozilla? There are languages with better startup times (bash, Perl, Lua, awk, to name a few) that could likely do whatever the Python scripts are doing.
> but where is this "can do" spirit from mozilla?
The mail included both
> At some point, we'll likely replace Python code with Rust so the build system is more "pure" and easier to maintain and reason about.
and
> Since I am disproportionately impacted by this issue, if there's anything I can do to help, let me know.
For a large enough project or area of concern, saying "no" to many things is essential to saying "yes" to anything.
Python 3 has the exact opposite problem: too many devs willing to say "yes" to features, and only a small number of devs who try to keep things fast and maintainable.
Remember that Python2 was faster.
That changed with the dict improvements in 3.6.
This is true but python's relative slowness (along with the GIL) is an issue that is regularly blown out of all proportion.
Part of the reason for the language's success is because it made intelligent tradeoffs that often went against the grain of the opinions of the commentariat and focused on its strengths rather than pandering to the kinds of people who write language performance comparison blog posts.
If speed were of primary importance then PyPy would be a lot more popular.
You're conflating two kinds of "performance", startup latency and steady state throughput. We're talking about the former, and you're proposing improvements for the latter. In fact, moving to pypy is exactly what you shouldn't do to improve startup.
It's surprising but frequently true that startup latency has a greater effect on the perception of performance than actual throughput. Nobody likes to type a command and then be kept waiting, even if the started program could in principle demonstrate amazing feats of computation once warmed up.
The GIL is a pretty nasty problem once you try to scale things beyond one core.
Simply try something like unpickling a 10 GB data structure while keeping your GUI in the main thread responsive. You cannot do that because the GIL locks up everything while modifying data structures. Move the data to another process instead of another thread. Great, your GUI is responsive but you can't access the data from the main thread.
You can say that such a humongous data structure is wrong or that a GUI isn't meant to be responsive or programmed in Python or that I'm holding it wrong. Probably right.
Python derives a good chunk of its speed (if not all of it) from carefully tuned libraries written in other languages (or even for other architectures, in the case of many machine learning packages). As soon as you try to do a lot of heavy processing in Python, even the compiled versions quickly bog down. IMO the best way to use Python is to cleverly glue together highly optimized code. That way you spend the minimum amount of effort and you get maximum performance.
I have to say that my first reaction was: "maybe you shouldn't use python for this, then". If you are using a language in a way that it gets worse in subsequent versions, that's a good sign that they're optimizing for something other than what you care about.
The programming language R does not, as I understand it, optimize for speed, because they are optimizing for ease of exploratory data analysis. R is growing quite rapidly. So is python, actually. It doesn't mean that either one is good at everything, and it's probably the case that both are growing because they don't try to be good at everything. A good toolbox is better than a multi-tool.
(I authored the linked post)
While the "maybe you shouldn't use Python" comment could be construed as trolling by some, there is definite truth to your line of reasoning, and I agree with the comment.
I absolutely love Python as a programming language for the space it is in. But as someone who needs to think long term about maintaining large projects with lifetimes measured in potentially decades, Python has a few key weaknesses that make it really difficult for me to continue to justify using it for such projects. Startup time is one. The GIL is the other large one (not being able to achieve linear speedups on CPU-bound code in 2018, with Moore's Law dead, is unacceptable). General performance disadvantages can be adequately addressed with PyPy, JITs, Cython, etc. Problems scaling large code bases using a dynamic language can be mitigated with typing and better tools.
Python can be very competitive against typed systems languages. But if it fails to address its shortcomings, I think more and more people will choose Rust, Go, Java, C/C++, etc for large scale, long time horizon projects. This will [further] relegate Python to be viewed as a "toy" language by more serious developers, which is obviously not good for the Python ecosystem. So I think "maybe you shouldn't use Python for this, then" is a very accurate statement/critique.
I would characterize Python's weaknesses differently.
Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And certainly removing the GIL will not just enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL, it was the sorry state of multi-process concurrency.
The speed issues that JITs solve for other languages may not be solvable in Python due to language design.
Python's slowness can help improve performance by teaching you to use techniques that end up being faster no matter the language.
Python is so slow that it forces you to be fast.
Consider data analysis: on modern machines, you're almost always better off with a columnar approach: if you have a struct foo { int a, b, c; }, you want to store int foo_a[], foo_b[], foo_c[], not struct foo data[]. It's better for the cache, better for IO, and better for SIMD.
numpy makes it much easier to use the columnar layout, whereas in C you might be tempted by the array-of-structs approach and not even realize how much performance you were leaving on the table. Likewise for GPU compute offloading, reliance on various tuned libraries for computationally intensive tasks, and the use of structured storage.
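A small numpy sketch of the column-wise layout described above (the `foo_*` names are hypothetical, mirroring the struct fields in the comment):

```python
import numpy as np

n = 1000
# Struct-of-arrays: each field of "struct foo { int a, b, c; }" gets
# its own contiguous array instead of interleaved records.
foo_a = np.arange(n, dtype=np.int64)
foo_b = np.ones(n, dtype=np.int64)
foo_c = np.zeros(n, dtype=np.int64)

# A reduction over one field streams through contiguous memory,
# which is cache- and SIMD-friendly.
total_a = foo_a.sum()
```

The same reduction over an interleaved `struct foo data[]` in C would stride over the `b` and `c` fields it never reads, wasting cache bandwidth.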
Sorry, I didn't mean it to be trolling, I just meant it more or less literally. If Rust (for example) gets used for things like Mercurial and Mozilla, is that bad? I'm not saying Python shouldn't care, if it could improve the startup time without sacrificing other things. But presumably the transition from py2 to py3 was not intending to make things slower, it was intending to solve other problems. There are almost always tradeoffs. Even the mercurial folks quoted in the article said that the things py3 solved were not what they needed. That's a good indicator that Python is not the right language (anymore) for what they're doing.
I am primarily a Python programmer, but if Rust, Go, etc. take over as the language of choice in certain cases, I don't think that's a bad thing. Which doesn't mean one shouldn't write an article to highlight this cost of not having short startup time, just in case this cost wasn't understood by Guido, et al. But my guess (and it's only a guess), is that it was.
> While the "maybe you shouldn't use Python" comment could be construed as trolling to some, there is definite truth to your line of reasoning and I agree with comment.
I wouldn't say I construed it as trolling. More like, "You might be right, but where does that get us?" Not trolling, but also not that constructive, because it's extremely easy to write something like "maybe you shouldn't use Python" but likely hard and time-consuming to make it so.
There are a lot of questions when considering such a move. For example:
- What's the opportunity cost of migrating $lots_of Python to Rust, or some other language?
- Is that really where you can add (or want to add) the most value?
- And what does having to do that do to your roadmap? Maybe it enables it, but surely it's also stealing time from other valuable work you could be doing?
- Longer term, are we sacrificing maintainability for performance? (In your case it sounds like the opposite?)
- How easily can we hire and onboard people using $new_tech? (Again, it sounds like you might reduce complexity.)
Basically I suppose what I'm saying is I find it a little trite when people say, "well, maybe you should do X," without having weighed the costs and benefits of doing so. And in a professional environment, if that's allowed to become a pattern of behaviour, it can contribute to the demotivation of teams. Hence, I found myself a bit irritated by the grandparent post.
If one needs Rust or C/C++ levels of performance, I doubt there is much Python can do, and one can wonder whether Python was ever the right tool for such a project.
I think your characterization of the GIL is not accurate. Show me ANY real world program that can achieve linear speedups on multicore or multi-processor systems. Humans have not sufficiently mastered multithreading to be able to make such a claim. I am not aware of any "CPU-bound" use cases that would actually use Python like this instead of, say, C or Fortran. And anyway, I submit that it would benefit (both from a design and an execution standpoint) from being multi-process (in other words, using explicitly coded communication).
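A toy sketch of that explicitly multi-process, message-passing style (my own illustration, not the commenter's code):

```python
from multiprocessing import Pool

def square(x):
    # CPU-bound work runs in separate worker processes, so the GIL of
    # any one interpreter never serializes the whole computation.
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        # map() sends inputs to workers and gathers results: all
        # communication is explicit, as the comment advocates.
        print(pool.map(square, range(8)))
```

Each worker has its own interpreter and memory, so sharing state requires explicit serialization, which is exactly the trade-off being argued for.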
Regarding the GIL, I've always wondered about Jython but never gotten around to trying it. What are the drawbacks of running on the JVM to get true multithreading? Having to properly sync the threads, like in other environments without global locks?
The supposed attitude of the Python developers about startup time works against the popular niches Python is supposed to be such a great fit for: little scripts, glue, short-run applications.
That's a problem if that's an area Python wants to compete in.
I might be biased because I'm from the hordes that are moving from Stata and Matlab to Python (but then there are the hordes attracted to data analysis now), but that was never really Python's strong suit, nor its target market.
I mean, I was always into little scripts, but I used Tcl and then Perl.
The linked post is about Python startup being a problem with thousands of invocations. Is Python startup really a problem for the niches you mention, or is it a problem in some extreme edge cases? I would argue this is the latter and perhaps signals that an architecture change for the build or tests would be best.
I have been using Python for small scripts for 20+ years and haven't had this issue. The JVM on the other hand was historically slow to start.
Yep. Tried to use a Raspberry Pi as my main system for a while and one of the pain points was slooooow startup of Python. As a Python fan I was embarrassed.
I don't particularly agree that this is what "Python is supposed to be such a great fit for."
I've been to quite a few PyCons and never heard anyone espousing this view, but I'm open to the possibility that I have missed it. Can you link me to a piece of media that you think persuasively makes the case that this is what Python is supposed to be for?
Python is not optimized for small glue code at all. The fact that it is the sanest language for use in that niche speaks much more about the ecosystem than about Python.
Python seems to be mainly optimized for web servers, scientific computing and machine learning tasks. None of those care about startup time.
Python is really only the target for those because someone lied to all of the systems folk and told them that Ruby was too slow. (The previous wave of infrastructure management tools seemed to all be written in Ruby, and nowadays it's Python or Go.) That, and Python is one of the "official" languages at Google, and everyone wants to be Google, right?
Meanwhile, Ruby is making great strides in performance and even has JIT coming in 2.6.
I think this is why Mercurial is switching (largely) to Rust: https://www.mercurial-scm.org/wiki/OxidationPlan
I totally understand that milliseconds matter in the use case described in the article.
For me, personally, I use python to automate tasks - or to quickly parse through loads and loads of data. To me, startup speed is somewhat irrelevant.
I built a micro-framework that is completely unorthodox in nature, but very effective for what I needed - that being a suite of tools available from an 'internet' server, available to me (and my coworkers) over port 80 or 443.
My internet server, which runs python on the backend (and uses apache to actually serve the GET / POST) literally spits out pages in 0.012 seconds. Some of the 'tools' run processes on the system, reach out to other resources, and spit the results out in under 0.03 seconds (much of that being network / internet RTT). To me, that's good enough - adding 30 or even 300 milliseconds to any of that just wouldn't matter.
I totally get that if Python wants to be a big (read bigger?) player then startup time matters more...but for my personal use cases, I'm not concerned with the current startup time one bit.
As expected, language start up time only matters to some people. Often in my case, Python is used to build command line tools (similar to the case of Mercurial).
In such an event, the start-up time of the program might dominate the total run time of the application. And on my laptop or desktop with a fast SSD with good caching and a reasonably fast CPU... that still ends up being 'okay'.
But once I put that on an ARM chip with a mediocre hard drive - some python scripts spend so long initializing that they are practically unusable. Whereas the comparable Perl/BASH script runs almost instantaneously.
Often, to make Python even practically usable for such systems, I have to implement my own lazily loaded module system. Having some language feature which allowed me to say...
Which could trigger the import process only when that module becomes necessary (if ever).
Have you tried moving import statements into the functions where they are invoked? My understanding is this is effectively the same as lazy loading the module[1].
[1] https://stackoverflow.com/questions/3095071/in-python-what-h...
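A minimal illustration of that pattern:

```python
# A function-local import defers loading the module until the first
# call, keeping interpreter startup cheap for code paths that never run.
def parse_config(text):
    import json  # paid on first invocation only; cached in sys.modules afterwards
    return json.loads(text)

print(parse_config('{"debug": true}'))
```

The import statement itself is nearly free after the first call, since `sys.modules` acts as a cache, so the main cost is only incurred if and when the function actually runs.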
Python is moving to have a lazy loader as part of the standard library. I mean, it's there already, at https://docs.python.org/3/library/importlib.html#importlib.u... , but not clearly easy to use, and with a big warning label against using it.
The issue at https://bugs.python.org/issue32192 says the plan is to start with an easier to use system as a PyPI package.
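Based on the `importlib.util.LazyLoader` documentation, a helper looks roughly like this (the lazy module is only truly loaded on first attribute access):

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose actual loading is deferred until first
    attribute access (the recipe from the importlib docs)."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")   # nothing heavy has run yet
print(json.dumps({"a": 1}))  # first attribute access triggers the real import
```

The warning label in the docs exists because deferred loading moves import errors from startup to some arbitrary later point in the program.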
I've also written a little asynchronous module loading system. Why not load a module we know we're going to need in the background?
I think you're telling us about how you're not affected by a problem that does affect other people. I feel like this doesn't add any substantial, interesting points to this discussion.
I have similar use cases. Startup time starts to matter once you either want to build test cases or put scripts in loops. If I have a script that parses one big data file, and I decide to parse 1000, it's often helpful if I can run that script a thousand times rather than refactor it to handle file lists. Or if you want to optimize some parameter.
> To me, startup speed is somewhat irrelevant.
But isn’t that the author’s point? It doesn’t seem like much time but because you’re paying it so often in so many little places it really does add up.
Sort of related story: we needed a scripting language able to run on an x86 RTOS type of architecture compiled with msvc and looked into CPython because, well, Python is after all quite a nice language. After spending a considerable amount of time to get it compiled (sorry, don't recall all the issues there, but main one was that the source code assumed msvc == windows which I know is true for 99% of cases but didn't expect a huge project like CPython to trip over) it would segfault at startup. During step-by-step debugging it was astonishing how much code got executed before even doing some actual interpreting/REPL. Now I get there might not be a way around some initialization, but still it simply looked too much to me and perhaps not overly clean either. Moreover it included a bunch of registry access (again, because it saw msvc baing used) which the RTOS didn't have in full hence the segfault. Anyway we looked further and thankfully found MicroPython which took less time to port than the time spend to get CPython even compiling. While not a complete Python implementation, it does the job fur us, and it gets away with startup/init code of just something like 100 LOC (including argument parsing etc). Yes I know it's not a fair comparision, but still, the difference is big enough to, at least for me, indicate CPython might just be doing too much at startup and/or possibly spend time on features which aren't used by many users and/or possibly drags along some old cruft. Not sure, just guessing.
http://boo-lang.org/
Context?
Mercurial's startup time is the reason why, for fish, I've implemented code to figure out if something might be a hg repo myself.
Just calling `hg root` takes 200ms with hot cache. The equivalent code in fish-script takes about 3. Which enables us to turn on hg integration in the prompt by default.
The equivalent `git rev-parse` call takes about 8ms.
Wow, that's quite a difference.
But 8ms is still too slow for me. :) I implemented the Git recognition code myself in my own prompt using the minimal amount of FS operations [1], and it renders in 5 ms from start to finish, including a "git:branch-name/47d72fe825" display.
[1] https://github.com/majewsky/gofu/blob/master/pkg/prompt/git....
(I work on Git in my copious free time)
One of the reasons git-rev-parse takes slightly longer than your implementation is that you just unconditionally truncate the SHA-1 to 10 bytes. E.g. run this on linux.git:
You'll get 4 SHA-1s that are ambiguous at 10 characters, this problem will get a lot worse on bigger repositories.
Which is not to say that there isn't a lot of room for improvement. The scope creep of initialization time is one of the things that tends to get worse over time without being noticed, but Git, unlike (apparently) Python, makes huge use of re-invoking itself as part of its own test suite (tens of thousands of times), so it's naturally kept somewhat in check.
If you have this use-case I'd encourage you to start a thread on the Git mailing list about it.
I put similar code in Emacs's vc-hg to get revision information straight from Mercurial's on-disk data structures instead of firing up an hg subprocess.
You mean actually reading dirstate[0] or just the branch/bookmark files?
We also do the latter, but dirstate format isn't easily readable just with shell builtins (lots of fixed-length fields with NUL-byte padding, also we don't even have a `stat` builtin and the external program isn't a thing on macOS AFAIK), so we still fire up `hg status` for that - but only after we decide that there is a hg repo.
[0]: https://www.mercurial-scm.org/wiki/DirState
Somewhat tangentially, I noticed that fish performs quite badly in remote-mounted (sshfs) directories that are git repositories. I wonder if it would be possible to detect a remote-mounted filesystem and turn off/tone down some of the round-trip-heavy operations?
I've gone through your problem myself countless times, and concluded that hitting ctrl+c to interrupt the status line every time it tries to render the current repository state is not very productive.
My git status line uses timelimit (https://devel.ringlet.net/sysutils/timelimit/) to automatically stop if any of the git status parts (dirty/staged/new files) take > 0.1 seconds to finish:
https://github.com/justuswilhelm/pufferfish/blob/master/fish...
I implemented something similar for Xonsh.
Ironically, xonsh itself suffers from a long startup time due to its use of Python. This is my primary (negative) experience with the issue in the linked article, and the reason why I stopped using xonsh.
This is truly a problem. Even more so if you host your application on a network directory: loading all the small files takes ages. I really wish there were a good way to compile the whole application with all its modules into one package once you're ready to release. I really wish the creators of Python had given such use cases more consideration.
Edit: I'm aware that there are solutions that put everything a program touches into a kind of executable archive. A single file several hundred Megabytes in size. I've tested it. It doesn't really pre-compile the modules. The startup time was exactly the same.
Nuitka (http://nuitka.net/) already does that and much more:
- it compiles your program and makes it standalone, so you can distribute just the exe
- it makes it start faster
- it makes it run faster
- it's fully independent of the system Python; your system doesn't even need a Python at all
I don't get why it's not used more. It's very robust, compatible with 3.6, and on some of my scripts I get about a 4x speedup on startup alone.
This is different from the packages that I've tested (PyInstaller and py2exe).
Is Nuitka compatible with numpy, pickle, etc.? I remember that numpy was very problematic with compilers like PyPy for a long time.
First time I hear about this, and I've looked for alternatives to cxfreeze and its cousins in the past.
Any time I see something like this, I feel like I'm hearing about some homeopathic cancer cure. If Nuitka actually does what it says it does, it's solving a big recurrent problem for the Python community, so why is nobody talking about it?
Does nuitka build a static executable or do you still need to supply shared libraries with the executable?
Nuitka has not been able to compile any Python code I've written myself. It's not used because it's incredibly limited.
Is it a perl2exe descendant, packing the interpreter into an executable wrapper?
1 reply →
I know this isn’t everyone’s favorite, but Cython has a way to convert your Python code into an executable with Python embedded, and I believe it also packs your imports.
Cython is a complicated beast but I feel like it just needs a more friendly wrapper for this to be more widespread.
https://stackoverflow.com/questions/22507592/making-an-execu...
https://github.com/cython/cython/wiki/EmbeddingCython
Why Cython isn’t in the stdlib (I think it could easily replace ctypes) is beyond me sometimes
I worked on one Python application that had a startup time problem because it was on a network filesystem with slow metadata/stat times. It took several seconds to start Python.
We were able to solve most of the problem by zipping up the Python standard library and our application.
That is, if you look at sys.path you'll see something like:
If you zip up the python3.6 directory into python36.zip then it will use that zip file as the source of the standard library, and use the zip directory structure instead of a bunch of stat calls to find the data.
This should also include getting access to the pre-compiled byte code.
You can also have Python byte-compile all of the .py files in a directory as part of your build/zip process.
Don't forget
right afterwards.
Also note that calls to imp.load_source need to change to imp.load_compiled, and any .py files referenced directly in code need to be changed to .pyc (this is with 2.7, not sure about 3.x).
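The zip-the-library approach above can be sketched end to end. Everything here (the package name, the temp layout) is illustrative, not an actual build script; on Python 3, `compileall`'s `legacy=True` produces the side-by-side `.pyc` layout that `zipimport` understands:

```python
import compileall
import os
import sys
import tempfile
import zipfile

# Build a throwaway package to stand in for the application/stdlib tree.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "demo_pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("ANSWER = 42\n")

# legacy=True writes demo_pkg/__init__.pyc next to the source, the
# layout zipimport expects inside an archive.
compileall.compile_dir(tmp, quiet=1, legacy=True)

# Pack only the byte-compiled files into a single zip.
archive = os.path.join(tmp, "app.zip")
with zipfile.ZipFile(archive, "w") as zf:
    for root, _dirs, files in os.walk(pkg):
        for name in files:
            if name.endswith(".pyc"):
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, tmp))

# One lookup in the zip's directory replaces a pile of stat() calls.
sys.path.insert(0, archive)
import demo_pkg
print(demo_pkg.ANSWER)  # -> 42
```

The import at the end is served from the archive's pre-compiled bytecode, which is exactly the effect described for the standard library.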
Thanks, I will try that!
cool!
I think design choices made in Python simply don't allow for comprehensive ahead of time compilation. For what it's worth, they have recently landed snapshots in Dart that do what you want:
https://github.com/dart-lang/sdk/wiki/Snapshots
It's what Flutter uses on iOS since you can't run JITed code; AOT compile it and load it as just another shared library.
Check out subpar: https://github.com/google/subpar
also it’s not static linked, so you need to make sure all of the shared libraries exist on the host, requiring to install a whole bunch of trash.
Here's what has worked for me:
1. Don't do that. Either write the driving app in Python or write the subprocesses in an ahead-of-time compiled language. Python's a great language but it's not the right tool for everything.
2. Be parsimonious with the modules you import. During development, measure the performance after adding new imports. E.g., one graph library I tried had all its many graph algorithm implementations separated into modules, and it loaded every single one of them even if all you wanted was to create a data structure and do some simple operations on it. We just wrote our own minimal class.
> Don't do that. Either write the driving app in Python
Even if you write the driver in Python, you don't necessarily want to call the program you're testing in the same process. You might want independent launches of a command-line tool, so that you test the same behavior people get when they run the tool. Otherwise, your test suite might trip over some internal state that gets preserved from run to run in ways that command-line invocation wouldn't.
Good point, but I didn't mean to sound specific to testing apps. I just meant, in general, write big apps using Python top-down and something precompiled if you must spawn lots of external processes.
I've definitely seen significant improvements with #2. Unfortunately, it's not very Pythonic to tuck your imports into functions (or under conditionals). It would be nice if imports were more lazily evaluated.
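The tucked-into-a-function pattern looks like this, using `statistics` as a stand-in for a slow-to-import dependency:

```python
import sys

def mean(values):
    # Deferred import: the module is only loaded (and cached in
    # sys.modules) the first time this function actually runs.
    import statistics
    return statistics.mean(values)

# In a fresh interpreter, nothing has been paid for until the call.
print("statistics" in sys.modules)
print(mean([1, 2, 3]))
print("statistics" in sys.modules)  # cached from here on
```

Subsequent calls hit the `sys.modules` cache, so the deferral only costs anything on the first use.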
The slow startup, combined with the Python ecosystem's general lack of interest in finding a solution for distributing self-contained applications, was the biggest reason we ended up writing our CLI tool in something else, even though we are a Python shop.
I'm really curious why there hasn't been much of a desire to change this; it even got worse as time progressed, which is odd.
One "simple" thing that could be done is to make it easier to build python statically, and improve the freezing toolchain.
When I used to care about the problem, I looked into it (https://cournape.wordpress.com/2015/09/28/distributing-pytho...) and got somewhere. It somewhat improves startup times and allows distribution of a single binary.
Lots of libraries are terribly slow to import (e.g. requests), but right now there is little incentive to fix it as there is no toolchain to build good, self-contained python CLI apps.
I've written a whole bunch of CLI tools over the years and maintained some I didn't author originally, and I always found it annoying how slow these are. A CLI tool for some larger project can easily take a second just to display --help, or a command line parse error. Tests running against it can be made to progress faster (no forking, no separate interpreter, in-process capture, etc., which brings a lot of complexity, subtle differences, and error persistence/"tainting" of the execution environment), but still you might only get a few CLI invocations per second per core.
These experiences are a major turn-off from Python for me.
Indeed this is a long-standing issue with Python.
LWN gave some excellent coverage late last year, in this piece:
https://lwn.net/Articles/730915/
Sure there has been desire to change this. It's a hard problem, and there are tradeoffs.
It’s only a hard problem if there is no desire. The slowdowns for the interpreter startup did not happen because they are necessary but because site.py and friends just do more stuff now and a lot of important internal tooling became unnecessarily complex.
2 replies →
> it even got worse as time progressed which is odd.
Quite the contrary: as I stated in my other comment, we now have Nuitka.
Parent means it got worse in official Python releases, not what some fringe tool can or cannot do.
This is disappointing to me too, but I think there are some problems baked in to the language that make it hard.
- Imports can't be parsed statically.
- Startup time has two major components: crawling the file system for imports, and running all the init() functions of every module, which happens before you get to main(). The first is only fixable through breaking changes, and the second is hard to fix without drastically changing the language.
The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?
I recall a PyCon talk where as of 3.6, essentially everything about Python 3 is now faster than Python 2, EXCEPT startup time!
This is a shame, because I would have switched to Python 3 for startup time ALONE. (As of now, most of my code and that of my former employer is Python 2.) That would have been the perfect time to address startup time, because getting a 2x-10x improvement (which is what's needed) requires breaking changes.
I don't think there's a lack of interest in the broader Python community, but there might be a lack of interest/manpower in the core team, which leads to the situation wonderfully summarized in the recent xkcd:
https://xkcd.com/1987/
FWIW I was the one who sent a patch to let Python run a .zip file back in 2007 or so, for Python 2.6 I think. This was roughly based on what we did at Google for self-contained applications. A core team member did a cleaner version of my patch, although this meant it was undocumented until Python 3.5 or so:
https://docs.python.org/3/library/zipapp.html
The .zip support at runtime was a start, but it's really the tooling that's a problem. And it's really the language that inhibits tooling.
Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
In other words, I have wondered about this "failure" for over a decade myself, and even tried to do something about it. I think the problem is that there are multiple parts to the solution, the responsibility for these parts is distributed. I hate to throw everything on the core team, but module systems and packaging are definitely a case where "distributed innovation" doesn't work. There has to be a central team setting standards that everyone else follows.
Also, it's not a trivial problem. Go is a static language and is doing better in this regard, but still people complain about packaging. (vgo is coming out after nearly a decade, etc.)
I should also add that while I think Python packaging is in the category of "barely works", I would say the same is true of Debian. And Debian is arguably the most popular Linux package manager. They're cases of "failure by success".
> The import code in CPython was a mess, which was apparently cleaned up by importlib in Python 3, through tremendous effort. But unfortunately I think importlib made things slower?
AFAIK importlib is entirely written in Python and kinda portable across Python implementations, while previously most was C code. It's not surprising something gets slower when written in Python.
> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
PyQt applications on Windows typically take two or more seconds before they can do anything, including Enterprise's favourite start-up pastime, splashscreens. Except maybe if you rolled your own .exe wrapper that displayed the splash before invoking any of the Python loading.
That's really, really poor in the age of 4 GHz CPUs from the factory, RAM big enough to fit multiple copies of all binaries on a PC and SSDs with at the very least tens of thousands of IOPS.
3 replies →
EDIT: I should also add that the length of PYTHONPATH as constructed by many package managers is a huge problem. You're doing O(m*n) stat() calls -- random disk access -- which is the slowest thing your computer can do.
m is the number of libraries you're importing, and n is the length of the PYTHONPATH.
So it gets really bad, and it's not just one person's "fault". It's a collusion between the Python interpreter's import logic and how package managers use it.
You've characterized the problems well. And yes this is a core problem for python - startup time and import processing is limiting in a lot more cases than just CLI tools. And yes the design of the language makes it hard or possibly impossible to solve.
> Also, even if you distributed self-contained applications, the startup time is not great. It's improved a bit because you're "statting" a zip file rather than making syscalls, but it's still not great.
Exactly. There is no silver bullet. The problem is how much code gets run on startup, and how Python's dynamic nature makes traditional startup speedup strategies impossible. Is this even fixable?
4 replies →
Best out of 5 times on my Debian testing laptop for a "hello world", in order of worst to best:
35ms for Python is ok. What we see in reality is that the imports a real application uses add a whole lot more time.
For example, if you want a snappy command line response for a Gtk-using Python program, you probably want to handle command line arguments before even importing Gtk. Maybe it is --help or an argument that you pass on to another running instance, and you want it to be absolutely snappy and fast.
I have read that conditional imports are "un-pythonic", but I tend to do exactly that in order to keep resource usage lower.
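The ordering described above can be sketched like this; `myapp` and the `gi` import are illustrative, not a real program:

```python
import argparse

def main(argv=None):
    # Settle --help/--version (and any hand-off to a running instance)
    # before paying for the GUI import.
    parser = argparse.ArgumentParser(prog="myapp")
    parser.add_argument("--version", action="store_true")
    args = parser.parse_args(argv)

    if args.version:
        print("myapp 1.0")  # snappy: no GUI modules touched yet
        return 0

    # Only the slow path imports Gtk.
    import gi  # hypothetical heavy dependency
    gi.require_version("Gtk", "3.0")
    ...
    return 0

# Demo invocation: the fast path returns before any heavy import.
main(["--version"])
```

argparse handles `--help` and argument errors itself inside `parse_args`, so those paths stay fast too.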
Sure, two can play that game. Let's add `-S`, which disables the site module, to the Python invocations.
Gotta love the lie that Ruby is slow.
> C (GCC 7.3): 2ms (int main(void) { puts("hi"); })
Not really a fair comparison given the other 3/4 have to do all their parsing and compiling. Unless in those 2ms you include compilation time. Or use tcc -run.
The user doesn't care, they just invoke "hg" or "git", and language is always a choice, so it's valid from that perspective.
But the reason I included it is because it gives a baseline for the overhead of invoking any program, no matter how trivial.
1 reply →
Given how slow Python is known to be at starting up, I am puzzled why Mozilla continues to use it in build scripts. Perl is just as portable but starts up something like 10 times faster.
I wrote the linked post and maintain the Firefox build system. The reason is that in 2018 (and for the past 10 years, honestly) it is far easier to find people who know Python than Perl. Python is essentially the lingua franca in Firefox land for systems-level tasks that don't warrant a compiled language. As I said in the post, Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages, etc.
>As I said in the post, Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages, etc.
Indeed. As hg is moving to use more and more Rust:
https://www.mercurial-scm.org/wiki/OxidationPlan
2 replies →
Can't wait for a static, dependency-free hg, but sadly they aren't going to rewrite it entirely in Rust, just the speed-sensitive parts. I don't care if it's written in Rust, D, Go or even C. When we picked fossil over hg and git, the Python dependency made us shy away from hg, although the CLI and the overall experience were better than git's. Now we also use fossil as a deployment tool; it can be a statically compiled drop-anywhere binary.
1 reply →
> Rust will likely infringe on Python over time due to performance, lower defect rate, long-term maintenance advantages
No love for Haskell[0]? It does look like the best system scripting language out there right now... I have just never tried it to be sure :(
0: http://www.haskellforall.com/2015/01/use-haskell-for-shell-s...
I imagine there are two aspects to this: they probably started in Python and have a lot of it already, and it's probably easier to get new folks involved, which I think is one of their goals.
I've built Firefox from source. Python startup time is not really a problem; it takes so long to build anything anyway.
I imagine a lot of the pain is for incremental builds where the build system overheads can matter a lot more.
Compile time is absolutely dominated by c++ (and now rust) compilation and linking. I doubt build system language choice will ever bubble up to relevance, so why optimize for it?
Perl is “just as portable” in the same way that a motorcycle can just as easily drive under a steamroller... it’s not gonna be pretty and there’s no easy way out if you do it.
I write Perl scripts for Windows and Linux, and I don't find portability to be especially onerous. Of course there are platform differences to keep in mind, but is that any different from any other cross-platform scripting language?
Build scripts, especially for a C++ app like Firefox, are a place where "slow startup times" are totally irrelevant.
Did you read the link we're discussing? It's in large part about why the slow startup times for Firefox's build scripts are a problem.
> Changing gears, my day job is maintaining Firefox's build system. We use Python heavily in the build system. And again, Python startup overhead is problematic. I don't have numbers offhand, but we invoke likely a few hundred Python processes as part of building Firefox. It should be several thousand. But, we've had to "hack" parts of the build system to "batch" certain build actions in single process invocations in order to avoid Python startup overhead. This undermines the ability of some build tools to formulate a reasonable understanding of the DAG and it causes a bit of pain for build system developers and makes it difficult to achieve "no-op" and fast incremental builds because we're always invoking certain Python processes because we've had to move DAG awareness out of the build backend and into Python. At some point, we'll likely replace Python code with Rust so the build system is more "pure" and easier to maintain and reason about.
4 replies →
Naive question: If the startup time matters because you're imposing that startup time hundreds or thousands of times - why not remove the startup time?
I'm saying, use the emacs model. Start hg with a flag so it simply keeps running in the background while listening on a port. Run a bare-bones nc script to pipe commands to hg over a port and have it execute your commands.
This isn't a new problem, nor is it even a new solution. No complete re-write of the interpreter or the tool required.
Anyways, that's my 2¢
There's a paragraph in the OP about how they've actually done this:
> Mercurial provides a `chg` program that essentially spins up a daemon `hg` process running a "command server" so the `chg` program [written in C - no startup overhead] can dispatch commands to an already-running Python/`hg` process and avoid paying the startup overhead cost. When you run Mercurial's test suite using `chg`, it completes minutes faster. `chg` exists mainly as a workaround for slow startup overhead.
Just like this isn't what the usual `emacs` command does (it's `emacsclient`), it isn't what the usual `hg` command does either. There are some disadvantages to this solution and some assumptions it makes, which have apparently led the Mercurial maintainers to conclude, like the Emacs maintainers, that it won't work as the default. Hence the desire for solutions that will.
I hate to admit it but it's partly why I don't use clojure (pardon the side-topic) more. I can't bear the boot process and the overall cost.
Python is free to tinker, and all similar interpreters are joyful to use. Anything else is probably better for heavy duty jobs environments.
I feel the same way about Clojure. For a LISP, where interactive development via the REPL is supposed to be one of the value-add of the language, it falls completely short in that aspect. They even have entire libraries and design patterns (Component, etc.) to work around the issue, but I find it ridiculous that your entire program structure is dictated by the fact that the REPL boot up time is too damn slow.
It's the main reason I don't use Clojure. I was so excited to learn a modern Lisp. Got an my tools working and wrote my first cli app. Horrendous load time. I realised it's really only suitable for long running processes and I never do that sort of thing so can't use it.
Knock, Knock, who's there? ---- Long Pause --- Java!
Python is great for prototyping or even real apps if performance isn't so critical. However, more than once I've found myself in the situation where I wrote a bunch of Python code and then end up starting that code up from another app, just like the thread discusses and I immediately feel like this is an anti-pattern.
What's even more annoying is that my Python code usually calls a whole lot of C libraries (OpenCV, numpy, etc.) So it's like this: app->OS process->python interpreter->my python code->C libraries. That just really feels wrong so I'd like two things:
1) better/easier path to embed python scripts into my app e.g. resident interpreter
2) some way of passing scripts to python without restarting a new process, this may exist and I'm unaware
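For point 2, one in-process option does exist: the stdlib `runpy` module executes a file inside the already-running interpreter. This is a sketch; the throwaway script written to a temp file stands in for your real ones:

```python
import os
import runpy
import tempfile

# Write a stand-in "script"; in reality these are the tools you'd
# otherwise launch as separate Python processes.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("RESULT = 2 + 2\n")

# Execute it in the current interpreter: no new process, no cold start.
# run_path returns the script's resulting global namespace.
ns = runpy.run_path(path)
print(ns["RESULT"])  # -> 4

os.unlink(path)
```

The trade-off is isolation: scripts run this way share the interpreter's state (imported modules, monkey-patches), unlike a fresh process.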
Startup time has also been the biggest gripe I have with Julia so far. Otherwise it's a truly fantastic language to work in. I wasn't able to put the `__precompile__()` function to good use it seems - the time it takes to execute my program didn't change at all for some reason. Or maybe it's not actually the startup time that caused the problem, but the time it took to perform file IO. Anyways my program now takes even much longer time to startup than the Python equivalent (though it runs much faster once started), which is a real disappointment.
precompile doesn't store native compiled code. Though I know from talking to the compiler developers that this is high on the 1.x list. It's an annoyance but at least it has a clear solution in sight.
Truly solving this problem is difficult, but you can hack around it with a zygote process to remove a substantial amount of overhead, in exchange for RAM. While this is generally more of win for server processes, you can see it applied to a CLI proof of concept:
https://github.com/msolo/pyzy
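A bare-bones version of the zygote idea looks like this (POSIX-only, since it relies on `fork`; a sketch of the concept, not how pyzy itself works):

```python
import json  # stand-in for an expensive pile of imports, paid once
import os
import sys

def run_task(n):
    # The work each short-lived "command" would do in its own process.
    # flush=True matters: os._exit skips stdio buffer flushing.
    print(json.dumps({"task": n, "pid": os.getpid()}), flush=True)

def zygote(tasks):
    # Fork a fresh child per task: copy-on-write means the warmed-up
    # interpreter state (imports, parsed code) comes along for free.
    statuses = []
    for n in tasks:
        pid = os.fork()
        if pid == 0:  # child
            run_task(n)
            os._exit(0)
        _, status = os.waitpid(pid, 0)  # a real server wouldn't block here
        statuses.append(os.WEXITSTATUS(status))
    return statuses

print(zygote(range(3)))  # three tasks, one interpreter startup
```

The RAM cost the parent mentions comes from keeping the warm parent process resident, plus whatever pages each child dirties and forces to be copied.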
I agree Python's startup time is too slow. But one trick you can use to improve it some is the "-S" flag, which skips site-specific customizations. On my Ubuntu system it brings Python 3.6 startup time down from 36ms to 18ms for me; still not great, but it helps.
The drawback is this may screw up your Python environment, not sure how easy it is to work around it if it does.
Proposed solution: steal undump from emacs. https://news.ycombinator.com/item?id=13073566
Perhaps it would be possible to read in the source files, compile them, and preserve an image of the state immediately before reading input or command line.
I'm pretty sure Python 3 already does this, and that's what the __pycache__ directories it creates when running a command are for.
Those are only bytecode. It helps a little but you still have to load the file from the filesystem and run it on import.
I was kind of amazed how penalized a script could be by collecting all its “import” statements at the top. Once somebody’s command couldn’t even print “--help” output in under 2 seconds, and after measuring the script I told them to move all their imports later and the docs appeared instantly.
I'm a long time python user, but never really peeked under the hood. However, I have a few ideas.
Optimized module loading: maybe loading one larger 'super' module would be faster than several smaller ones? For example, a Python program could be analyzed to find its dependent modules, and then all of these could be packed into a 'super' module.
Once the python program executes, it would load the single 'super' module and hopefully bypass all the dynamic code which each module runs when imported to load up.
As mentioned previously, this is just off the top of my head and would certainly warrant more investigation/profiling to confirm my hypothesis.
I'm pretty sure it's too late by now for Python, but I've had some success with compiling C-based interpreters [0] to C; that is, generating the actual C code that the interpreter would execute to run the program. That way you can reuse much of the interpreter, keep the dynamic behavior and still get nimble native executables.
[0] https://github.com/basic-gongfu/cixl#compiling
Should be able to hot boot the VM with the right tooling. You can reuse HPC "checkpoint" code from supercomputing environments as a generic hammer for Python/Ruby/JVM. Some Russians figured out how to do it in userspace without a kernel mod: https://criu.org/Main_Page
People here comment about how Python is slow, but even fast/slow is ill-defined in my opinion. You don't see people hacking TensorFlow (generally) in native languages to speed it up; they just enable CUDA. I imagine the definition of "fast" here is limited to massively parallel server workloads with IO.
Reminds me of buildout. It's an awful piece of software. We used it in a previous Flask project, and a simple flask shell took 3 minutes to start. If you type `import` in the CPython shell it will literally freeze for a few seconds, because it injects one sys.path entry for each package specified!!!
I'm just curious why more people don't make use of chg to avoid the mercurial startup time. It seemed to solve it for me - are there drawbacks?
Isn’t that really just a bandaid over the real problem, though?
The fact the developers of hg went so far as to make that shows startup time is a real issue.
So why not fix the problem at the source?
Indeed, it seems like a perfectly good solution to me. I guess it's something about purity and not being the perfect solution. Wouldn't it be great if python was as fast as a C program that took many times longer to write? Yes, but that would probably be magic.
At a guess: They didn't hear about it (keeping your ears open is a cost not everyone wants to pay). They don't want to bother with setting it up. They don't want to bother with maintaining it (even if it's as simple as reinstall every time you get a new computer).
That's fair. My experience with it so far has literally just been aliasing hg to chg. It performed all the magic in the background for me.
A recent article in ACM Queue included an off-hand remark that Go's compile time is often faster than Python's startup time. Just sayin'
Would it be feasible to keep a set of Python interpreters around at all times and use a round robin approach to feed each already-on interpreter commands then perform an interpreter environment cleanup out-of-band after a task is complete?
The Java ecosystem had this with Drip, and I think it turned out to not be a great idea in practice - the magazine of VMs gets exhausted when you don't want it to, they get into odd states, and other things I can't quite remember.
Or just use the operating system's `fork` system call?
There's also nailgun for Java which sounds like it works a little differently: http://martiansoftware.com/nailgun/
I guess a fork()'ed process triggers copy-on-write behavior in the kernel once the process starts running. So that's latency (the copying) you could still optimize away.
4 replies →
Kind of like this?
https://github.com/tbug/aiochannel
I also think David Beazley's curio has a wonderful way of explaining the same concept:
http://curio.readthedocs.io/en/latest/reference.html#module-...
Asyncio has made this sort of programming a lot easier. You could do the same thing, I imagine, with multiprocessing and thread pools.
Yes but with the added complexity and resource usage it's not a good general solution. If every app behaved this way we'd be in a worse place overall.
I imagine this could be handled by some kind of "fork". Where you instantly duplicate the whole process with copy-on-write.
What would be really nice is checkpoint and restart (i.e., unexec), but it turns out that it's extremely hard to implement and get right in a non-managed environment.
Slowness is the elephant in the room in Python land. It's like everybody has decided to cover their eyes in front of this massive pachyderm. A massive delusion
Delusion? I don't think many cover their eyes. More likely they've come to accept that for their use cases the performance is good enough and the convenience gain well worth it.
It's weird to see someone make this pitch when C systems software development regularly requires us to try and shave off microseconds. Millisecond delays mean you've already fucked up.
For use cases where performance is important, using an interpreted (implementation of a) language is a bad idea.
There are many great reasons to use Python, but execution speed is not one of them.
Performance and startup performance are really separate things.
For instance, for many many CLI tasks a python script will be many times faster than a Java tool, just due to the JVM startup. It doesn't really matter if the Java would even run INSTANTLY... the JVM startup time just kills speed for small CLI invocations.
> It doesn't really matter if the Java would even run INSTANTLY... the JVM startup time just kills speed for small CLI invocations
Luckily, these problems actually do get addressed, slowly, via AOT/Graal and Substrate VM.
Here's a comparison of a simple hello-world program, one version written in Java and compiled to a binary with Substrate VM, against Python:
$ time ./hello.py
hello world!

real 0m0.041s
user 0m0.017s
sys 0m0.023s

$ time ./hello.main
hello world!

real 0m0.019s
user 0m0.008s
sys 0m0.010s
Of course, the comparison is unfair.
Shell is an interpreted language and its startup time is quite fast (5-7 ms on my machine, which is not a particularly fast machine).
In fact, large parts of git were written in shell until they realized that shell is only fast on UNIX because of co-evolution (you can fork without exec, and fork is quite fast), and on other platforms like Windows, existing shell implementations are much slower and there isn't a well-tuned production-ready shell that does things completely differently. Then they started rewriting everything in C.
performance != startup time for lots of applications.
I use a C curses application (dokia) to store some oft-used commands, but anything that won't be run 500x/day or runs longer than ~1/10th of a second I'll write in python for easier/more powerful development
> dokia
Google has apparently never heard of this?
1 reply →
Most of this email thread is comparing Mercurial (Python) to Git. I'm not familiar enough with Git's internals to know why and where the languages are split, but it uses a significant amount of shell scripting and Perl in its code base. You can put 'git-foo' anywhere on PATH and it'll get picked up. So in the comparison they're making, startup time doesn't seem to be an issue for a combination of those languages, but it is for Python. It doesn't sound like their problem is that Python is an interpreted language.
Git is actively rewriting many of their shell and Perl code in C. Performance and portability are given as reasons (having shell and Perl as a dependency on Windows is a bit of a kludge). And shell scripts are much slower on Windows because new process overhead on Windows is ~10x what it is on POSIX platforms. (New threads, however, are faster to create on Windows than on POSIX.)
Why shouldn't the interpreted program start faster? Bytecode is usually at least 2x denser than machine code, so all things being equal, when starting an interpreted program, you should be doing less IO, take fewer page faults, and so run faster, at least if you defer computationally-intensive work to specialized AOT-compiled helpers.
That interpreted programs frequently start slower than their compiled equivalents reflects badly on interpreter implementations, not the concept of interpretation itself.
"There are many great reasons to use Python, but execution speed is not one of them."
Um, I think there are lots of examples where using Python's internal data structures as they were designed results in code that is fast enough.
Even though the language implementation is interpreted, lots of common things can be optimized under the hood using data structure and data type specific execution paths and so on.
Maybe you didn't read the post, but in the use cases specified, performance is not important--the author is mostly speaking about deploy and test scripts. Whether these take 1 minute or 10 is not particularly interesting, but you would of course prefer faster if possible. That's the point here--of course a faster Python interpreter is better, and the Python maintainers should place a higher priority on it than the very low priority that they currently do.
But what else do you switch to?
Imagine you import tons of modules which often are only available in Python. This gets you going really quickly with your project and it runs very smoothly. Transferring this to C++ would probably take so long you won't even finish to find out before you run out of funding.
I have hopes that Rust or some descendant of Rust will get us there in maybe 10 years, but in the meantime it would be better to get Python up to speed as much as possible.
If Python had a 10th of the funding JS has, we would have start up time, packaging, gui and mobile apps solved by now.
For some use cases, Go might be a good alternative to Python. It's performant, yet simple and readable and it has a great ecosystem.
13 replies →
IIRC CPython devs reject performance-related patches if they cause the code to become "less readable".
>> I believe Mercurial is, finally, slowly porting to Python 3.
I just gave up on Mercurial since it didn't let me push to BitBucket nor to an Ubuntu VPS via SSH.
For better or worse, Git just works.
I'm confused, since my daily workflow is pushing to Bitbucket via hg and ssh.
Imagine my confusion back then.
I could push fine to BitBucket if I used the Python 2 version locally, and same for my VPS if I used the Python 2 version both locally and remotely.
But as soon as I touched the Python 3 version of Mercurial the pull/push problems began. I don't recall the exact error and maybe it's fixed now (this happened like 6 months ago), but I don't think I'll give it another try for some time.
My work is considering switching to Git mostly because we think adopting Bitbucket will force us to. Is that not true? I'd love some reasons to stay with hg...
1 reply →