Comment by mhneu
8 years ago
You've characterized the problems well. And yes, this is a core problem for Python: startup time and import processing are limiting in a lot more cases than just CLI tools. And yes, the design of the language makes it hard, or possibly impossible, to solve.
Also, even if you distribute self-contained applications, the startup time is not great. It improves a bit because you're probing entries inside a single zip file rather than issuing a stat syscall per candidate module, but it's still not great.
Exactly. There is no silver bullet. The problem is how much code gets run on startup, and how Python's dynamic nature makes traditional startup speedup strategies impossible. Is this even fixable?
I don't think it's fixable in Python unfortunately. As someone else pointed out, the fact that it got WORSE in Python 3, and not better, is a bad sign. Python 3 was the one chance to fix it -- to introduce breaking changes.
As I mentioned, this problem has bugged me for a long time, since at least 2007. Someone else also mentioned the problem with Clojure, and with JIT compilers in general. I'm interested in Clojure too, but my shell-centric workflow is probably one reason I don't use it.
In 2012 I also had the same problem with R, which starts even more slowly than Python. I wrote a command line wrapper that would keep around persistent R processes and communicate with them. I think I can revive some of my old code and solve this problem -- not in Python, but in the shell! Luckily, I'm working on a shell :)
http://www.oilshell.org/
In other words, the solution I have in mind is single-threaded coprocesses, along with a simple protocol to exchange argv, env, the exit code, and stdout/stderr (think CGI or FastCGI). Coprocesses are basically like servers, but they have a single thread to make porting existing apps easy (i.e. turning most command line apps into multi-threaded servers is nontrivial).
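To make the idea concrete, here is a minimal sketch of what a worker side of such a protocol could look like. Everything here is illustrative and made up, not a real spec: a real protocol would also need to carry env, argv as separate fields, and a distinct stderr channel, the way FastCGI frames them.

```shell
# Hypothetical worker half of a coprocess protocol: one request per line
# on stdin, one "exit-code <TAB> output" reply per line on stdout.
# The framing and the function name are illustrative only.
coproc_worker() {
  local request
  while IFS= read -r request; do
    # Stand-in for the real work a batch process would do,
    # treating $request as its argv.
    printf '%s\t%s\n' 0 "handled: $request"
  done
}
```

A client would write requests to the worker's stdin and parse one reply line per request, so the persistent process looks like an ordinary batch invocation to the caller.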
If you're interested I might propose something on Zulip. At the very least I want to dig up that old code.
http://www.oilshell.org/blog/2018/04/26.html
I think it's better to solve this problem in shell than Python/R/Ruby/JVM. There's no way all of them will be fixed, so the cleaner solution is to solve it in one place by introducing coprocesses in the shell. I will try to do it in bash without Oil, but it's possible a few things will be easier in Oil.
> introducing coprocesses in the shell
I did this with bash and Python a few years ago when I learned about the "coproc" feature (which, by the way, only supports a single coprocess per bash process, unless I misunderstood it).
But it turns out I tend to open new terminal windows a lot, which meant the coprocess needed to relaunch all the time anyway, so it wasn't very useful. Even if I started it lazily, to avoid slowing down every shell startup, most of my Python invocations tended to be from a new shell, so there was no real benefit.
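For reference, the coproc feature works roughly like this (bash 4+). This is a toy demo with made-up names, just to show the fd plumbing:

```shell
# 'coproc NAME { ...; }' runs the block in the background and stores
# two fds in an array: ${NAME[0]} reads from it, ${NAME[1]} writes to it.
# A pure-bash worker loop is used here because bash flushes per printf,
# avoiding the pipe-buffering hangs you get with many external filters.
coproc UPPER { while IFS= read -r line; do printf '%s\n' "${line^^}"; done; }

printf 'hello\n' >&"${UPPER[1]}"     # send a request
IFS= read -r reply <&"${UPPER[0]}"   # read the reply
echo "$reply"                        # HELLO

# Close our write end so the worker sees EOF and exits cleanly.
fd=${UPPER[1]}
exec {fd}>&-
wait "$UPPER_PID"
```

Bash also sets `NAME_PID` to the coprocess's pid, which is what makes the `wait` at the end possible.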
Maybe if you have a pool of worker processes that's not tied to any individual shell process, and connect to them via a Unix-domain socket or something...
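One way to sketch that decoupling at the shell level, assuming socat is installed (the socket path and function names are made up for illustration): the listener outlives any one terminal, and any shell can connect to it. Note that `fork` here spawns a fresh shell per request, so this only shows the socket plumbing; a real pool would hand the connection to an already-warm interpreter instead.

```shell
# Server side: run once, detached from any terminal. Each connection
# sends one request line and gets one reply line back.
start_pool() {
  socat UNIX-LISTEN:/tmp/worker.sock,fork \
    SYSTEM:'IFS= read -r req; printf "handled: %s\n" "$req"' &
}

# Client side: run from any shell; send a request, print the reply.
ask_pool() {
  printf '%s\n' "$1" | socat - UNIX-CONNECT:/tmp/worker.sock
}
```

The point of the socket (versus coproc's inherited pipes) is exactly the lifetime decoupling described above: workers survive when a terminal closes, and a brand-new shell pays only a connect, not a relaunch.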
Hm yeah I was hoping to do it in a way that's compatible with unmodified bash, but maybe it will only be compatible with Oil to start.
Basically I think there should be a "coprocess protocol" that makes persistent processes look like batch processes, roughly analogous to CGI or FastCGI.
I thought that could be built on top of bash, but perhaps that's not possible.
I'll need to play with it a bit more. I think bash does let you have multiple named coprocesses (coproc NAME { ...; }), with each one's descriptors stored in an array like ${COPROC[@]} or ${MY_COPROC[@]}, though bash warns when more than one is active at a time. But there are definitely issues around process lifetimes, including the ones you point out. Thanks for the feedback.