Comment by mhneu

8 years ago

I would characterize Python's weaknesses differently.

Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And certainly removing the GIL will not just enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL, it was the sorry state of multi-process concurrency.
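For instance, I/O-bound code often does better with a single-threaded event loop than with threads at all. A minimal asyncio sketch (the 0.01 s sleep is a stand-in for network I/O; task count is arbitrary):

```python
# Threads are not Python's only concurrency model: asyncio multiplexes
# many I/O-bound tasks on one thread, GIL or no GIL.
import asyncio

async def fetch(i: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a network round-trip
    return i

async def main():
    # Run all ten "requests" concurrently on a single thread.
    return await asyncio.gather(*(fetch(i) for i in range(10)))

results = asyncio.run(main())
print(results)
```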

The speed issues that JITs solve for other languages may not be solvable in Python due to language design.

I'm totally OK with Python's threading choice of saying only 1 Python thread may execute Python code at any time. This is a totally reasonable choice and avoids a lot of complexity with multithreaded programming. If that's how they want to design the language, fine by me.

But the GIL is more than that: the GIL also spans interpreters (that's why it's called the "global interpreter lock").

It is possible to run multiple Python interpreters in a single process (when using the embedding/C API). However, the GIL must be acquired for each interpreter to run Python code. This means that I can only effectively use a single CPU core from a single process with the GIL held (ignoring C extensions that release the GIL). This effectively forces the concurrency model to be multiple process. That makes IPC (usually serialization/deserialization) the bottleneck for many workloads.
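A rough sketch of that serialization cost: a `pickle` round-trip is what `multiprocessing` does under the hood whenever it ships an object between processes (the payload shape and size here are arbitrary, just big enough to make the cost visible):

```python
# Measure the pickle round-trip that IPC imposes on multi-process Python.
import pickle
import time

payload = {i: list(range(50)) for i in range(100_000)}  # a largish object graph

start = time.perf_counter()
blob = pickle.dumps(payload)      # serialize: what the sending process pays
restored = pickle.loads(blob)     # deserialize: what the receiving process pays
elapsed = time.perf_counter() - start

print(f"round-tripped {len(blob)} bytes in {elapsed:.3f}s")
```

For workloads that pass large structures back and forth, this round-trip, not the computation, can dominate.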

If the GIL didn't exist, it would be possible to run multiple, independent Python interpreters in the same process. Processes would be able to fan out to multiple CPU cores. I imagine some enterprising people would then devise a way to transfer objects between interpreters (probably under very well-defined scenarios). This would allow a Python application to spawn a new Python interpreter from within Python, task it with running some CPU-expensive code, and return a result. This is how Python would likely achieve highly concurrent execution within processes. But the GIL stands in its way.

The GIL is an implementation detail, not poor language design.

  • It is a tractable amount of work (~40-80 hrs) to convert CPython from a sea of globals to a context-based system, at which point you could have distinct Python interpreters in the same address space; as it is now, you get one. Lua got this right from the beginning: a Lua state doesn't leak across subsystems. But there is zero chance I would do this work just to see if it would stick. I am not going to burn 2 weeks of full-time work only to have the CPython folks say, yeah, no, because reasons.

    Startup time should also be fixed; Python does way too much when it boots. Timing blank files:

        $ time lua t.lua

        real	0m0.006s
        user	0m0.002s
        sys	0m0.002s

        $ time python t.py

        real	0m0.052s
        user	0m0.036s
        sys	0m0.008s
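    Part of that boot-time work is visible from Python itself. A small sketch counting how many modules CPython has already imported before any user code runs (Lua, by contrast, preloads almost nothing):

    ```python
    # Even a blank script starts with dozens of modules already imported.
    import sys

    preloaded = sorted(sys.modules)
    print(len(preloaded), "modules loaded before user code runs")
    ```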

  • > The GIL is an implementation detail, not poor language design.

    As I understand it, the GIL simplifies CPython's data structures by removing any need to guard against concurrent access.

    If you remove the GIL, you must move that synchronization (mutexes) into the data structures themselves, and you immediately take a big performance penalty.

    If you wanted to avoid that overhead, you end up in swampland where the programmer must manage all concurrent access patterns themselves. Many CPython extension modules would also stop working, because they assume the GIL.

    It can be done, but the last time I read about the GILectomy there was no clear way forward.
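    The "synchronization lives in the interpreter" point can be seen directly: in CPython today, `list.append` is effectively atomic under the GIL, so concurrent appends need no user-level lock. A minimal sketch (this is CPython-specific behavior, not a language guarantee):

    ```python
    # Under the GIL, concurrent list.append needs no explicit lock in CPython.
    import threading

    items = []

    def worker():
        for i in range(10_000):
            items.append(i)  # safe without a mutex; the GIL serializes it

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(items))  # no appends lost
    ```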

    • Yeah, I think this kind of issue is why Ruby, which also has a GIL, seems to be heading for a new concurrency and parallelism model. It introduces a new level (Guilds) between threads and processes where the big lock would be held, and Guilds communicate only by sharing read access to immutable data and by transferring ownership or copies of mutable data.

  • I agree that this is an implementation detail. If they were to simply use the JS model of "every thread gets its own environment and message passing is how you interact", then you could still use threads safely and achieve some pretty impressive performance improvements in some cases.

    Knowing literally nothing about Python other than what I read, I'm kind of confused as to how the current implementation came to be, because it is much easier to design an interpreter that uses the JS model than one that uses a shared environment among multiple threads. I created an Object Pascal interpreter, and it has this design: it can spin up an interpreter instance in any thread pretty quickly because it's greenfield all the way with a new stack, new heap, etc.
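    For comparison, here is a minimal sketch of that JS-style model expressed with today's Python, using processes as the isolated "threads" and queues as the message channel (the workload is illustrative):

    ```python
    # Each worker is an isolated interpreter (a process here); message
    # passing over queues is the only way the two sides interact.
    from multiprocessing import Process, Queue

    def worker(inbox: Queue, outbox: Queue):
        msg = inbox.get()                      # receive a message
        outbox.put(sum(x * x for x in msg))    # reply with a result

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        inbox.put(list(range(1000)))           # send work
        result = outbox.get()                  # await the reply
        p.join()
        print(result)
    ```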