
Comment by std_throwaway

8 years ago

The GIL is a pretty nasty problem once you try to scale things beyond one core.

Simply try something like unpickling a 10 GB data structure in a background thread while keeping your GUI responsive in the main thread. You can't: the unpickling thread holds the GIL almost continuously while it builds Python objects, so everything else stalls. Move the data to another process instead of another thread and your GUI is responsive again, but now you can't access the data from the main thread.
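A minimal sketch of that effect, with a real GUI loop replaced by a main thread that just measures how late its own periodic wakeups arrive (the sizes are made up and far smaller than 10 GB):

```python
# Sketch: a worker thread unpickles a large blob while the "GUI" (main)
# thread measures its wakeup latency. The worker holds the GIL while it
# builds Python objects, so the main thread's wakeups get delayed.
import pickle
import threading
import time

blob = pickle.dumps(list(range(2_000_000)))  # stand-in for the 10 GB structure

def worker():
    pickle.loads(blob)  # holds the GIL for most of this call

t = threading.Thread(target=worker)
t.start()

delays = []
while t.is_alive() or not delays:
    start = time.monotonic()
    time.sleep(0.005)  # a responsive GUI would tick about this often
    delays.append(time.monotonic() - start - 0.005)
t.join()

print(f"worst wakeup delay: {max(delays) * 1000:.1f} ms")
```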

You can say that such a humongous data structure is wrong or that a GUI isn't meant to be responsive or programmed in Python or that I'm holding it wrong. Probably right.

I've flailed around with this a few times in the last year or so and have found that posting things up and down a multiprocessing.Pipe is the least painful alternative.

  • So you're basically building a distributed application just because you can't share memory properly. This can be very efficient if little communication is involved, or a total nightmare if you have gigabytes of data and need lots of random read access to walk the data structures at high speed. If you're not careful, you spend most of your time pickling and unpickling the stuff you send over your pipes, while also duplicating your gigabyte data structures in order to gain at least some parallelism.

    I don't see a way around this mess with the current structure of Python. You would have to reimplement the data-heavy part completely in another language that provides a proper threading model.

"You're holding it wrong" is a poor response to a wide audience, like iPhone users. But it's an OK response to a specialist, like someone tackling the task you describe.

  • I'm a professional Python developer and I run into performance problems a lot. Python makes things really hard for even specialists to "hold right". Contrast that with Go, which (for all the hate it gets) reads much like well-formed Python in single-threaded applications, and in parallel applications writes the way you wish you could write Python. And all the while it's two orders of magnitude faster. If we don't start taking performance seriously in the Python community, Go (or someone else) will eat our lunch sooner or later.

    • Go offers faster performance with code that is up to 50% longer, with a commensurate added maintenance burden.

      And Go is still very slow compared to C, C++, or Rust.

      Since performance usually follows a power-law distribution (99% of the gains are to be made in 1% of the code), it's frequently more effective, in terms of both speed and maintenance burden, to code up the hot paths in a language like C, C++ or Rust and keep the rest in Python.
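The hot-path idea can be illustrated without leaving Python: CPython's builtin `sum` is a C loop, so delegating the inner loop to it stands in for rewriting a hot path in C (the function name and sizes here are mine):

```python
# Sketch: the pure-Python loop below dispatches bytecode for every
# element, while sum() executes the same loop in C. Same result,
# very different cost.
import timeit

def py_sum(xs):
    total = 0
    for x in xs:  # one bytecode dispatch per element
        total += x
    return total

xs = list(range(1_000_000))
assert py_sum(xs) == sum(xs)

t_py = timeit.timeit(lambda: py_sum(xs), number=5)
t_c = timeit.timeit(lambda: sum(xs), number=5)
print(f"pure Python: {t_py:.3f}s, C builtin: {t_c:.3f}s")
```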


    • Have you watched David Beazley's talks on using generators to implement coroutines? That might give you a pattern similar to goroutines. If non-blocking IO isn't the challenge, do you make use of the concurrent.futures module?

      While I also encounter efficiency issues, most of them are frustrations with the overhead of serialization in some distributed compute framework or the throughput of someone else's REST API. As much as so many people complain about the GIL, it's never been a blocker for me (pun intended). Perhaps it's because my style in Python is heavily influenced by Clojure.

      Now that I think about it, Python's string processing is often my bottleneck.
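Since the bottlenecks described here are IO-shaped (someone else's REST API), `concurrent.futures` covers them without touching the GIL at all; a sketch with a fake fetch function standing in for the real API call (the URLs and timings are invented):

```python
# Sketch: overlap slow IO-bound calls with a thread pool. Threads work
# fine here because the GIL is released while a thread sleeps or waits
# on a socket, so eight 0.1 s "requests" take ~0.1 s, not ~0.8 s.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    time.sleep(0.1)  # pretend network latency
    return f"payload from {url}"

urls = [f"https://api.example.com/item/{i}" for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_fetch, urls))
elapsed = time.monotonic() - start

print(f"{len(results)} responses in {elapsed:.2f}s")
```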
