Comment by weberc2

8 years ago

Coroutines aren’t parallelism, so they’re quite a lot worse than goroutines in terms of performance. If you want parallelism in Python, you’re pretty much constrained to clumsy multiprocessing. Beyond parallelism, Python makes it difficult to write efficient single-threaded code: all data is scattered around the heap, everything is garbage collected, and you can’t do anything without digging into a hashmap and calling a dozen C functions under the hood. And you can’t do much about any of this except write in C, which can even make things slower if you aren’t careful.

Probably the best thing you can do in Python is async io, and even this is clumsier and slower than in Go. :(
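For what it's worth, an asyncio fan-out is short to write, though it's single-threaded concurrency rather than parallelism (here `fetch` is a hypothetical stand-in for a real network call):

```python
import asyncio

async def fetch(i):
    # hypothetical stand-in for a network request
    await asyncio.sleep(0.01)
    return i * 2

async def main():
    # gather schedules all coroutines concurrently on one thread
    return await asyncio.gather(*(fetch(i) for i in range(10)))

results = asyncio.run(main())
print(results)
```

All ten sleeps overlap, so the batch finishes in roughly 10 ms rather than 100 ms -- but CPU-bound work inside `fetch` would still serialize.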

I'm getting confused. Are you trying to do parallel compute or parallel networking?

If parallel networking, the benchmarks I've seen set Python asynchronous IO at about the same speed as Golang. The folks at Magicstack reported that Python's bottleneck was parsing HTTP (https://magic.io/blog/uvloop-blazing-fast-python-networking/). Note their uvloop benchmark was about as fast or faster than the equivalent Golang code.

If parallel compute, then multiprocessing is the way to go and Python's futures module ain't clumsy. It's just ``pool.submit(func)`` or ``pool.map(func, sequence)``. If you're asking for parallel compute via multithreading, you're going against the wisdom of shared-nothing architecture. Besides, pretty soon you'll want to go distributed and won't be able to use threads anyway.

In contrast to your experience, I find Python makes it easy to write efficient code. Getting rid of the irrelevant details lets me focus on clear and efficient algorithms. When I need heavy compute, I sprinkle in a little NumPy or Numba. My bottleneck is (de)serialization, but Dask using Apache Arrow should solve that problem.
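To be concrete about the "sprinkle": a pure-Python loop versus the vectorized equivalent (toy example; the win grows with array size):

```python
import numpy as np

def mean_of_squares_py(xs):
    # pure Python: every element is a heap-allocated object
    return sum(x * x for x in xs) / len(xs)

def mean_of_squares_np(xs):
    # vectorized: one contiguous buffer, the loop runs in C
    a = np.asarray(xs, dtype=np.float64)
    return float((a * a).mean())

data = list(range(1, 1001))
print(mean_of_squares_py(data), mean_of_squares_np(data))
```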

  • > I'm getting confused. Are you trying to do parallel compute or parallel networking?

    Parallelism conventionally means "parallel computation". For async workloads, you're right--there are third party event loops that approach Go's performance, but that's not the subject of my complaint.

    Regarding parallelism, I haven't used Python's futures module specifically, but all multiprocessing solutions are bad for data-heavy workloads simply because the time to marshal the data structure across the process boundary imposes a severe penalty. There are many other disadvantages to processes as well: they're far less memory-friendly than goroutines (N Python interpreters running, each with the necessary imports loaded), they require extra support to get logging to work as expected (you have to make sure to pipe stderr and stdout), and they're subject to the operating system's scheduler, which may kill them on a whim.
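    The marshalling cost is easy to see: `multiprocessing` pickles every argument and return value that crosses the process boundary, so a data-heavy call pays for serializing the whole structure (toy payload below):

    ```python
    import pickle

    # toy stand-in for a large data structure handed to a worker
    payload = {"rows": [list(range(100)) for _ in range(100)]}

    # this is roughly what happens to every argument and return value
    # that crosses a process boundary
    blob = pickle.dumps(payload)
    restored = pickle.loads(blob)
    print(len(blob), "bytes serialized")
    ```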

    > Besides, pretty soon you'll want to go distributed and won't be able to use threads anyway.

    Processes have the same problem in addition to being generally less efficient.

    > you're going against the wisdom of shared-nothing architecture

    I mean, sort of. If you're doing a parallel computation on a large immutable data structure, sharing it costs you almost nothing in maintainability--immutability keeps it safe--but gains you quite a lot of performance (no need to copy/marshal that structure across process boundaries). Besides, there are lots of other good reasons to share things across workers, like connection pools, file handles, and other resources.
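    (CPython does have a partial escape hatch here: since 3.8, `multiprocessing.shared_memory` lets processes attach to one buffer by name instead of copying it -- though you get raw bytes, not Python objects. A minimal sketch, run in a single process for illustration:)

    ```python
    from multiprocessing import shared_memory

    # create a block once; workers would attach by name instead of copying
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:4] = b"data"

    # a second handle, as a worker process would open it
    view = shared_memory.SharedMemory(name=shm.name)
    seen = bytes(view.buf[:4])

    view.close()
    shm.close()
    shm.unlink()
    print(seen)
    ```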

    Also, it's terribly ironic that you're defending CPython and specifically its GIL on the basis of "shared nothing architecture".

    > In contrast to your experience, I find Python makes it easy to write efficient code. Getting rid of the irrelevant details lets me focus on clear and efficient algorithms.

    Then you'll love Go--Go has far fewer irrelevant details than Python, and Python lacks many _relevant_ details, such as control over memory. Your efficient algorithm in Go will almost certainly be at least two orders of magnitude better on memory than the equivalent CPython without compromising much in terms of readability.

    > When I need heavy compute, I sprinkle in a little NumPy or Numba.

    I haven't used Numba, but I've seen a lot of Python get _slower_ with NumPy and Pandas (and lots of other C extensions, for that matter). You have to know your problem well or you'll end up with code that is less readable and less performant than the original, and even when it works it's still less readable than the naive Go implementation and not significantly more performant.
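    A toy example of NumPy losing: when the work per call is tiny, the ndarray conversion machinery dwarfs the arithmetic itself (hypothetical micro-benchmark; exact numbers vary by machine):

    ```python
    import timeit
    import numpy as np

    def add_py(a, b):
        # plain Python addition
        return a + b

    def add_np(a, b):
        # routing two scalars through ndarray machinery costs far more
        # than the addition itself
        return (np.asarray(a) + np.asarray(b)).item()

    t_py = timeit.timeit(lambda: add_py(3, 4), number=20000)
    t_np = timeit.timeit(lambda: add_np(3, 4), number=20000)
    print(f"python: {t_py:.4f}s  numpy: {t_np:.4f}s")
    ```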

    > My bottleneck is (de)serialization, but Dask using Apache Arrow should solve that problem

    They'll help, but the fact that Python needs these projects when other languages have far simpler solutions is an admission of guilt in my view. That said, I'm excited to see what sorts of things these projects enable in the Python community.

    • > Processes have the same problem in addition to being generally less efficient.

      What I meant was that you should consider a multiprocessing approach that shares essentially no data between processes. As you say, the memory copying overhead is highly inefficient. Once you approach a problem like that, you've already implemented an essentially distributed system and the change is trivial.

      I've regretted multithreading enough times to convince me it's almost never the right choice. Mostly because I find I've underestimated the project scale and needed to rewrite as distributed. Maybe those new monstrous instances available on EC2 will change my habits. I've never had such flexible access to a 4TB RAM / 128 core machine before.

      > the fact that Python needs these projects

      Apache Arrow solves problems for many languages. The ACM article that popped up the other day, "C Is Not a Low-Level Language", touched on some of the issues.

      https://arrow.apache.org