PyPy deserves much more credit (and much wider use) than it gets. The underperformance of the Faster CPython project [0] shows how difficult it is to optimize a Python implementation, and highlights just how impressive PyPy really is.
[0] The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe [https://github.com/markshannon/faster-cpython/blob/master/pl...].
> The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe
IIRC the JIT was originally expected to be the single focus of CPython performance work. But then another front was opened in parallel to tackle the GIL [1]. Perhaps the overhead of performing two major "surgeries" on the CPython codebase at the same time contributed to slower progress than originally predicted.
[1] https://peps.python.org/pep-0703/
The main culprit is not wanting to change the C ABI of the VM.
Other equally dynamic languages have long shown the way.
But what do people actually use Python for the most, at least as far as industry is concerned? Interfacing with those C extensions.
PyPy does have an alternative ABI that integrates with the JIT and also works on CPython, so if people cared that much about those remaining bits of performance, they could support it.
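The alternative referred to here is presumably CFFI, PyPy's recommended FFI, which integrates with its JIT and also runs on CPython. A minimal sketch in ABI mode, assuming `cffi` is installed; `dlopen(None)` loads symbols from the running process, which works on Unix:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")  # declare the C prototype we want to call
C = ffi.dlopen(None)                # symbols from the running process (Unix)
print(C.sqrt(9.0))                  # 3.0
```

The same code runs unchanged on CPython and PyPy; on PyPy the call is understood by the JIT rather than going through the emulated CPython C API.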
I really wish the PSF would adopt PyPy as a separate project. It is so underrated. People still think it supports only a subset of Python code and that it is slow with C FFI code.
But the latest PyPy supports all of Python 3.12, and its C FFI code is just as fast as its JIT-compiled Python code. It is literally magic, and if it were more popular Python would not have a reputation for being slow.
PyPy is amazing and it's actually a bit baffling that it's not the default that everyone is using in production. I've had Python jobs go from taking hours to run, down to minutes simply by switching to PyPy.
Do you happen to know if Flask is supported by any chance?
Yes. I've had a small webapp running under it quite happily (complete overkill, but it's a personal project and I was curious).
Very basic hello world app hosted under gunicorn (just returning the string "hello world", so hopefully this is measuring the framework time). Siege set to do 10k requests, 25 concurrency, running that twice so that they each have a chance to "warm up", the second round (warmed up) results give me:
So it seems like there are definitely things that PyPy's JIT can do to speed up the Flask underpinnings.
Yes, have been using Flask on PyPy3 for years. I get about a 4x speedup.
I just tested it and it works perfectly.
Unfortunately it keeps being the black swan in the Python community.
Python is probably the only programming language community that has been so resistant to JITs, and where folks routinely call bindings to C libraries "Python".
It's not a black swan. The issue is that using Pypy means accepting some potential compatibility hassle, and in return you get a reasonable speedup in your Python code, from glacial to tolerable. But nobody who has accepted glacial speed really needs tolerable speed.
It's like... imagine you ride a bike to most places. But now you want to visit Australia. "No problem, here take this racing bike! It's only a little less comfortable!".
So really it's only of interest to people who have foolishly built their entire business on Python and don't have a choice. The only one I know of is Dropbox. I bet they use Pypy.
By the time they considered switching to PyPy, they already had too many C extensions that were not compatible with PyPy at the time. Instead of improving PyPy, they tried to develop their own LLVM-based JIT for Python, and that effort failed. They should have ported those extensions to CFFI, or just helped PyPy improve its C-extension support. But NIH prevailed: they built their own PyPy alternative for years and failed.
I don't get why PyPy and CPython don't simply merge. It would be difficult, organization-wise... but not impossible.
When people think of C library wrappers as "Python", it is kind of a hard sell.
HPy is a newer alternative API: it runs at the same performance as CPyExt on CPython, and the same on PyPy.
Why do people feel the need to comment this on every single JIT post? Like imagine commenting on every post about Pepsi "Coca-cola exists since 1886".
Because Pypy wasn't even _mentioned_ in the JIT PEP (https://peps.python.org/pep-0744/), like it's the black sheep the family isn't supposed to talk about.
Because, as proven multiple times, the problem isn't Python but CPython, and many folks keep conflating languages with implementations.
Because it is one of the most ambitious projects in the open-source world, and very little is known about it. It is neglected by the Python contributor community for unknown reasons (something political, it seems). It was developed as a PHD research project by really good researchers. PyPy implemented Python in pure Python and surpassed the performance of the Python written in C by 4-20x. They delivered Python with a JIT, and also static RPython: a subset of Python which compiles directly to a binary. I have also personally worked with some of the lead PyPy developers on commercial projects, and they are the best developers to work with.
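To illustrate the RPython part: a minimal translation target (a hypothetical sketch, not from the thread; building it requires the RPython toolchain shipped in the PyPy repo, e.g. `rpython target.py`). Because RPython is a statically-analyzable subset of Python, the same file also runs unmodified on a normal interpreter:

```python
# Minimal RPython translation target (file and function names are assumptions).

def sum_to(n):
    # Simple monomorphic loop: the kind of code RPython compiles well.
    total = 0
    for i in range(n):
        total += i
    return total

def entry_point(argv):
    print(sum_to(1000))
    return 0  # RPython entry points must return an int exit code

def target(driver, args):
    # Hook the RPython toolchain looks for when translating to a binary.
    return entry_point, None

if __name__ == "__main__":
    entry_point([])  # plain-Python run for quick testing
```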
> PHD
Do you know that it's PhD, because the h is part of the word "philosophy"?
If memory serves, PyPy supports a subset of Python and focused their optimizations on software transactional memory.
Back in 2022 it worked fine with literally all modules except some SSH, SSL, and C-based modules.
With a little bit of tinkering (multiprocessing, choosing the right libraries written strictly in Python, PyPy, plus a lot of memory) I was able to optimize some workflows from 24 hours down to just 17 minutes :) Good times...
It felt like magic.
The "C based modules" bit is the kicker. A significant chunk of Python users essentially use it as a friendly wrapper for more-powerful C/C++ libraries underneath the hood.
Yep, I had a script that was doing some dict mapping and re-indexing. I wrote the high-level code to be as optimal as possible, and switching from CPython to PyPy brought the run time from 5 minutes to 15 seconds.
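A toy version of that kind of job (hypothetical; the original script isn't shown): pure-Python dict loops like this are exactly what PyPy's tracing JIT speeds up, while CPython pays interpreter overhead on every iteration.

```python
def reindex(records):
    # Invert a record-id -> value mapping into value -> sorted list of ids.
    index = {}
    for rec_id, value in records.items():
        index.setdefault(value, []).append(rec_id)
    for ids in index.values():
        ids.sort()
    return index

records = {i: i % 3 for i in range(10)}
print(reindex(records))  # {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```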
If PyPy worked with ReTux, the game would get a big boost. Although the main issue is that it tries to redraw many objects at once per frame.
Not a subset. It covers 100% of pure Python. CPyExt extensions work fine; some parts just need optimization. The only things PyPy does not officially support are the private CPyExt calls that some libraries use as hacks (the PyO3 Rust-Python bindings use those).