Comment by pansa2

15 days ago

PyPy is a fantastic achievement and deserves far more support than it gets. Microsoft’s “Faster CPython” team set out to make Python 5x faster but achieved only ~1.5x in four years; meanwhile PyPy has been delivering 5x-plus speedups for well over a decade.

On the other hand, I always got the impression that the main goal of PyPy is to be a research project (on meta-tracing, STM, etc.) rather than a replacement for CPython in production.

Maybe that, plus the core Python team’s indifference towards non-CPython implementations, is why it doesn’t get the recognition it deserves.

Third-party libraries like SciPy, scikit-learn, pandas, TensorFlow and PyTorch have been critical to Python’s success. Since CPython is written in C and exposes a convenient C API, those libraries can leverage it to move hot paths from (slow) Python to (fast) C/C++, striking a balance between speed of development and speed of execution.

PyPy’s alternative, CFFI, was not attractive enough for the big players to adopt. And HPy, another alternative that would have played better with Cython and friends, came too late in the game; by that time PyPy development had lost momentum.

  • PyPy on NumPy-heavy code is often a lot slower than CPython

    • Yes. The C API those libraries use is a good fit for CPython and a bad fit for PyPy. Hence CFFI and HPy. Actually, many of the lessons from HPy are making their way into CPython, since its JIT and speedups face the same problems as PyPy. See https://github.com/py-ni

  • I rather like Python and have used the C API extensively; "nice" is not the word I'd choose ...

  • Sorry, can you explain the connection between PyPy and CFFI (which generates compiled extension modules to wrap an existing C library) a bit more? I have never used PyPy, but I use CFFI all the time (to wrap C libraries unrelated to Python so that I can use them from Python)

    • CFFI is fast on PyPy. The JIT still cannot peer into the compiled C/C++ code, but it can generate efficient interface code because there is a dedicated _cffi_backend module built into PyPy. Originally that was the motivation for the PyPy developers to create CFFI.
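
      A minimal ABI-mode sketch (my own illustration, with libm's sqrt as the wrapped function; the same code should run unchanged on CPython and PyPy):

        # ABI-mode CFFI: declare a signature, dlopen the library, call it.
        # On PyPy the calls go through the built-in _cffi_backend, so the
        # JIT can emit efficient call stubs around libm.sqrt.
        from cffi import FFI

        ffi = FFI()
        ffi.cdef("double sqrt(double x);")  # declaration copied from math.h
        libm = ffi.dlopen("m")              # "m" = libm on Linux; name varies by OS

        print(libm.sqrt(2.0))               # 1.4142135623730951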

      1 reply →

  • Python was already widely deployed before those libraries existed, thanks to Zope and to being a saner alternative to Perl.

The Faster CPython project would have gotten further if Microsoft hadn’t let the entire team go when they made large numbers of their programming-language teams redundant last year. All in the name of “AI”. Microsoft basically gave up on core computer science to go chase the hype wave.

  • You’re right, of course: even Guido seems to have been moved off working on CPython and onto some tangentially related AI technology.

    However, Faster CPython was supposed to be a 4-year project, delivering a 1.5x speedup each year. AFAIK they had the full 4 years at Microsoft, and only achieved what they had originally planned to deliver in year one.

    • To be fair, they suffered a bit from scope creep: mid-project, a second major effort was started to remove the GIL, so the codebase was undergoing two major surgeries at the same time. It's hard to believe they could stick to the original schedule under those conditions. Also, removing the GIL hurts single-threaded performance, so I imagine some of the gains from Faster CPython were/will be spent compensating for that hit in the GIL-less builds.

We have been using PyPy in a core system component in production for about 10 years.

> PyPy is a fantastic achievement and deserves far more support than it gets

PyPy is a toy for getting great numbers in benchmarks and demos, is incompatible in a zillion critical ways, and is basically useless for large-scale development of anything that has to interoperate with "real" Python.

Literally everyone who's ever tried it has had the same experience: you mock up a trial of your performance-critical code, your jaw drops in amazement, and then you run your whole app and it fails. Until there's a serious attempt at real 100% compatibility, none of this is going to change.

Also, none of the deltas are well documented. My personal journey with PyPy hit a wall when I realized that its GC is lazy instead of eager: CPython's reference counting frees an object the moment its last reference dies, while PyPy's tracing GC frees it whenever a collection happens to run. So a loop that relies on the interpreter to free things up (e.g. file descriptors needing to be closed) rapidly runs into resource exhaustion on PyPy. This is huge, easy to trip over, extremely hard to audit, and... it's like it's hidden lore or something. No one tells you this, when it needs to be at the top of their front page before you start the port.
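
To make the failure mode concrete, here is a hedged repro sketch (my code, not anything from PyPy's docs; whether it actually dies depends on your ulimit and on when PyPy's GC happens to run):

    # On CPython each file closes as soon as `f` is rebound (refcount hits 0).
    # On PyPy the file object waits for the tracing GC, so descriptors pile up.
    import errno

    try:
        for _ in range(100_000):
            f = open("/etc/hostname")  # any readable path; never explicitly closed
    except OSError as e:
        if e.errno != errno.EMFILE:
            raise
        print("too many open files: the lazy-GC trap described above")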

  • "Ask HN: Is anyone using PyPy for real work?" from 2023 contradicts you about PyPy being a toy. The replies are noticeably biased towards batch jobs (data analysis, ETL, CI), where GC and any other issues affecting long-running processes are less likely to bite, but a few replies talk about sped-up servers as well.

    https://news.ycombinator.com/item?id=36940871 (573 points, 181 comments)

  • Timely management of external resources is what the `with` statement has been for since 2006, when it was added in Python 2.5. To debug these problems, Python has ResourceWarning.

    Additionally, CPython's GC is only eager in a best-effort kind of way: if reference cycles are involved, it can take a long time to release memory. This will become even more the case in future versions of CPython, in the free-threaded builds.
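
    A small sketch of both tools (my example, not the parent's):

      # Deterministic cleanup; identical behaviour on CPython and PyPy:
      with open("/etc/hostname") as f:   # any readable path works here
          data = f.read()
      # f is guaranteed closed at this point, no matter when the GC runs.

      # Surfacing code that relies on the GC instead (the same effect comes
      # wholesale from `python -X dev` or `-W always::ResourceWarning`):
      import warnings
      warnings.simplefilter("always", ResourceWarning)
      f = open("/etc/hostname")
      del f  # CPython warns "unclosed file ..." right here; PyPy warns
             # later, whenever its GC gets around to finalizing the object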

    • Sorry, the with statement is non-responsive. The question isn't whether you "can" write PyPy-friendly code. Obviously you can.

      The question isn't even whether or not you "should" write PyPy-friendly code, it's whether YOU DID, or your predecessors did. And the answer is "No, they didn't". I mean, duh, as it were.

      PyPy isn't compatible. In this way and a thousand tiny others. It's not really "Python" in a measurable and important way. And projects that are making new decisions for what to pick as an implementation language for the evolution of their Python code have, let's be blunt, much better options than PyPy anyway.

      4 replies →