Python 3.14 is here. How fast is it?

5 days ago (blog.miguelgrinberg.com)

Tangential, but I practically owe my life to this guy. He wrote the Flask Mega-Tutorial, which I followed religiously to launch my first website. Then, right before launch, I got stuck on the most critical part of my entire application: piping a fragged file in Flask. He answered my Stack Overflow question, I put his fix live, and the site went viral. Here's the link for posterity's sake: https://stackoverflow.com/a/34391304/4180276

Please don't write benchmarks that take a timestamp inside the loop and accumulate a sum. Just time the whole loop and divide by the iteration count. Fetching the time has overhead of its own, and timer jitter can mess with the results.
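In Python terms, a minimal sketch of that approach (function and variable names are illustrative):

```python
import time

def avg_call_time(fn, n=100_000):
    # Take timestamps only outside the loop, then divide by the
    # iteration count: per-call clock overhead and jitter average out
    # instead of being added to every sample.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

avg = avg_call_time(lambda: sum(range(10)))
```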

Every time I hear news about the Python language itself, it saddens me that, in 2025, PyPy is still a separate, distinct track from mainline Python.

That said, I wonder if GIL-less Python will one day enable GIL-less C FFI? That would be a big win that Python needs.

  • The biggest thing PyPy adds is JIT compilation. This is precisely what the project to add JIT to CPython is working on these days. It's still early days for the project, but by 3.15 there's a good chance we'll see some really great speedups in some cases.

    It's worth noting that PyPy devs are in the loop, and their insights so far have been invaluable.

  • > That said, I wonder if GIL-less Python will one day enable GIL-less C FFI?

    What do you mean exactly? C FFI has always been able to release the GIL manually.
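    For instance, even from pure Python, ctypes already releases the GIL around foreign calls (PyDLL is the variant that holds it), so a blocking C call doesn't stall other threads. A rough illustration:

```python
import ctypes
import ctypes.util
import threading

# Load libc; CDLL-wrapped calls drop the GIL for the call's duration
# (dlopen(NULL) fallback if find_library can't locate it).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

t = threading.Thread(target=libc.sleep, args=(1,))
t.start()

ticks = 0
while t.is_alive():
    ticks += 1  # pure-Python work keeps running during the C-level sleep
t.join()
```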

  • > That said, I wonder if GIL-less Python will one day enable GIL-less C FFI? That would be a big win that Python needs.

    I'm pretty sure that is what free-threading is today? That is also why it can't be enabled by default, AFAIK: several C FFI libs haven't gone "GIL-less" yet.

  • Can you clarify the concern? Coming from C, I've come to expect many dialects across many compiler implementations. It seems healthy and encourages experimentation. Is it not a sign of a healthy language ecosystem?

    PyPy's compatibility gap with CPython seems very minor in comparison: https://pypy.org/compat.html

  • How do you see that changing?

    Python introduces another breaking change that also randomly affects performance, making it worse for large classes of users?

    Why would the Python organisers want to do that?

  • I don't understand why C FFI is that popular.

    The amount of time it takes to write all the cffi stuff is about the same as it takes to write an executable in C and call it from Python.

    The only time cffi is useful is when you want that code to be dynamic, which is a very niche use case.
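    The "separate executable" route the parent describes amounts to a subprocess call plus output parsing; here `wc -c` stands in for a hypothetical compiled helper:

```python
import subprocess

# Run a compiled program and parse its stdout instead of binding
# C functions through cffi. `wc -c` plays the role of the C helper;
# the cost is process startup and serializing data as text.
result = subprocess.run(["wc", "-c"], input="hello",
                        capture_output=True, text=True, check=True)
byte_count = int(result.stdout)
```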

    • You write the FFI once and let hundreds or thousands of other developers use it. For one-off executables it rarely makes sense.

      Mixing with other libraries provided by the Python ecosystem is another scenario. Do you really want to do HTTP in C, or do you prefer requests?

I hope it doesn't get stuck at 3.14, like TeX.

https://www.reddit.com/r/RedditDayOf/comments/7we430/donald_...

  • You hope it doesn't?

    > [Donald Knuth] firmly believes that having an unchanged system that will produce the same output now and in the future is more important than introducing new features

    This is such a breath of fresh air in a world where everything is considered obsolete after like 3 years. Our industry has a disease, an insatiable hunger for newness over completeness or correctness.

    There's no reason we can't be writing code that lasts 100 years. Code is just math. Imagine having this attitude with math: "LOL loser you still use polynomials!? Weren't those invented like thousands of years ago? LOL dude get with the times, everyone uses Equately for their equations now. It was made by 3 interns at Facebook, so it's pretty much the new hotness." No, I don't think I will use "Equately", I think I'll stick to the tried-and-true idea that has been around for 3000 years.

    Forget new versions of everything all the time. The people who can write code that doesn't need to change might be the only people who are really contributing to this industry.

    • > There's no reason we can't be writing code that lasts 100 years. Code is just math.

      In theory, yes. In practice, no, because code is not just math, it's math written in a language with an implementation designed to target specific computing hardware, and computing hardware keeps changing. You could have the complete source code of software written 70 years ago, and at best you would need to write new code to emulate the hardware, and at worst you're SOL.

      Software will only stop rotting when hardware stops changing, forever. Programs that refuse to update to take advantage of new hardware are killed by programs that do.

      33 replies →

    • Are you by chance a Common Lisp developer? If not, you may like it (well, judging only by your praise of stability).

      Completely sidestepping any debate about the language design, ease of use, quality of the standard library, size of community, etc... one of its strengths these days is that standard code basically remains functional "indefinitely", since the standard is effectively frozen. Of course, this requires implementation support, but there are lots of actively maintained and even newer options popping up.

      And because extensibility is baked into the standard, the language (or its usage) can "evolve" through libraries in a backwards compatible way, at least a little more so than many other languages (e.g. syntax and object system extension; notable example: Coalton).

      Of course there are caveats (like true, performant async programming) and it seems to be a fairly polarizing language in both directions; "best thing since sliced bread!" and "how massively overrated and annoying to use!". But it seems to fit your description decently at least among the software I use or know of.

      2 replies →

    • Stability is for sure a very seductive trait. I can also totally understand the fatigue of chasing the next, already-almost-obsolete new thing.

      >There's no reason we can't be writing code that lasts 100 years.

      There are many reasons this is most likely not going to happen. Code, despite best efforts to achieve separation of concerns (in the best case), is a highly contextual piece of work. Even for a simple program with no external libraries, there is a full compiler/interpreter ecosystem that forms a huge dependency. And the hardware platforms they abstract over are also a moving target. Change is the only constant, as we say.

      >Imagine having this attitude with math: "LOL loser you still use polynomials!? Weren't those invented like thousands of years ago?

      Well, that might surprise you, but no, they weren't. At least, they were not dealt with as they are taught and understood today in their most common contemporary presentation. When the Babylonians (c. 2000 BCE) solved quadratic equations, they didn't have anything near Descartes' algebraic notation connected to geometry, and there is a long series of evolutions in between, continuing to this day.

      Mathematicians actually do make a lot of fancy innovative things all the time. Some fundamentals stay stable over millennia, yes. But some problems also stay unsolved for millennia until someone makes an outrageous move outside the standard.

      2 replies →

    • To be fair, if math did have version numbers, we could abandon a lot of hideous notational cruft / symbol overloading, and use tau instead of pi. Math notation is arguably considerably worse than perl -- can you imagine if perl practically required a convention of single-letter variable names everywhere? What modern language designer would make it so placing two variable names right next to each other denotes multiplication? Sheer insanity.

      Consider how vastly more accessible programming has become from 1950 until the present. Imagine if math had undergone a similar transition.

      11 replies →

    • > There's no reason we can't be writing code that lasts 100 years. Code is just math

      Math is continually updated, clarified and rewritten. 100 years ago was before the Bourbaki group.

      1 reply →

    • > There's no reason we can't be writing code that lasts 100 years. Code is just math. Imagine having this attitude with math: "LOL loser you still use polynomials!? Weren't those invented like thousands of years ago? LOL dude get with the times, everyone uses Equately for their equations now. It was made by 3 interns at Facebook, so it's pretty much the new hotness." No, I don't think I will use "Equately", I think I'll stick to the tried-and-true idea that has been around for 3000 years.

      Not sure this is the best example. Mathematical notation evolved a lot in the last thousand years. We're not using roman numerals anymore, and the invention of 0 or of the equal sign were incredible new features.

      4 replies →

    • > an insatiable hunger for newness over completeness or correctness.

      I understand some of your frustration, but often the newness is in response to a need for completeness or correctness. "As we've explored how to use the system, we've found some parts were missing/bad and would be better with [new thing]". That's certainly what's happening with Python.

      It's like the Incompleteness Theorem, but applied to software systems.

      It takes a strong will to say "no, the system is Done, warts and missing pieces and all. Deal With It". Everyone who's had to deal with TeX at any serious level can point to the downsides of that.

    • If you look at old math treatises from important historical people you'll notice that they use very different notation from the one you're used to. Commonly concepts are also different, because those we use are derived over centuries from material produced without them and in a context where it was traditional to use other concepts to suss out conclusions.

      But you have a point, and it's not just "our industry", it's society at large that has abandoned the old in favour of incessant forgetfulness and distaste for tradition and history. I'm by no means a nostalgic, but I still mourn the harsh disjoint between contemporary human discourse and the historical. Some nerds still read Homer and Cicero and Goethe and Ovid and so on, but if you use a trope from any of those that would have been easily recognisable as such by Europeans for much of the last millennium, you can be quite sure that it won't generally be recognised today.

      This also means that a lot of early and mid-modern literature is partially unavailable to contemporary people, because it was traditional to implicitly use much older motifs and riff on them when writing novels and making arguments, and unless you're aware of that older material you'll miss out on it. For e.g. Don Quixote most would need an annotated version which points out and makes explicit all the references and riffing, basically destroying the jokes by explaining them upfront.

    • Worth noting that few people use the TeX executable as specified by Knuth. Even putting aside the shift to pdf instead of dvi output, LaTeX requires an extended TeX executable with features not part of the Knuth specification from 1988.

      Btw, while equations and polynomials are conceptually old, our contemporary notation is much younger, dating to the 16th century, and many aspects of mathematical notation are younger still.

    • This philosophy may have its place in some communities, but Python is definitely not one of them.

      Even C/C++ introduces breaking changes from time to time (after decades of deprecation though).

      There’s no practical reason why Python should commit to a 100+ year code stability, as all that comes at a price.

      Having said that, Python 2 -> 3 is a textbook example of how not to do these things.

      1 reply →

    • While I think LaTeX is fantastic, I think there is plenty of low-hanging fruit to improve upon it... the ergonomics of the language and its macros aren't great. If nothing else, there should be better investment in tooling and the ecosystem.

    • Mathematical notation has changed over the years. Is Diophantus' original system of polynomials that legible to modern mathematicians? (Even if you ignore the literally-being-written-in-ancient-Greek part.)

    • I agree somewhat with your sentiment and have some nostalgia for a time when software could be finished, but the comment you're replying to was making a joke that I think you may have missed.

    • > There's no reason we can't be writing code that lasts 100 years. Code is just math.

      The weather forecast is also “just math”, yet yesterday’s won’t be terribly useful next April.

      2 replies →

    • My C++ from 2005 still compiles! (I used boost 1.32)

      Most of my python from that era also works (python 3.1)

      The problem is not really the language syntax, but how libraries change a lot.

    • Kinda related question, but is code really just math? Is it possible to express things like user input, timings, interrupts, error handling, etc. as math?

      2 replies →

    • > This is such a breath of fresh air in a world where everything is considered obsolete after like 3 years.

      I dunno man, there's an equal amount of bullshit that still exists only because that's how it was before we were born.

      > Code is just math.

      What?? No. If it was there'd never be any bugs.

      2 replies →

    • Except uh, nobody uses infinitesimals for derivatives anymore, they all use limits now. There's still some cruft left over from the infinitesimal era, like this dx and dy business, but that's just a backwards compatibility layer.

      Anyhoo, remarks like this are why the real ones use Typst now. TeX and family are stagnant, difficult to use, difficult to integrate into modern workflows, and not written in Rust.

      7 replies →

More than 300 comments here and still no convincing answer. Why does the community waste time on trying to make CPython faster when there is PyPy, which is already much faster? I understand PyPy lacks libraries and feature parity with up-to-date CPython. But… can't everyone refocus the efforts and just move to PyPy to add all the missing bits, and then just continue with PyPy as the "official Python"? Are there any serious technical reasons not to do it?

  • > Are there any serious technical reasons not to do it?

    Yes.

    First is startup time. A fast REPL cycle is a big advantage for development. From a business perspective, dev time is more expensive than compute time by orders of magnitude. Every time you make a change, you have to recompile the program. Meanwhile with regular Python, you can literally develop during execution.

    Second is compatibility. NumPy and PyTorch are ever evolving, and those are written as C extensions.

    Third is LLMs. If you really want speed, Gemma27bqat that runs on a single 3090 can translate python codebase into C/C++ pretty easily. No need to have any additional execution layer. My friend at Amazon pretty much writes Java code this way - prototypes a bunch of stuff in Python, and then has an LLM write the java code thats compatible with existing intra-amazon java templates.

    • I really hope I'll never need to touch code written by people who code in Python and throw it at a plausible randomiser to get Java or C.

      If you do this for some reason, please keep the Python around so I can at least look at whatever the human was aiming at. It's probably also wrong, given that they picked this workflow, but there's a chance it has something useful.

      7 replies →

    • The REPL, I get it. Possibly a valid point. Yet I guess the same issues apply to Node.js, which seems much faster in many cases and still has a fine dev experience.

      C compatibility / extension compatibility: nope. First, it is an issue of limited resources. Add more devs to the PyPy team and the compatibility bugs get fixed. Second, aren't people writing C extensions because Python is slow? Make Python as fast as PyPy and in some cases native code won't be that crucial.

      So I don’t see a real issue with pypy that could not be solved by simply moving all the dev efforts from CPython.

      So are there political, personal or business issues?

      1 reply →

  • > can’t everyone refocus the efforts

    You have answered your own question.

    Seriously, though. PyPy is 2-3 versions behind CPython (3.11 vs 3.14) and it's not even 100% compatible with 3.11. Libraries such as psycopg and lxml are not fully supported. It's a hard sell.

    • Pypy only has a handful of devs. If it had the PSF's official blessing, it wouldn't lag behind CPython so much.

    • But this is exactly my point. The resources PyPy has are much smaller. And still, for years, they managed to stay just 2-3 versions behind on features while staying ahead on performance.

      So why not move all the resources from CPython to close the gap with features faster and replace CPython entirely?

      Since this is not happening I expect there to be serious reasons, but I fail to see them. This is what I ask for.

  • > Are there any serious technical reasons not to do it?

    Forget technical reasons, how would you ever do it? It feels like the equivalent of cultural reprogramming "You must stop using your preferred interpreter and halt all your efforts contrary to the one true interpreter". Nah, not going to happen in a free and open source language. Who would have the authority and control to make such a directive?

    Yes, there may be technical reasons, but the reason it doesn't happen more than any other is that programming languages are languages spoken by people, and therefore they evolve organically at no one's direction. Even in languages like Python with a strong bent for cultural sameness and a BDFL type direction, they still couldn't control it. Often times, dialects happen for technical reasons, but it's hard to get rid of them on technical grounds.

  • > pypy which is already much faster

    It isn't.

    • For all my applications, going to PyPy was an instant 2x improvement.

      Not only that, it is a lot easier to hack on. I might be biased, but the whole implementation idea of PyPy seems a lot more sane.

    • I think for pure python performance it is significantly faster at least on all the benchmarks I have seen. That said a lot of what people actually do in python calls into libraries that are written in C++ or C, which I believe has a similar performance (when it works) on pypy.

      1 reply →

I don't know how realistic a benchmark is when it only uses tight loops and integer operations. Something with hashmaps and strings more realistically represents everyday CPU-bound code in Python; most Python users offload numeric code to external calls.
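A more "everyday" micro-benchmark in that spirit might exercise dicts and string formatting, something like:

```python
import time

def dict_string_bench(n=100_000):
    # Hashmap + string workload: build a dict of formatted string keys,
    # then look every key back up and sum the values.
    start = time.perf_counter()
    d = {f"key-{i}": i for i in range(n)}
    total = sum(d[f"key-{i}"] for i in range(n))
    return time.perf_counter() - start, total

elapsed, total = dict_string_bench()
```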

  • There is no "realistic" benchmark, all benchmarks are designed to measure in a specific way. I explain what my goals were in the article, in case you are curious and want to read it.

  • I agree with you, this is not an in depth look, could have been much more rigorous.

    But then I think in some ways it's a much more accurate depiction of my use case. I mainly write monte-carlo simulations or simple scientific calculations for a diverse set of problems every day. And I'm not going to write a fast algorithm or use an unfamiliar library for a one-off simulation, even if the sim is going to take 10 minutes to run (yes I use scipy and numpy, but often those aren't the bottlenecks). This is for the sake of simplicity as I might iterate over the assumptions a few times, and optimized algorithms or library impls are not as trivial to work on or modify on the go. My code often looks super ugly, and is as laughably unoptimized as the bubble sort or fib(40) examples (tail calls and nested for loops). And then if I really need the speed I will take my time to write some clean cpp with zmq or pybind or numba.

  • It's still interesting though. If the most basic thing isn't notably faster, it makes it pretty likely the more complex things aren't either.

    If your actual load is 1% Python and 99% offloaded, the effect of a faster Python might not matter a lot to you, but to measure Python you kind of have to look at Python.

  • Or have it run some super common use case like a FastAPI endpoint or a numpy calculation. Yes, they are not all python, but it's what most people use Python for.

    • FastAPI is a web framework, which by definition is (or should be!) an I/O bound process. My benchmark evaluates CPU, so it's a different thing. There are a ton of web framework benchmarks out there if you are interested in FastAPI and other frameworks.

      And numpy is a) written in C, not Python, and b) is not part of Python, so it hasn't changed when 3.14 was released. The goal was to evaluate the Python 3.14 interpreter. Not to say that it wouldn't be interesting to evaluate the performance of other things as well, but that is not what I set out to do here.

      2 replies →

For me the "criminal" thing is that PyPy exists on a shoestring and yet delivers the performance and multithreading that others are gradually trying to add to CPython.

Its problem is, IMO, compatibility. Long ago I wanted to run it on Yocto but something or other didn't work. I think this problem is gradually disappearing, but it could probably be solved far more rapidly with a bit of money and effort.

The most interesting part for me is that PyPy is faster than free threaded CPython even on multi threaded code.

Really pleasing to see how smooth the non-GIL transition was. Compared to the Python 2->3 transition, this was positively glorious.

And that it gets within spitting distance of the standard interpreter so fast is really promising too. That hopefully means the parts not compatible with it get flushed out soon-ish.

  • AFAIU the GIL is still the default, and no-GIL is a build option; you can't select it at runtime.

    The big issue is what about all those C extension modules, some of them might require a lot of changes to work properly in a no-GIL world.
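    On 3.13+ you can at least introspect this from Python (names per the CPython docs; `sys._is_gil_enabled()` only exists on versions that know about free threading):

```python
import sys
import sysconfig

# True only on a free-threaded build (configured with --disable-gil).
supports_free_threading = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Even on free-threaded builds the GIL can be re-enabled at startup
# (e.g. PYTHON_GIL=1); older builds lack the introspection helper.
gil_active = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True
```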

Do any of these tests measure the new experimental tail call interpreter (https://docs.python.org/3.14/using/configure.html#cmdoption-...)?

I couldn't find any note of it, so I would assume not.

It would be interesting to see how the tail call interpreter compares to the other variants.

  • The build of Python that I used has tail calls enabled (option --with-tail-call-interp). So that was in place for the results I published. I'm not sure if this optimization applies to recursive tail calls, but if it does, my Fibonacci test should have taken advantage of the optimization.

    • The tail calls in question are C tail calls inside the inner interpreter loop. They have nothing to do with Python function calls.

      2 replies →

    • It wouldn’t have, since

          fib(n-1) + fib(n-2)
      

      isn’t a tail call—there’s work left after the recursive calls, so the tail call interpreter can’t optimize it.

Python installation size over time:

    170M python-3.6.15
    183M python-3.7.17
    197M python-3.8.20
    206M python-3.9.24
    218M python-3.10.19
    331M python-3.11.14
    362M python-3.12.12
    377M python-3.13.8
    406M python-3.14.0

  • Where are you getting these numbers?

    Python 3.11 on Debian is around 21 MB installed size (python3.11-minimal + libpython3.11-minimal + libpython3.11-stdlib), not counting common shared dependencies like libc, ncurses, liblzma, libsqlite3, etc.

    Looking at the embeddable distribution for Windows (32-bit), Python 3.11 is 17.5 MB unpacked, 3.13 is slightly smaller at 17.2 MB and 3.14 is 18.4 MB (and adds the _zstd and _remote_debugging modules).

    • This is the "standard" configure + make + make install, which includes libpython.a, header files, Python's own tests (python -m test), plus __pycache__, and debug symbols. Distros of course may split it up into multiple packages, split out debug symbols, etc.

      See `docker run -it --rm -w /store ghcr.io/spack/all-pythons:2025-10-10`.

      To be fair, the main contributors are tests and the static library.

      Just looking at libpython.so

           10M libpython3.6m.so.1.0
           11M libpython3.7m.so.1.0
           13M libpython3.8.so.1.0
           14M libpython3.9.so.1.0
           17M libpython3.10.so.1.0
           24M libpython3.11.so.1.0
           30M libpython3.12.so.1.0
           30M libpython3.13.so.1.0
           34M libpython3.14.so.1.0
      

      The static library is likely large because of `--with-optimizations` enabling LTO (so smaller shared libs, but larger static libs).

That >2x performance increase over 3.9 in the first test is pretty impressive. A narrow use case for sure, but assuming you can leave your code completely alone and just have it run on a different interpreter via a few CLI commands, that's a nice bump.

What are the reasons why nobody uses pypy?

  • A lot of Python use cases don't care about CPU performance at all.

    In most cases where you do care about CPU performance, you're using numpy or scikit-learn or pandas or pytorch or tensorflow or nltk or some other Python library that's more or less just a wrapper around fast C, C++ or Fortran code. The performance of the interpreter almost doesn't matter for these use cases.

    Also, those native libraries are a hassle to get to work with PyPy in my experience. So if any part of your program uses those libraries, it's way easier to just use CPython.

    There are cases where the Python interpreter's bad performance does matter and where PyPy is a practical choice, and PyPy is absolutely excellent in those cases. They just sadly aren't common and convenient enough for PyPy to be that popular. (Though it's still not exactly unpopular.)

  • It doesn't play nice with a lot of popular Python libraries. In particular, many popular Python libraries (NumPy, Pandas, TensorFlow, etc.) rely on CPython’s C API which can cause issues.

    • FWIW, PyPy supports NumPy and Pandas since at least v5.9.

      That said, of all the reasons stated here, it's why I don't primarily use PyPy (lots of libraries still missing)

      1 reply →

  • Speaking only for myself, and in all sincerity: every year, there is some feature of the latest CPython version that makes a bigger difference to my work than faster execution would. This year I am looking forward to template strings, zstd, and deferred evaluation of annotations.

  • Keep in mind that the two scripts that I used in my benchmark are written in pure Python, without any dependencies. This is the sweet spot for pypy. Once you start including dependencies that have native code their JIT is less efficient. Nevertheless, the performance for pure Python code is out of this world, so I definitely intend to play more with it!

  • Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.

    • This might have been your experience, but mine has been very different. In my experience a typical python workload is 50% importing python libraries, 45% slow python wrapper logic and 5% fast native code. I spend a lot of time rewriting the python logic in C++, which makes it 100x faster, so the resulting performance approaches "10% fast native logic, 90% useless python imports".

      7 replies →

    • I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python is really awful compared to rewrite-it-in-Rust.

      2 replies →

    • When it's a drop-in replacement, as for most of my code (and it's dead simple to try whether it runs when you use pypy ./main.py), I don't know why you would run the code 5-50% slower for no reason.

    • IRL you will have CPU-bottlenecked pure Python code too. But it's not enough to take on the unknown risk of switching to a lesser-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.

  • I use it where I can, unfortunately those places are usually scripts that don’t benefit from the compiler.

    The project is moving into maintenance mode, if some folks want to get python-famous, go support pypy.

  • We look periodically and pypy is usually unusable for us due to third-party library support. E.g. psycopg2, at least as of a couple years ago. Have not checked in a while.

    • pypy has a c-extension compatibility layer that allows running psycopg2 (via psycopg2cffi) and similar for numpy etc.

  • Because it hasn't been blessed by the PSF. Plus it's always behind, so if you want to use the newest version of framework x, or package y then you're SOL.

    Python libraries used to brag about being pure Python and backwards compatible, but during the push to get everyone on 3.x that went away, and I think it is a shame.

  • I think generally people who care about performance don't tend to write their code in Python to begin with, so the culture of python is much less performance sensitive than is typical even among other interpreted languages like perl, php, ruby or javascript. The people who do need performance, but are still using python, tend to rely on native libraries doing significant numerical calculations, and many of these libraries are not compatible with PyPy. The escape hatch there is to offload more and more of the computation into the native runtime rather than to optimize the python performance.

  • It currently only supports Python 3.11. That is a big reason.

    • I was happy to see it supports a fairly recent Python 3 at all now, one that ships with most of the expected stuff. Works for me; I'd target something like that for compatibility anyway.

  • I keep wondering the same. It's a significant speed-up in most cases and equally easy to (apt) install

    For public projects I default the shebang to use `env python3` but with a comment on the next line that people can use if they have pypy. People seem to rarely have it installed but they always have Python3 (often already shipped with the OS, but otherwise manually installed). I don't get it. Just a popularity / brand awareness thing I guess?

  • Because all the heavy number-crunching code is already written in C or Rust or as CUDA kernels, so the actual time spent running Python code is miniscule. If it starts to matter, I would probably reach for Cython first. PyPy is an extremely impressive project, but using it adds a lot of complexity to what is usually a glue language. It is a bit like writing a JIT for Bash.

  • I've never experienced any problems that could be attributed to the speed of my Python runtime. I use Python a lot for internal scripting and devops work, but never in a production environment that scaled beyond a few hundred users. I suspect most Python usecases are like that, and CPython is just the safest option.

  • It's not easily available in uv. Even if I installed it outside uv, it always seems significantly out of date. I'm running code in spaces where with uv I can control all the installs of Python, so I don't benefit from using an older release for compatibility.

  • The advantage of core python is that you import stuff and 99.999999% of the time it works.

    With PyPy not so much.

  • Personally: cpyext always lags changes in the CPython ABI and headers which my code relies on, or I'm relying on internals which cpyext doesn't implement at all

  • We want the new features more than we want performance!

    Also: there are some libraries that just don't work on pypy.

  • because it turns out that optimizing the performance of a programming language designed for use cases where runtime performance doesn't matter ... doesn't matter

    • There's currently talk of adding gigawatts of data center capacity to the grid just for use cases where python dominates development. While a lot of that will be compiled into optimized kernels on CPU or GPU, it only takes a little bit of 1000x slower code to add up to a significant chunk of processing time at training or inference time.

      2 replies →

    • Might as well take the work that's already done, though? I can't think of a logical reason to run it at potentially half the speed (how much speedup you get, if any, depends on the specifics of the hot code, of course).

Pithon, haha

I'm thankful they included a compiled language for comparison, because most of the time when I see Python benchmarks, they measure against other versions of Python. But "fast python" is an oxymoron and 3.14 doesn't seem to really change that, which I feel most people expected given the language hasn't fundamentally changed.

This isn't a bad thing; I don't think Python has to be or should be the fastest language in the world. But it's interesting to me seeing Python getting adopted for a purpose it wasn't suited for (high performance AI computing). Given how slow it is, people seem to think there's a lot of room for performance improvements. Take this line for instance:

> The free-threading interpreter disables the global interpreter lock (GIL), a change that promises to unlock great speed gains in multi-threaded applications.

No, not really. I mean, yeah you might get some speed gains, but the chart shows us if you want "great" speed gains you have two options: 1) JIT compile which gets you an order of magnitude faster or 2) switch to a static compiled language which gets you two orders of magnitude faster.

But there doesn't seem to be a world where they can tinker with the GIL or optimize python such that you'll approach JIT or compiled perf. If perf is a top priority, Python is not the language for you. And this is important because if they change Python to be a language that's faster to execute, they'll probably have to shift it away from what people like about it -- that it's a dynamic, interpreted language good for prototyping and gluing systems together.

  • I've been writing Python professionally for a couple of decades, and there've only been 2-3 times where its performance actually mattered. When writing a Flask API, the timing usually looks like: process the request for .1ms, make a DB call for 300ms, generate a response for .1ms. Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run Numpy on it for 3 hours, write it back out for 3 seconds.

    You could rewrite that in Rust and it wouldn't be any faster. In fact, a huge chunk of the common CPU-expensive stuff is already a thin wrapper around C or Rust, etc. Yeah, it'd be really cool if Python itself were faster. I'd enjoy that! It'd be nice to unlock even more things that were practical to run directly in Python code instead of swapping in a native code backend to do the heavy lifting! And yet, in practice, its speed has almost never been an issue for me or my employers.

    BTW, I usually do the Advent of Code in Python. Sometimes I've rewritten my solution in Rust or whatever just for comparison's sake. In almost all cases, choice of algorithm is vastly more important than choice of language, where you might have:

    * Naive Python algorithm: 43 quadrillion years

    * Optimal Python algorithm: 8 seconds

    * Rust equivalent: 2 seconds

    Faster's better, but the code pattern is a lot more important than the specific implementation.
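    A sketch of where that pattern gap comes from, using the same Fibonacci recurrence as the article's benchmark (the memoized variant is the standard fix, not the article's code):

```python
from functools import lru_cache

def fib_naive(n):
    # Exponential blow-up: roughly 1.6**n calls, the deliberately
    # slow style the benchmark uses to stress the interpreter.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Same recurrence, but each n is computed once, so only
    # n + 1 distinct calls ever run.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

assert fib_naive(20) == fib_memo(20) == 6765
```

    fib_naive(40) is the multi-second benchmark workload; fib_memo(40) is effectively instant in any language.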

    • > Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run Numpy on it for 3 hours, write it back out for 3 seconds.

      > You could rewrite that in Rust and it wouldn't be any faster.

      I was asked to rewrite some NumPy image processing in C++, because NumPy worked fine for 1024px test images but balked when given 40 Mpx photos.

      I cut the runtime by an order of magnitude for those large images, even before I added a bit of SIMD (just to handle one RGBX-float pixel at a time, nothing even remotely fancy).

      The “NumPy has uber fast kernels that you can't beat” mentality leads people to use algorithms that do N passes over N intermediate buffers, that can all easily be replaced by a single C/C++/Rust (even Go!) loop over pixels.

      Also reinforced by “you can never loop over pixels in Python - that's horribly slow!”
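      A pure-Python stand-in for that pattern (real code would be NumPy versus a native loop; the constants here are made up):

```python
# N passes over N intermediate buffers: the chained-array-expressions style.
def multi_pass(pixels):
    scaled = [p * 1.5 for p in pixels]               # pass 1, buffer 1
    shifted = [p + 0.1 for p in scaled]              # pass 2, buffer 2
    return [min(max(p, 0.0), 1.0) for p in shifted]  # pass 3, buffer 3

# One pass, no intermediates: what the single C/C++/Rust/Go loop does.
def single_pass(pixels):
    out = []
    for p in pixels:
        v = p * 1.5 + 0.1
        out.append(min(max(v, 0.0), 1.0))
    return out

assert multi_pass([0.0, 0.5, 1.0]) == single_pass([0.0, 0.5, 1.0])
```

      In Python both versions are slow; the point is that the fused loop touches each pixel once and allocates nothing extra, which is where much of the native rewrite's order of magnitude comes from.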

      2 replies →

    • That's because you're doing web stuff. (I/O limited). So much of our computing experience has been degraded due to this mindset applied more broadly. Despite a steady improvement in hardware, my computing experiences have been stagnating and degraded in terms of latency, responsiveness etc.

      I'm not going to even go into the comp chem simulations I've been running, or that about 1/3 the stuff I do is embedded.

      I do still use python for web dev, partly because as you say, it's not CPU-bound, and partly because Python's Django framework is amazing. But I have switched to rust for everything else.

      17 replies →

    • Advent of code is deliberately set up to be doable in Python. You can also imagine a useful problem which Rust takes 2 weeks to do, how long would it take in Python?

    • And my experience is this: you start using ORMs, and maybe you need to format a large table once in a while. Then your Python just dies. Bonus points if you're using async to service multiple clients with the same interpreter.

      And you're now forced to spend time hunting down places for micro-optimizations. Or worse, you end up with a weird mix of Cython and Python that can only be compiled on the developer's machine.

      1 reply →

    • LOL, Python is plenty fast if you make sure it calls C or Rust behind the scenes. Typical of 'professional' Python people. Something too slow? Just drop into C. It surely sounds weird when someone complains about Python being slow and the response is along these lines.

      8 replies →

    • Exactly, most Python devs neither need nor care about perf. Most applications don't even need perf, because whether it's .1 second or .001 seconds, the user is not going to notice.

      But this current quest to make Python faster is precisely because the sluggishness is noticeable for the task it's being used for most at the moment. That 6 second difference you note between the Optimal Python and the optimal Rust is money on the table if it translates to higher hardware requirements or more server time. When everything is optimal and you could still be 4x faster, that's a tough pill to swallow if it means spending more $$$.

      11 replies →

  • It's pretty simple. Nobody wants to do ML R&D in C++.

    Tensorflow is a C++ library with python bindings. Pytorch has supported a C++ interface for some time now, yet virtually nobody uses C++ for ML R&D.

    The relationship between Python and C/C++ is the inverse of the usual backend/wrapper cases. C++ is the replaceable part of the equation. It's a means to an end. It's just there because python isn't fast enough. Nobody would really care if some other high perf language took its place.

    Speed is important, but C++ is even less suited for ML R&D.

    • I think readability is what made Python a winner. I can quickly type out my idea like pseudocode, and I can easily skim through other people's algos. In C++, even a simple algo that is 100 lines of pseudocode will balloon into thousands of lines.

  • I agree. Unless they make it like 10x faster it doesn't really change anything. It's still a language you only use if you absolutely don't care whatsoever about performance and can guarantee that you never will.

    • Well, that's not true at all. Scientists care about performance, but it turns out that Python is really good for number crunching since it is really good for using very fast C libraries. I know people who use pandas to manipulate huge datasets from radar astronomy. Also, of course, it's used in machine learning. If Python was "only" used in situations where you don't care about performance, it would not be used in so many scenarios that definitely need high performance. Sure, it is not pure Python, but it's still Python being used, just used to orchestrate C libraries

    • If you’re actually building and shipping software as a business, Python is great. The advantages of Python for a startup are many: a large pool of talent that can pick up the codebase on essentially day 1, fairly easy to reason about, mature, good code velocity, and typically one and only one way to do things, as opposed to JavaScript. There is way more to the story than raw performance.

      2 replies →

    • The counterargument used to be, the heavy lifting will be offloaded to python modules written in C, like numpy.

      Which was true, but maybe not the strongest argument. Why not use a faster language in the first place?

      But it's different now. There's huge classes of problems where pytorch, jax &co. are the only options that don't suck.

      Good luck competing with python code that uses them on performance.

      9 replies →

    • >>> you absolutely don't care whatsoever about performance and can guarantee that you never will.

      Those are actually pretty good bets, better than most other technological and business assumptions made during projects. After all, a high percentage of projects, perhaps 95%, are either short term or fail outright.

      And in my own case, anything I write that is in the 5% is certain to be rewritten from scratch by the coding team, in their preferred language.

      1 reply →

    • People were probably making the same arguments about ASM and C at some point. How many people do ASM these days, though? Not arguing against the point being relevant for now; obviously Rust and C are way faster.

      4 replies →

Seems like loved languages such as Python and Ruby (ZJIT, TruffleRuby) have been getting a lot of performance improvements lately. Of course JS with V8 kickstarted this, followed by PHP.

So for the majority of us folks: use what you love, and the performance will come.

  • As someone who was a hardcore python fanboy for a long time, no, no it won't. There are classes of things that you can only reasonably do in a language like rust, or where go/kotlin will save you a crazy amount of pain. Python is fine for orchestration and prototyping, but if it's the only arrow you have in your quiver you're in trouble.

    • Completely agree, Python is great for its simple syntax, C-interop and great library ecosystem, but it is a pain to debug, deploy, and maintain in more complex use cases, and doesn't play as nicely as other languages with modern stacks (eg. k8s). What is pleasure for the developer (no explicit typing, wild i/o-as-you-go, a library for everything) is pain for the maintainer (useless error messages, sudden exceptions of lacking UAC, dependency hell).

      Go, Kotlin and Rust are just significantly more modern and better designed, incorporating the lessons from 90s languages like Python, Ruby and Java.

    • I know sometimes performance doesn’t matter, and Python is certainly useful, but it’s not fast. It can be fast enough, and they’ve put a lot of effort into making fast libraries (implemented in C).

      When doing bioinformatics we had someone update/rewrite a tool in Java and it was so much faster: it went from a couple of days to something like 4 hours of runtime.

      Python certainly can be used in production (my experience maintaining some web applications in Java would make me reach for Python/PHP/Ruby to create a web backend, speed be damned). Python has some great libraries.

    • I even changed to JS as my fave for backends. Still using Py for other stuff ofc, but I'm constantly missing some of the JS niceties.

    • At least Python doesn't have an extremist "100% Pure" ideology like Java, and instead (like TCL and Lua) it's been designed from the ground up for easily integrating with other languages and libraries, embedding, and extending, instead of Java's intolerantly weaponized purity and linguistic supremacy.

      Reasons why Sun and Java failed:

      Strategy over product. McNealy cast Java as a weapon of mass destruction to fight Microsoft, urging developers to "evangelize Java to fight Microsoft." That fight-first framing made anti-Microsoft positioning the goal line, not developer throughput.

      Purity over pragmatism. Sun’s "100% Pure Java" program explicitly banned native methods and dependencies outside the core APIs. In practice, that discouraged bridges to real-world stacks and punished teams that needed COM/OS integration to ship. (Rule 1: "Use no native methods.")

      "100% Pure Java" has got to be one of the worst marketing slogans in the history of programming languages, signaling absolutism, exclusion, and gatekeeping. Technically it was just as terrible and destructive an idea, one that held Java back from its potential as an inclusive integration, extension, and scripting language (especially in the web browser context, where it was so difficult to integrate that JavaScript happened instead, and in spite of, Java).

      Lua, Python, and even TCL were so much better and successful at embedding and extending applications than Java ever was (or still is), largely because they EMBRACED integration and REJECTED "purity".

      Java's extremist ideological quest for 100% purity made it less inclusive and resilient than "mongrel" languages and frameworks like Lua, Python, TCL, SWIG, and Microsoft COM (which Mozilla even cloned as "XP/COM"), that all purposefully enabled easy miscegenation with existing platforms and libraries and APIs instead of insanely insisting everyone in the world rewrite all their code in "100% Pure Java".

      That horrible, historically troubling slogan was not just a terrible idea technically and pragmatically, but it also evoked U.S. nativist/KKK "100% Americanism", the Nazis' "rassische Reinheit", "Reinhaltung des Blutes", and "Rassenhygiene", Fascist Italy's "La Difesa della Razza", and white supremacists' "white purity". It's no wonder Scott McNealy is such a huge Trump supporter!

      While Microsoft courted integrators. Redmond pushed J/Direct / Java-COM paths, signaling "use Windows features from Java if that helps you deliver." That practicality siphoned off devs who valued getting stuff done over ideological portability.

      Community as militia. The rhetoric ("fight," "evangelize") enlisted developers as a political army to defend portability, instead of equipping them with first-rate tooling and sanctioned interop. The result: cultural gatekeeping around "purity" rather than unblocking use cases.

      Ecosystem costs. Tooling leadership slid to IBM’s aptly named Eclipse (a ~$40M code drop that became the default IDE), while Sun’s own tools never matched Eclipse’s pull: classic opportunity cost of campaigning instead of productizing.

      IBM's Eclipse cast a dark shadow over Sun's "shining" IDE efforts, which could not even hold a candle to Microsoft's Visual Studio IDE that Sun reflexively criticized so much without actually bothering to use and understand the enemy.

      At least Microsoft and IBM had the humility to use and learn from their competitor's tools, in the pursuit of improving their own. Sun just proudly banned them from the building, cock-sure there was nothing to learn from them. And now we are all using polyglot VSCode and Cursor, thanks to Microsoft, instead of anything "100% Pure" from Sun!

      Litigation drain. Years of legal trench warfare (1997 suit and 2001 settlement; then the 2004 $1.6B peace deal) defended "100% Pure Java" but soaked time, money, and mindshare that could have gone to developer-facing capabilities.

      Optics that aged poorly. The very language of "purity" in "100% Pure Java" read as ideological and exclusionary to many -- whatever Sun's presumed intent -- especially when it meant "rewrite in Java, don’t integrate." The cookbook literally codified "no native methods," "no external libraries," and even flagged Runtime.exec as generally impure.

      McNealy’s self-aggrandizing war posture did promote Java’s cross-platform ideal, but it de-prioritized developer pragmatism -- stigmatizing interop, slow-rolling mixed-language workflows, and ceding tools leadership -- while burning years on lawsuits. If your priority was "ship value fast," Sun’s purity line often put you on the wrong side of the border wall.

      And now finally, all of Java's remaining technical, ideological, and entrenched legacy enterprise advantages don't matter any more, alas, because they are all overshadowed by the unanthropomorphizable lawnmower that now owns it and drives it towards the singular goal of extracting as much profit from it as possible.

      2 replies →

Very interesting post, thanks for putting it together.

Rust is indeed quite fast. I thought NodeJS would do much better, tbh, although it's not bad. I'd be interested to learn what's holding it back, because I've seen many implementations where V8 can get C++-like performance (I mean, it's C++ after all). Perhaps there's a lot of overhead in creating/destroying temporary objects.

  • > V8 can get C++-like performance (I mean it's C++ after all)

    I don’t think that follows. Python is written in C, but that doesn’t mean it can get C-like performance. The sticking point is in how much work the runtime has to do for each chunk of code it has to execute.

    (Edit: sorry, that’s in reply to another child comment. I’m on my phone on a commute and tapped the wrong reply button.)

  • One reason is that I did not spend much time optimizing the Node and Rust versions, I just translated the Python logic as directly and quickly as I could. At least I did not ask an LLM to do it for me, which I hope counts. ;-)

    Edit: fixed a couple of typos.

  • V8 gets C++-like performance when it is getting code that JITs very well. This is typically highly-numeric code, even sometimes tuned specifically for the JIT. This tends to cause a great deal of confusion when people see the benchmarks for that highly numeric code and then don't understand why their more conventional code doesn't get those speeds. It's because those speeds only apply to code you're probably not writing.

    If you are writing that sort of code, then it does apply; the speed for that code is real. It's just that the performance is much more specific than people think it is. In general V8 tends to come in around the 10x-slower-than-C for general code, which means that in general it's a very fast scripting language, but in the landscape of programming languages as a whole that's middling single-thread performance and a generally bad multiprocessing story.

  • For the bubble sort implementation, it's due to the use of the destructuring assignment in the benchmark code. When swapping to a regular swap using a temporary variable, the benchmark runs more than 4 times faster on my machine. Still not at Rust level of performance, but a bit closer to it.

Is comment folding an option based on karma or something? I loved the most-voted post here on how Miguel helped the guy, but it is unrelated, and for the first time I realized there is no fold so I can jump to the people actually talking about the article...

  • You got more karma than me, so you're probably just looking past it. It's the [-] button on the right end of the comment header, just to the right of the "next" button.

I feel like Python should be much faster already. With all the big companies using Python and its huge popularity, I would have expected a lot of money, work and research to be put into making Python faster and better.

  • The Faster CPython project was from one of "the big companies" and did make significant progress with each version of Python, even if some of its more ambitious goals weren't met. The results of which you're seeing in the benchmarks in this blog post.

  • Why?

    There are other languages you can use to make stuff go fast. Python isn't for making stuff go fast. It's for rapid dev, and that advantage matters way more when you're already going to be slow waiting for network responses.

    • This has always confused me... is Python really that much better at rapid dev? I work on a Python project and every day I wish the people that started the project had chosen a different language that actually scaled well with the problem rather than Python, which they likely chose because it was for "rapid dev".

      10 replies →

I’m very glad Python is getting faster. But the correct answer to “Is Python Really That Slow?” is unambiguously YES. Unless you’re using some ML library like torch or numpy, which spends all its time in optimized C code, Python is still EXTREMELY slow. We are going to need a lot of these 10% improvements for Python to be comparable to Go, Java, or Node, each of which is about 30x faster on typical computing tasks.
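A quick way to see that kind of gap yourself is a tight pure-Python loop where interpreter overhead dominates, timing the whole run once rather than inside the loop (a sketch, not a rigorous benchmark):

```python
import time

def count_primes(n):
    # Deliberately naive trial division: the branchy, integer-heavy
    # kind of loop where CPython's per-bytecode overhead shows.
    primes = 0
    for candidate in range(2, n):
        for d in range(2, int(candidate ** 0.5) + 1):
            if candidate % d == 0:
                break
        else:
            primes += 1
    return primes

start = time.perf_counter()
result = count_primes(100_000)
elapsed = time.perf_counter() - start
print(result, f"{elapsed:.3f}s")
```

Porting just this function to a compiled language collapses the runtime; that ratio, on loops like this one, is roughly what the 30x figure refers to (the exact number depends on hardware and workload).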

I started using Python again recently after a 15-year break. The reason was I started working with LangChain, specifically LangGraph agents. The JavaScript/TypeScript versions are months behind, and in the AI world, with the progress that's been made recently, months might as well be years.

Kinda curious. Have you figured out why the code runs faster on a Mac?

  • It's two different computers with different CPUs, so different runtimes are expected; it has nothing to do with the OS.

    > Framework laptop running Ubuntu Linux 24.04 (Intel Core i5 CPU)

    > Mac laptop running macOS Sequoia (M2 CPU)

I like the Rust being there. It is a reality check to keep the numbers in perspective. (C, C++ or something like that would work too)

Very nice post - it's good to see benchmarks done for humans.

For fun, I tried this in Raku:

  (0, 1, *+* ... *)[40]    #0.10s user 0.03s system 63% cpu 0.214 total

lol

Seriously, Python is doing great stuff to squeeze out performance from a scripting language. Realistically, Raku has fewer native libraries (although there is Inline::Python) and the compiler still has a lot of work to get the same degree of optimisation (although one day it could compare).

EDIT: for those who have commented, yes you are correct … this is a “cheat” and does not seek to state that Raku is faster than Python - as I said Raku still has a lot of work to do to catch up.

  • I take it this is supposed to be the equivalent of fib(40), which ran on the author's system in Pyπ in 6.59 seconds and apparently on yours, with Raku, in 0.21?

    Do you have the same hardware as the author or should one of you run the other's variant to make this directly comparable?

    • No, this is very much not the same. The Raku version is like writing this in Python:

          def fibonacci():
              a, b = 0, 1
      
              while True:
                  yield a
                  a, b = b, a+b
      

      And taking the 40th element. It's not comparable at all to the benchmark, that's deliberately an extremely slow method of calculating fibonacci numbers for the purpose of the benchmark. For this version, it's so fast that the time is dominated by the time needed to start up and tear down the interpreter.

  • Well, sure; you're using dynamic programming, while the stress test Python Fibonacci code is deliberately using recursion without memoization — it makes function calls proportionate to the number computed. Most of the time you're seeing in the Raku code is the interpreter startup. Python doesn't have syntax strongly oriented towards that sort of trick (it's not as strong of a second-best APL as it is a second-best Lisp or Haskell), but:

      $ python -m timeit "x = (1, 0); [x[0] for _ in range(40) if (x := (x[0] + x[1], x[0]))][-1]"
      50000 loops, best of 5: 4 usec per loop
    

    (Or a "lazy iterator" approach:)

      $ python -m timeit --setup 'from itertools import islice, count' 'x = (1, 0); next(islice((x[0] for _ in count() if (x := (x[0] + x[1], x[0]))), 40, None))'
      50000 loops, best of 5: 5.26 usec per loop

It remains solidly among the slowest languages you could choose. Consider your use case and the trade-offs wisely.

Why isn’t Pypy the default recommended runtime?

The performance increase seems jaw dropping.

> And this is a bit disappointing. At least for this test, the JIT interpreter did not produce any significant performance gains, so much that I had to double and triple check that I used a correctly built interpreter with this feature enabled. I do not know much about the internals of the new JIT compiler, but I'm wondering if it cannot deal with this heavily recursive function.

FWIW, one thing that is worth calling out here is that the initial goal for the JIT right now in Python is getting it relatively stable and functional, and more or less getting the initial implementation out there. It's not surprising at all that it's not faster.

I say this because I think the teams working on free-threaded and JIT python maybe could have done a better job publicly setting expectations.
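For anyone wanting to double-check their build the way the author did: recent CPython exposes a small introspection namespace for this, `sys._jit` (underscore-prefixed, i.e. semi-private, so treat its availability and API as an assumption that may change):

```python
import sys

def jit_status() -> str:
    # sys._jit appears on CPython 3.14 builds; older interpreters
    # lack it entirely, hence the getattr guard.
    jit = getattr(sys, "_jit", None)
    if jit is None:
        return "no JIT introspection in this interpreter"
    if not jit.is_available():
        return "interpreter built without the JIT"
    return "JIT enabled" if jit.is_enabled() else "JIT available but disabled"

print(jit_status())
```

An "enabled but not faster" result, as in the post, at least confirms the benchmark really ran on a JIT build.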

  • I mean, Guido had a 2021 Faster CPython presentation where they claimed "5x in 4 years (1.5x per year)"[0]. Developers have significantly walked back those expectations since then.

    [0] Github slide deck https://github.com/faster-cpython/ideas/blob/main/FasterCPyt...

    • One important caveat to remember is that this is before a lot of the work on free-threaded python started in full force. A lot of cutting edge work had to be done to support this in the GC but this came with performance penalties. As a result, the trajectory of the Faster CPython effort changed quite a bit.

      Didn't help that Microsoft axed several folks on that team, too...

      2 replies →

Only tested against NodeJS and Rust

What about Lua and LuaJIT

Here's hoping they make 16 patch versions

  • I hope they speedrun to Python 6.28 because tau > pi

    (mini unrelated rant. I think pi should equal 6.28 and tau should equal 3.14, because pi looks like two taus)

    • > I think pi should equal 6.28 and tau should equal 3.14, because pi looks like two taus

      Ha. Undeniable proof that we had them backwards all along!

object oriented developer language, whereas the API is the mass-production of backend apps

Yeah honestly I don't really care about these benchmarks. Python isn't built for raw performance and that's totally fine! It's the number one choice for prototyping and can do so much, that's what actually matters. I think it's cool they're working on speed improvements though, means my prototype-to-production cycle gets a bit smoother lol.

I know this is not a high-quality comment, but this must be the ideal language to run on a Raspberry Pi. I'll see myself out; I also do Bar Mitzvahs.

Honestly, if the performance of the Python interpreter has a big impact on your application's performance, and that's something you care about, you're already doing things very wrong.

tl;dr: Two orders of magnitude slower than Rust, so 2-3 orders slower than native. Python on a 2 GHz processor runs as fast as C on a 2-20 MHz processor.