Python really needs to take the Typescript approach of "all valid Python4 is valid Python3". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.
Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).
To clarify, it is nuts that in an object method, there is a performance enhancement through caching a member value.
class SomeClass
def init(self)
self.x = 0
def SomeMethod(self)
q = self.x
## do stuff with q, because otherwise you're dereferencing self.x all the damn time
This is not just a performance concern, this describes completely different behaviour. You forgot that self.x is just Class.__getattr__(self, 'x') and that you can implement __getattr__ how you like. There is no object identity across the values returned by __getattr__.
> it is nuts that in an object method, there is a performance enhancement through caching a member value
i don't understand what you think is nuts about this. it's an interpreted language and the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want). so there's no way for the interpreter/compiler/runtime to know you're accessing a field of the class itself (let alone that that field isn't a computed property or something like that).
lots of hottakes that people have (like this one) are rooted in just a fundamental misunderstanding of the language and programming languages in general <shrugs>.
That was how the Mojo language started. And then soon after the hype they said that being a superset of Python was no longer the goal. Probably because being a superset of Python is not a guarantee for performance either.
Being a superset would mean all valid Python 3 is valid Python 4. A valuable property for sure, but not what OP suggested. In fact, it is the exact opposite.
Performance is one part of the discussion, but cleanliness is another. A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.
I’ll be happy if over night all Python code in the world can reap 10-100x performance benefits without changing much of a codebase, you can continue having soup of multiple languages.
I have made some experiments with P2W, my experimental Python (subset) to WASM compiler. Initial figures are encouraging (5x speedup, on specific programs).
> python code could be so much faster if it didn't have to assume everything could change at any time
Definitely, but then it wouldn't be Python. One of the core principles of Python's design is to be extremely dynamic, and that anything can change at any time.
There are many other, pretty good, strictly dynamically typed languages which work just as well if not better than Python, for many purposes.
I feel that this excuse is being trotted out too much. Most engineers never get to choose the programming language used for 90% of their professional projects.
And when Python is a mainstream language on top of which large, globally known websites, AI tools, core system utilities, etc are built, we should give up the purity angle and be practical.
Even the new performance push in Python land is a reflection of this. A long time ago some optimizations were refused in order to not complicate the default Python implementation.
Fine by me. I don't particularly like Python, but it's the defacto standard in my field so I have to use it (admittedly this is an improvement over a decade ago, when MATLAB was the defacto standard). I don't care about preserving the spirit of Python, I just care that the thing that bears the name Python meets my needs.
I share your view. Python's flexibility is central to Python.
Even type annotations, though useful, can get in the way for certain tasks.Betting on things like these to speed up things would be a mistake, since it would kind of force you to follow that style.
Anything that accelearates things should rely on run-time data, not on type annotations that won't change.
Isn't rpython doing that, allowing changes on startup and then it's basically statically typed? Does it still exist? Was it ever production ready? I only once read a paper about it decades ago.
RPython is great, but it changes semantics in all sorts of ways. No sets for example. WTF? The native Set type is one of the best features of Python. Tuples also get mangled in RPython.
I think sadly a lot of Python in the wild relies heavily, somewhere, on the crazy unoptimisable stuff. For example pytest monkey patches everything everywhere all the time.
You could make this clean break and call it Python 4 but frankly I fear it won't be Python anymore.
As a person who has spent a lot of time with pytest, I'm ready for testing framework that doesn't do any of that non-obvious stuff.
Generally use unittest as much as I can these days, so much less _wierd_ about how it does things. Like jeeze pytest, do you _really_ need to stress test every obscure language feature? Your job is to call tests.
Allowing metaprogramming at module import (or another defined phase) would cover most monkey patching use cases. From __future__ import python4 would allow developers to declare their code optimisable.
> Python really needs to take the Typescript approach of "all valid Python4 is valid Python3
Great idea, but I'm not convinced that they learned anything from the Python 2 to 3 transition, so I wouldn't hold my breath.
If you want a language system without contempt for backward compatibility, you're probably better off with Java/C++/JavaScript/etc. (though using JS libraries is like building on quicksand.) Bit of a shame since I want to like Python/Rust/Swift/other modern-ish languages, but it turns out that formal language specifications were actually a pretty good idea. API stability is another.
Oh, and while we're at it, fix the "empty array is instantiated at parse time so all your functions with a default empty array argument share the same object" bullshit.
It has nothing to do with whether the list is empty. It has nothing to do with lists at all. It's the behaviour of default arguments.
It happens at the time that the function object is created, which is during runtime.
You only notice because lists are mutable. You should already prefer not to mutate parameters, and it especially doesn't make sense to mutate a parameter that has a default value because the point of mutating parameters is that the change can be seen by the caller, but a caller that uses a default value can't see the default value.
The behaviour can be used intentionally. (I would argue that it's overused intentionally; people use it to "bind" loop variables to lambdas when they should be using `functools.partial`.)
If you're getting got by this, you're fundamentally expecting Python to work in a way that Pythonistas consider not to make sense.
Execution time, not parse time. It's a side effect of function declarations being statements that are executed, not the list/dict itself. It would happen with any object.
I went sort of this route in an experiment with Claude.. I really want Python for .NET but I said, damn the expense, prioritize .NET compatibility, remove anything that isn't supported feasably. It means 0 python libs, but all of NuGet is supported. The rules are all signatures need types, and if you declare a type, it is that type, no exceptions, just like in C# (if you squint when looking at var in a funny way). I wound up with reasonable results, just a huge trade of the entire Python ecosystem for .NET with an insanely Python esque syntax.
Still churning on it, will probably publish it and do a proper blog post once I've built something interesting with the language itself.
I'm been occasionally glancing at PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high level discussions were happening; the issues and PRs always jumped right to the gritty details. Is there anywhere a high-level introduction/example of how trace projection vs recording work and differ? Googling for the terms often returns CPython issue tracker as the first result, and repo's jit.md is relatively barebones and rarely updated :(
Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?
> I've never seen where the high level discussions were happening
Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.
> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?
This is a great question, the answer is not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing for n operations, we just need to expose refcounting of n operations and optimize only 1 op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).
If you have any more questions, I'm happy to answer them either in public or email.
I also did some reading and experiments, so quickly talking about things I've found out re: refcount elimination:
Previously given an expression `c = a + b`, the compiler generated a sequence of two LOADs (that increment the inputs' refcounts), then BINARY_OP that adds the inputs and decrements the refcounts afterwards (possibly deallocating the inputs).
But if the optimizer can prove that the inputs definitely will have existing references after the addition finishes (like when `a` and `b` are local variables, or if they are immortals like `a+5`), then the entire incref/decref pair could be ignored. So in the new version, the DECREFs part of the BINARY_OP was split into separate uops, which are then possibly transformed into POP_TOP_NOP by the optimizer.
And I'm assuming that although normally splitting an op this much would usually cost some performance (as the compiler can't optimize them as well anymore), in this case it's usually worth it as the optimization almost always succeeds, and even if it doesn't, the uops are still generated in several variants for various TOS cache (which is basically registers) states so they still often codegen into just 1-2 opcodes on x86.
One thing I don't entirely understand, but that's super specific from my experiment, not sure if it's a bug or special case: I looked at tier2 traces for `for i in lst: (-i) + (-i)`, where `i` is an object of custom int-like class with overloaded methods (to control which optimizations happen). When its __neg__ returns a number, then I see a nice sequence of
_POP_TOP_INT_r32, _r21, _r10.
But when __neg__ returns a new instance of the int-like class, then it emits
_SPILL_OR_RELOAD_r31, _POP_TOP_r10, _SPILL_OR_RELOAD_r01, _POP_TOP_r10, etc.
Is there some specific reason why the "basic" pop is not specialized for TOS cache? Is it because it's the same opcode as in tier1, and it's just not worth it as it's optimized into specialized uops most of the time; or is it that it can't be optimized the same way because of the decref possibly calling user code?
I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.
UPDATE: I misunderstood the question :-/ You can ignore this.
I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):
When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.
When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count” is always accurate is therefore a big deal. It is often the source of memory leaks, but I wouldn’t attribute it to a speed up (only if it replaces GC, then yes).
Oh man, Python 2 > 3 was such a massive shift. Took almost half a decade if not more and yet it mainly changing superficial syntax stuff. They should have allowed ABIs to break and get these internal things done. Probably came up with a new, tighter API for integrating with other lower level languages so going forward Python internals can be changed more freely without breaking everything.
The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Although your general sentiment is something I agree with(if it's going to be painful do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.
Your last point about being able to change internals more freely is also great in theory but very difficult(if not impossible) to achieve in practice.
I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a spec of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.
> I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Slight tangent: if Claude can decimate IBM stock price by migrating off Cobol for cheap, surely we can do Python 2 to 3 now, too?
About the internals: we sort of missed an opportunity there, but back then there also didn't quite know what they were doing (or at least we have better ideas of what's useful today). And making the step from 2 to 3 even bigger might have been a bad idea?
I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.
Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
Living through it... Python 3 made a lot of changes for the better but 3.0 in particular included a bunch of unforced errors that made it too hard for people to upgrade in one go.
It wasn't until much later (I would say 3.4 or 3.5?) that we had good tooling to allow for migrating from Python 2 to Python 3 gradually, which is what most tools needed to do.
The final thing that made Python upgrading easy was making a bunch of changes (along with stuff like six) so that you could write code that would run identically in Python 2 and Python 3. That lets you do refactors over time, little cleanups, and not have the huge "move to Python 3" commit.
> Python is by most measures the most popular language and became so AFTER that switch
The switch had nothing to do with Python's rise in popularity though, it was because of NumPy and later PyTorch being adopted by data scientist and later machine learning tasks that themselves became very popular. Python's popularity rose alongside those.
> There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
The "complaints" are about unnecessary and pointless breakage, that was very difficult for many codebases to upgrade for years. That by now most of these codebases have been either abandoned, upgraded or decided to stick with Python2 until the end of time doesn't mean these pains didn't happen nor that the language's developers inflicting them to their users were a good idea because some largely unrelated external factors made the language popular several years later.
It took a long time for python 3 to add the necessary backwards compatibility features to allow people to switch over. Once they did it was fine, but it was a massive fuck up until then. The migration took far longer than it should have done
Its widely regarded as a disaster for good reason, that forced some corrections in python to fix it. Just because its fine now, does not mean it was always fine
The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.
Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.
I'm curious is the JIT developers could mention any Python features that prevent promising JIT features. An earlier Ken Jin blog [1], mentions how __del__ complicates reference counting optimization.
There is a story that Python is harder to optimize than, say, Typescript, with Python flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features was out there, programmers could know to avoid those features with the promise of activating the JIT when it can prove the feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just a gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.
It's interesting you mention __del__ because Javascript not only doesn't have destructors but for security reasons (that are above my pay grade) but the spec _explicitly prohibits_ implementations from allowing visibility into garbage collection state, meaning that code cannot have any visibility into deallocations.
I think __del__ is tricky though. In theory __del__ is not meant to be reliable. In practice CPython reliably calls it cuz it reference counts. So people know about it and use it (though I've only really seen it used for best effort cleanup checks)
In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it. And that would also generate more pressure to implement code that is performant in "any" system.
> In practice CPython reliably calls it cuz it reference counts ... In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it
A big part of the problem is that much of the power of the Python ecosystem comes specifically from extensions/bindings written in languages with manual (C) or RAII/ref-counted (C++, Rust) memory management, and having predictable Python-level cleanup behavior can be pretty necessary to making cleanup behavior in bound C/C++/Rust objects work. Breaking this behavior or causing too much of a performance hit is basically a non-starter for a lot of Python users, even if doing so would improve the performance of "pure" Python programs.
> meaning that code cannot have any visibility into deallocations.
This is more pedantry than a serious question. JavaScript has WeakReference, sure it'd be cumbersome and inefficient because you'd need to manually make and poll each thing you wanted to observe, but could it not be said that it does provide a view on deallocations?
> However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.
> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.
I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.
The guy said he hopes the free-threaded build'll be the only one in "3.16 or 3.17", I wonder if that should apply to the JIT too or how the JIT and interpreter interact.
I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.
Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
> Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
Way more than 1% of the community, particularly of the community actively developing Python, wants free-threaded. The problem here is that the Python community consists of several different groups:
1. Basically pure Python code with no threading
2. Basically pure Python with appropriate thread safety
3. Basically pure Python code with already broken threaded code, just getting lucky for now
4. Mixed Python and C/C++/Rust code, with appropriate threading behavior in the C or C++ components
5. Mixed Python and C or C++ code, with C and C++ components depending on GIL behavior
Group 1 gets a slightly reduced performance. Groups 2 and 4 get a major win with free-threaded Python, being able to use threading through their interfaces to C/C++/Rust components. Group 3 is already writing buggy code and will probably see worse consequences from their existing bugs. Group 5 will have to either avoid threading in their Python code or rewrite their C/C++ components.
Right now, a big portion of the Python language developer base consists of Groups 2 and 4. Group 5 is basically perceived as holding Python-the-language and Python-the-implementations back.
Pure Python code always needed mutexes for thread safety with or without ol' GIL. I thought the difficulty with removing the GIL instead had to do with C extensions that rely on it.
I don't want to go too heavy on the negatives, but what's nuts is Python going for trust-the-programmer style multithreading. The risk is that extension modules could cause a lot of crashes.
I also wonder how many people actually need free-threading. And I wonder how useful it will be, when you can already use the ABI to call multi-threaded code.
I think the GIL provides python with a great guarantee, I would probably prefer single-thread performance improvements over multithreading in python to be honest.
Anyway if I need performance, Python would probably not be my first choice
As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will probably (very likely) run fine but for other things most bets are off. I believe PyPy also only supports up to 3.11?
A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.
The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.
Why shouldn't the reference implementation get JIT? Just because some other implementations already have it is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer mandated and profitable.
PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
Great to see this going, Python also deserves a JIT, and given that only few bother with PyPy or GraalPy, shipping into the CPYthon is the only way to have less "rewrite into XYZ".
Thanks for all the amazing work! I have Noob question. Wouldn't this get the funding back? Or would that not be preferable way to continue(as opposed to just volunteer driven)?
Like this is a big deal to get a project to a state where volunteers are spun up and actively breaking tasks and getting work done, no? It's a python JIT something I know next to nothing about — as do most application developers — which tells one how difficult this must have been.
The funding was Microsoft employing most of the team. They were laid off (or at least, moved onto different projects), apparently because they weren't working on AI.
With Python being the main language for AI, isn't like more important to be more performant?
I kinda don't get Microsoft reasoning, maybe they're just tight in money
What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
The Python C api leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.
For what it’s worth Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility and literally used some people’s PhD research. It wasn’t a trivial affair
Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.
The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.
To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.
The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.
A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.
I think that it's just that python people took the problem different, they made working with c and other languages better, and just made bindings for python and offloaded the performant code to these libraries. Ex: numpy
I can't really talk about Ruby. But PHP is much more static and surface of things you have to care about at runtime is like magnitude smaller and there already was opache as a starting point.
And speaking of something like JIT in V8 is of the most sophisticated and complicated ever built. There hasn't been near enough man hours and funding to cpython to make it fair comparison
For better or for worse they have been very consistent throughout the years that they don't want want to degrade existing performance. It is why the GIL existed for so long
That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.
Yes, the graphs are incomprehensible because those are not defined in the article. They turn out to be different physical machines with different architectures: https://doesjitgobrrr.com/about
So the biggest gains so far are on Windows 11 Pro of (x86_64) ~20%? Is that because Windows was bad as a baseline (promethius)? It doesn't seem like the x86_64/Linux has improved as dramatically ~5% (ripley). I'm just surprised OS has that much of an effect that can be attributed to JIT vs other OS issues.
The immediate question has been answered, but what about the names? The latter three are obvious references to the Alien universe, but what relationship does blueberry have to them?
I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
We got daguerrotypes, and then photographic film, and then digital cameras, along with image editing software, and now AI image generation systems; yet there are still people who go out and apply oil paints to a canvas with natural hair brushes. I'm not willing to lose that.
Pretty much my thoughts the other day... now that Codex does the writing, maybe I can finally switch to Go for the web backend stuff without being annoyed by some of its archaisms and gain significant execution performance, while still having a relatively easy to read language.
You ask a machine to write your code and you still care about being easy to read?
In my experience the people who care the most about code readability tend to be the people most opinionated on having the right abstractions, which are historically not available in Go.
If the speed of a car increases by 100% does that mean that it arrives at its destination before it left? No, it just means it took 50% of the time it would have otherwise.
But I do agree that it would be a bit clearer to talk in terms of time taken rather than speedup % i.e. instead of "20% slowdown to over 100% speedup" it's clearer to say "takes between 50% and 125% of the original time". (Especially since people very often say things like "3 times faster", which technically means 4 times as fast, when they should say "3 times as fast"; "takes 1/3 of the time" is unambiguous.)
They are all JIT on different architectures, measured relative to CPython. https://doesjitgobrrr.com/about: blueberry is aarch64 Raspberry Pi, ripley is x86_64 Intel, jones is aarch64 M3 Pro, prometheus is x86_64 AMD.
After installing all released versions of python locally, I can confirm that the `python3.15` command is not only very fast, but is guaranteed not to diverge!
$ time python3.15 -c "while True: print('hello world')"
bash: python3.15: command not found
real 0m0.005s
user 0m0.000s
sys 0m0.002s
I am trying to push back. I don't care if other people think the tools make them faster, I did not sign up to be a guinea pig for my employer or their AI-corp partner.
Python really needs to take the Typescript approach of "all valid Python4 is valid Python3". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.
Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).
To clarify, it is nuts that in an object method, there is a performance enhancement through caching a member value.
This is not just a performance concern, this describes completely different behaviour. You forgot that self.x is just Class.__getattr__(self, 'x') and that you can implement __getattr__ how you like. There is no object identity across the values returned by __getattr__.
4 replies →
Java also has a performance cost to accessing class fields, as exampled by this (now-replaced) code in the JDK itself - https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/...
13 replies →
> it is nuts that in an object method, there is a performance enhancement through caching a member value
i don't understand what you think is nuts about this. it's an interpreted language and the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want). so there's no way for the interpreter/compiler/runtime to know you're accessing a field of the class itself (let alone that that field isn't a computed property or something like that).
lots of hottakes that people have (like this one) are rooted in just a fundamental misunderstanding of the language and programming languages in general <shrugs>.
34 replies →
You mean even if x is not a property?
That was how the Mojo language started. And then soon after the hype they said that being a superset of Python was no longer the goal. Probably because being a superset of Python is not a guarantee for performance either.
Being a superset would mean all valid Python 3 is valid Python 4. A valuable property for sure, but not what OP suggested. In fact, it is the exact opposite.
But that's just not what python is for. Move your performance-critical logic into a native module.
Performance is one part of the discussion, but cleanliness is another. A Python4 that actually used typing in the interpreter, had value types, had a comptime phase to allow most metaprogramming to work (like monkey patching for tests) would be great! It would be faster, cleaner, easier to reason about, and still retain the great syntax and flexibility of the language.
4 replies →
I’ll be happy if over night all Python code in the world can reap 10-100x performance benefits without changing much of a codebase, you can continue having soup of multiple languages.
4 replies →
I have made some experiments with P2W, my experimental Python (subset) to WASM compiler. Initial figures are encouraging (5x speedup, on specific programs).
https://github.com/abilian/p2w
NB: some preliminary results:
> python code could be so much faster if it didn't have to assume everything could change at any time
Definitely, but then it wouldn't be Python. One of the core principles of Python's design is to be extremely dynamic, and that anything can change at any time.
There are many other, pretty good, strictly dynamically typed languages which work just as well if not better than Python, for many purposes.
I feel that this excuse is being trotted out too much. Most engineers never get to choose the programming language used for 90% of their professional projects.
And when Python is a mainstream language on top of which large, globally known websites, AI tools, core system utilities, etc are built, we should give up the purity angle and be practical.
Even the new performance push in Python land is a reflection of this. A long time ago some optimizations were refused in order to not complicate the default Python implementation.
7 replies →
> Python really needs to take the Typescript approach of "all valid Python4 is valid Python3"
It is called type hints, and is already there. TS typing doesn't bring any perf benefits over plain JS.
You really need dedicated types for `int64` and something like `final`. Consider:
there are multiple issues with Python that prevent optimizations:
* a user can define subtype `class my_int(int)`, so you cannot optimize the layout of `class Foo`
* the builtin `int` and `float` are big-int like numbers, so operations on them are branchy and allocating.
and the fact that Foo is mutable and that `id(foo.a)` has to produce something complicates things further.
4 replies →
>> Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time.
Then it wouldn't be Python any more.
Fine by me. I don't particularly like Python, but it's the defacto standard in my field so I have to use it (admittedly this is an improvement over a decade ago, when MATLAB was the defacto standard). I don't care about preserving the spirit of Python, I just care that the thing that bears the name Python meets my needs.
1 reply →
I share your view. Python's flexibility is central to Python.
Even type annotations, though useful, can get in the way for certain tasks.Betting on things like these to speed up things would be a mistake, since it would kind of force you to follow that style.
Anything that accelearates things should rely on run-time data, not on type annotations that won't change.
Isn't rpython doing that, allowing changes on startup and then it's basically statically typed? Does it still exist? Was it ever production ready? I only once read a paper about it decades ago.
It exists in the sense that PyPy exists.
As far as I can tell, it only ever existed to make PyPy possible, and was only defined/specified in terms of PyPy's needs.
RPython is great, but it changes semantics in all sorts of ways. No sets for example. WTF? The native Set type is one of the best features of Python. Tuples also get mangled in RPython.
I think sadly a lot of Python in the wild relies heavily, somewhere, on the crazy unoptimisable stuff. For example pytest monkey patches everything everywhere all the time.
You could make this clean break and call it Python 4 but frankly I fear it won't be Python anymore.
As a person who has spent a lot of time with pytest, I'm ready for testing framework that doesn't do any of that non-obvious stuff. Generally use unittest as much as I can these days, so much less _wierd_ about how it does things. Like jeeze pytest, do you _really_ need to stress test every obscure language feature? Your job is to call tests.
1 reply →
If you do that you then have a less productive language for many use cases IMHO.
All the dynamism from Python should stay where it is.
Just JIT and remember a type maybe, but do not force a type from a type hint or such things.
As a minimum, I would say not relying on that is the correct thing. You could exploit it, but not force it to change the semantics.
1 reply →
Allowing metaprogramming at module import (or another defined phase) would cover most monkey patching use cases. From __future__ import python4 would allow developers to declare their code optimisable.
Perl 6 showed what happens when you do something like that.
SPy [1] is a new attempt at something like this.
TL;DR: SPy is a variant of Python specifically designed to be statically compilable while retaining a lot of the "useful" dynamic parts of Python.
The effort is led by Antonio Cuni, Principal Software Engineer at Anaconda. Still very early days but it seems promising to me.
[1] https://github.com/spylang/spy
Thank you! Spy looks brilliant, especially the comptime-like freezing after import.
> Python really needs to take the Typescript approach of "all valid Python4 is valid Python3
Great idea, but I'm not convinced that they learned anything from the Python 2 to 3 transition, so I wouldn't hold my breath.
If you want a language system without contempt for backward compatibility, you're probably better off with Java/C++/JavaScript/etc. (though using JS libraries is like building on quicksand.) Bit of a shame since I want to like Python/Rust/Swift/other modern-ish languages, but it turns out that formal language specifications were actually a pretty good idea. API stability is another.
is that you, python core dev team? ;-)
1 reply →
There will be not Python 4, and 3.X policy requires forward compat, so we are already there.
Oh, and while we're at it, fix the "empty array is instantiated at parse time so all your functions with a default empty array argument share the same object" bullshit.
We don't call them "arrays".
It has nothing to do with whether the list is empty. It has nothing to do with lists at all. It's the behaviour of default arguments.
It happens at the time that the function object is created, which is during runtime.
You only notice because lists are mutable. You should already prefer not to mutate parameters, and it especially doesn't make sense to mutate a parameter that has a default value because the point of mutating parameters is that the change can be seen by the caller, but a caller that uses a default value can't see the default value.
The behaviour can be used intentionally. (I would argue that it's overused intentionally; people use it to "bind" loop variables to lambdas when they should be using `functools.partial`.)
If you're getting got by this, you're fundamentally expecting Python to work in a way that Pythonistas consider not to make sense.
2 replies →
Execution time, not parse time. It's a side effect of function declarations being statements that are executed, not the list/dict itself. It would happen with any object.
15 replies →
If you change this you break a common optimization:
https://github.com/python/cpython/blob/3.14/Lib/json/encoder...
Default value is evaluated once, and accessing parameter is much cheaper than global
there is PEP 671 for that, which introduces extra syntax for the behavior you want. people rely on the current behavior so you can't really change it
I went sort of this route in an experiment with Claude.. I really want Python for .NET but I said, damn the expense, prioritize .NET compatibility, remove anything that isn't supported feasably. It means 0 python libs, but all of NuGet is supported. The rules are all signatures need types, and if you declare a type, it is that type, no exceptions, just like in C# (if you squint when looking at var in a funny way). I wound up with reasonable results, just a huge trade of the entire Python ecosystem for .NET with an insanely Python esque syntax.
Still churning on it, will probably publish it and do a proper blog post once I've built something interesting with the language itself.
IronPython -> TitaniumPython?
I'm been occasionally glancing at PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high level discussions were happening; the issues and PRs always jumped right to the gritty details. Is there anywhere a high-level introduction/example of how trace projection vs recording work and differ? Googling for the terms often returns CPython issue tracker as the first result, and repo's jit.md is relatively barebones and rarely updated :(
Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?
> I've never seen where the high level discussions were happening
Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.
I wrote a somewhat high-level overview here in a previous blog post https://fidget-spinner.github.io/posts/faster-jit-plan.html#...
> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?
This is a great question, the answer is not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing for n operations, we just need to expose refcounting of n operations and optimize only 1 op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).
If you have any more questions, I'm happy to answer them either in public or email.
I saw your documentation PR, thank you!
I also did some reading and experiments, so quickly talking about things I've found out re: refcount elimination:
Previously given an expression `c = a + b`, the compiler generated a sequence of two LOADs (that increment the inputs' refcounts), then BINARY_OP that adds the inputs and decrements the refcounts afterwards (possibly deallocating the inputs).
But if the optimizer can prove that the inputs definitely will have existing references after the addition finishes (like when `a` and `b` are local variables, or if they are immortals like `a+5`), then the entire incref/decref pair could be ignored. So in the new version, the DECREFs part of the BINARY_OP was split into separate uops, which are then possibly transformed into POP_TOP_NOP by the optimizer.
And I'm assuming that although normally splitting an op this much would usually cost some performance (as the compiler can't optimize them as well anymore), in this case it's usually worth it as the optimization almost always succeeds, and even if it doesn't, the uops are still generated in several variants for various TOS cache (which is basically registers) states so they still often codegen into just 1-2 opcodes on x86.
One thing I don't entirely understand, but that's super specific from my experiment, not sure if it's a bug or special case: I looked at tier2 traces for `for i in lst: (-i) + (-i)`, where `i` is an object of custom int-like class with overloaded methods (to control which optimizations happen). When its __neg__ returns a number, then I see a nice sequence of
_POP_TOP_INT_r32, _r21, _r10.
But when __neg__ returns a new instance of the int-like class, then it emits
_SPILL_OR_RELOAD_r31, _POP_TOP_r10, _SPILL_OR_RELOAD_r01, _POP_TOP_r10, etc.
Is there some specific reason why the "basic" pop is not specialized for TOS cache? Is it because it's the same opcode as in tier1, and it's just not worth it as it's optimized into specialized uops most of the time; or is it that it can't be optimized the same way because of the decref possibly calling user code?
Update: I put up a PR to document the trace recording interpreter https://github.com/python/cpython/pull/146110
You’ll probably want to look to the PEPs. Havent dug into this topic myself but looks related https://peps.python.org/pep-0744/
I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.
discussions might be happening on the Python forums, which are pretty active.
https://discuss.python.org/t/pep-744-jit-compilation/50756/8... here's one thing
I do think you can also just outright ask questions about it on the forums and you'll get some answers.
At the end of the day there's only so many people working on this though.
have you read the dev mailing list? There the developers of python discuss lots.
There isn’t a dev mailing list any more, is there? Do you mean the Discord forum?
UPDATE: I misunderstood the question :-/ You can ignore this.
I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):
When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.
When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count” is always accurate is therefore a big deal. It is often the source of memory leaks, but I wouldn’t attribute it to a speed up (only if it replaces GC, then yes).
what at all does this comment have to do with what it's replying to?
1 reply →
Oh man, Python 2 > 3 was such a massive shift. Took almost half a decade if not more and yet it mainly changing superficial syntax stuff. They should have allowed ABIs to break and get these internal things done. Probably came up with a new, tighter API for integrating with other lower level languages so going forward Python internals can be changed more freely without breaking everything.
The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Although your general sentiment is something I agree with(if it's going to be painful do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.
Your last point about being able to change internals more freely is also great in theory but very difficult(if not impossible) to achieve in practice.
I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a spec of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.
> I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.
Slight tangent: if Claude can decimate IBM stock price by migrating off Cobol for cheap, surely we can do Python 2 to 3 now, too?
About the internals: we sort of missed an opportunity there, but back then there also didn't quite know what they were doing (or at least we have better ideas of what's useful today). And making the step from 2 to 3 even bigger might have been a bad idea?
13 replies →
I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.
Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
Living through it... Python 3 made a lot of changes for the better but 3.0 in particular included a bunch of unforced errors that made it too hard for people to upgrade in one go.
It wasn't until much later (I would say 3.4 or 3.5?) that we had good tooling to allow for migrating from Python 2 to Python 3 gradually, which is what most tools needed to do.
The final thing that made Python upgrading easy was making a bunch of changes (along with stuff like six) so that you could write code that would run identically in Python 2 and Python 3. That lets you do refactors over time, little cleanups, and not have the huge "move to Python 3" commit.
> Python is by most measures the most popular language and became so AFTER that switch
The switch had nothing to do with Python's rise in popularity though, it was because of NumPy and later PyTorch being adopted by data scientist and later machine learning tasks that themselves became very popular. Python's popularity rose alongside those.
> There is no reason for anyone to be complaining about it being too hard to upgrade in 2026
The "complaints" are about unnecessary and pointless breakage, that was very difficult for many codebases to upgrade for years. That by now most of these codebases have been either abandoned, upgraded or decided to stick with Python2 until the end of time doesn't mean these pains didn't happen nor that the language's developers inflicting them to their users were a good idea because some largely unrelated external factors made the language popular several years later.
1 reply →
It took a long time for python 3 to add the necessary backwards compatibility features to allow people to switch over. Once they did it was fine, but it was a massive fuck up until then. The migration took far longer than it should have done
Its widely regarded as a disaster for good reason, that forced some corrections in python to fix it. Just because its fine now, does not mean it was always fine
1 reply →
Those are unrelated.
The biggest (and worst planned) change was module names. Your imports didn't work, forcing hacks like
Or worse, people used try/except in their imports.
still GIL
Opt-in starting from 3.15, or am I mistaken?
Anyway you can already try freethreaded builds that have the GIL disabled, but my experience is that most of your dependencies won't work.
yes. it was not a massive shift. it was barely worth the effort.
The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.
Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.
13 replies →
this must be right, i'm getting downvoted
2 replies →
I'm curious is the JIT developers could mention any Python features that prevent promising JIT features. An earlier Ken Jin blog [1], mentions how __del__ complicates reference counting optimization.
There is a story that Python is harder to optimize than, say, Typescript, with Python flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features was out there, programmers could know to avoid those features with the promise of activating the JIT when it can prove the feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just a gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.
[1] https://fidget-spinner.github.io/posts/faster-jit-plan.html
It's interesting you mention __del__ because Javascript not only doesn't have destructors but for security reasons (that are above my pay grade) but the spec _explicitly prohibits_ implementations from allowing visibility into garbage collection state, meaning that code cannot have any visibility into deallocations.
I think __del__ is tricky though. In theory __del__ is not meant to be reliable. In practice CPython reliably calls it cuz it reference counts. So people know about it and use it (though I've only really seen it used for best effort cleanup checks)
In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it. And that would also generate more pressure to implement code that is performant in "any" system.
> In practice CPython reliably calls it cuz it reference counts ... In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it
A big part of the problem is that much of the power of the Python ecosystem comes specifically from extensions/bindings written in languages with manual (C) or RAII/ref-counted (C++, Rust) memory management, and having predictable Python-level cleanup behavior can be pretty necessary to making cleanup behavior in bound C/C++/Rust objects work. Breaking this behavior or causing too much of a performance hit is basically a non-starter for a lot of Python users, even if doing so would improve the performance of "pure" Python programs.
3 replies →
> code cannot have any visibility into deallocations
Doesn't FinalizationRegistry let you do exactly that?
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
5 replies →
> meaning that code cannot have any visibility into deallocations.
This is more pedantry than a serious question. JavaScript has WeakReference, sure it'd be cumbersome and inefficient because you'd need to manually make and poll each thing you wanted to observe, but could it not be said that it does provide a view on deallocations?
1 reply →
The biggest thing is BigInt by default. It makes every integer operation require an overflow check.
JS (when using ints, which v8 does) is the same in this respect.
Huh, I could imagine that as a set of Ruff rules:
> Using str.frobnicate prevents TurboJit on line 63
> However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.
> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.
I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.
> We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.
I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...
The guy said he hopes the free-threaded build'll be the only one in "3.16 or 3.17", I wonder if that should apply to the JIT too or how the JIT and interpreter interact.
I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.
Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
> Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
Way more than 1% of the community, particularly of the community actively developing Python, wants free-threaded. The problem here is that the Python community consists of several different groups:
1. Basically pure Python code with no threading
2. Basically pure Python with appropriate thread safety
3. Basically pure Python code with already broken threaded code, just getting lucky for now
4. Mixed Python and C/C++/Rust code, with appropriate threading behavior in the C or C++ components
5. Mixed Python and C or C++ code, with C and C++ components depending on GIL behavior
Group 1 gets a slightly reduced performance. Groups 2 and 4 get a major win with free-threaded Python, being able to use threading through their interfaces to C/C++/Rust components. Group 3 is already writing buggy code and will probably see worse consequences from their existing bugs. Group 5 will have to either avoid threading in their Python code or rewrite their C/C++ components.
Right now, a big portion of the Python language developer base consists of Groups 2 and 4. Group 5 is basically perceived as holding Python-the-language and Python-the-implementations back.
3 replies →
Maybe they could have two versions of the interpreter, one that’s thread-safe and one that’s optimised for single-threading?
Microsoft used to do this for their C runtime library.
2 replies →
Pure Python code always needed mutexes for thread safety with or without ol' GIL. I thought the difficulty with removing the GIL instead had to do with C extensions that rely on it.
2 replies →
I don't want to go too heavy on the negatives, but what's nuts is Python going for trust-the-programmer style multithreading. The risk is that extension modules could cause a lot of crashes.
1 reply →
I also wonder how many people actually need free-threading. And I wonder how useful it will be, when you can already use the ABI to call multi-threaded code.
I think the GIL provides python with a great guarantee, I would probably prefer single-thread performance improvements over multithreading in python to be honest.
Anyway if I need performance, Python would probably not be my first choice
Doesn't PyPy already have a jit compiler? Why aren't we using that?
As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will probably (very likely) run fine but for other things most bets are off. I believe PyPy also only supports up to 3.11?
PyPy isn't CPython.
A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.
The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.
Why shouldn't the reference implementation get JIT? Just because some other implementations already have it is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer mandated and profitable.
PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
> It's too bad PyPy couldn't take the federal funding the PSF threw away.
The PSF is primarily a political advocacy organisation, so it wouldn't make sense for them to use the money for Python.
Because PyPy seems to be defunct. It hasn't updated for quite a while.
See https://github.com/numpy/numpy/issues/30416 for example. It's not being updated for compatibility with new versions of Python.
PyPy's devs disagree: https://news.ycombinator.com/item?id=47293415
[flagged]
9 replies →
Great to see this going, Python also deserves a JIT, and given that only few bother with PyPy or GraalPy, shipping into the CPYthon is the only way to have less "rewrite into XYZ".
Kudos to those involved into making it happen.
[dead]
[dead]
Thanks for all the amazing work! I have Noob question. Wouldn't this get the funding back? Or would that not be preferable way to continue(as opposed to just volunteer driven)?
Like this is a big deal to get a project to a state where volunteers are spun up and actively breaking tasks and getting work done, no? It's a python JIT something I know next to nothing about — as do most application developers — which tells one how difficult this must have been.
> Wouldn't this get the funding back?
The funding was Microsoft employing most of the team. They were laid off (or at least, moved onto different projects), apparently because they weren't working on AI.
With Python being the main language for AI, isn't like more important to be more performant? I kinda don't get Microsoft reasoning, maybe they're just tight in money
2 replies →
It looks like ARM picked up plenty of those folk and pays them to continue this work.
What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
The Python C api leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.
It's a shame that Python 2->3 transition was so painful, because Python could use a few more clean breaks with the past.
This would be a potential case for a new major version number.
11 replies →
Ooo this makes sense it's like if the Linux had don't break users space AND a whole bunch of other purely internal APIs you also can't refactor.
For what it’s worth Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility and literally used some people’s PhD research. It wasn’t a trivial affair
Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.
The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.
To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.
3 replies →
The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.
A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.
2 replies →
I think that it's just that python people took the problem different, they made working with c and other languages better, and just made bindings for python and offloaded the performant code to these libraries. Ex: numpy
I can't really talk about Ruby. But PHP is much more static and surface of things you have to care about at runtime is like magnitude smaller and there already was opache as a starting point. And speaking of something like JIT in V8 is of the most sophisticated and complicated ever built. There hasn't been near enough man hours and funding to cpython to make it fair comparison
For better or for worse they have been very consistent throughout the years that they don't want want to degrade existing performance. It is why the GIL existed for so long
I thought php hasn't shipped jit yet (as in its behind a disabled by default config)
PHP 8 shipped with JIT on by default unless I'm mistaken.
1 reply →
PHP and JS had huge tech companies pouring resources into making them fast.
Are you forgetting about PyPy, which has existed for almost 2 decades at this point?
That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.
1 reply →
Money.
(what are blueberry, ripley, jones and prometheus?)
Yes, the graphs are incomprehensible because those are not defined in the article. They turn out to be different physical machines with different architectures: https://doesjitgobrrr.com/about
The names of the benchmark runners. https://doesjitgobrrr.com/about
So the biggest gains so far are on Windows 11 Pro of (x86_64) ~20%? Is that because Windows was bad as a baseline (promethius)? It doesn't seem like the x86_64/Linux has improved as dramatically ~5% (ripley). I'm just surprised OS has that much of an effect that can be attributed to JIT vs other OS issues.
1 reply →
The immediate question has been answered, but what about the names? The latter three are obvious references to the Alien universe, but what relationship does blueberry have to them?
I assume Blueberry is a nod to the machine being a Raspberry Pi.
I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
> now that machines write code instead of humans
That is not remotely the case for anyone who produces quality work.
Look again.
If you care about quality you absolutely can guide a machine to produce that for you without writing a single line of code yourself.
And I expect the amount of guidance needed will continue to drop.
We got daguerrotypes, and then photographic film, and then digital cameras, along with image editing software, and now AI image generation systems; yet there are still people who go out and apply oil paints to a canvas with natural hair brushes. I'm not willing to lose that.
AI, write me that sqlalchemy clone in <lang>
Pretty much my thoughts the other day... now that Codex does the writing, maybe I can finally switch to Go for the web backend stuff without being annoyed by some of its archaisms and gain significant execution performance, while still having a relatively easy to read language.
You ask a machine to write your code and you still care about being easy to read?
In my experience the people who care the most about code readability tend to be the people most opinionated on having the right abstractions, which are historically not available in Go.
4 replies →
I have shifted as much as I can python to go when I don’t code. It’s just faster and the compiler catches more errors, win win,
1 reply →
Over 100% speedup sound like "the code compiled before you asked the compiler to start working".
`from future import time_travel`
If the speed of a car increases by 100% does that mean that it arrives at its destination before it left? No, it just means it took 50% of the time it would have otherwise.
But I do agree that it would be a bit clearer to talk in terms of time taken rather than speedup % i.e. instead of "20% slowdown to over 100% speedup" it's clearer to say "takes between 50% and 125% of the original time". (Especially since people very often say things like "3 times faster", which technically means 4 times as fast, when they should say "3 times as fast"; "takes 1/3 of the time" is unambiguous.)
Sorry but the graphs are completely unreadable. There are four code names for each of the lines. Which is jit and which is cpython?
They are all JIT on different architectures, measured relative to CPython. https://doesjitgobrrr.com/about: blueberry is aarch64 Raspberry Pi, ripley is x86_64 Intel, jones is aarch64 M3 Pro, prometheus is x86_64 AMD.
Thanks
[dead]
[dead]
[flagged]
Jumping 12 major versions to one that doesn't exist yet must yield quite the performance boost.
After installing all released versions of python locally, I can confirm that the `python3.15` command is not only very fast, but is guaranteed not to diverge!
[dead]
[flagged]
[flagged]
[flagged]
Reference counting is not a strict requirement for python. Certainly not accurate counting.
[flagged]
Wait is this real? Does it mean this person read it or the bot read it, I don't think this is moltbook if the latter
1 reply →
[flagged]
[flagged]
I am trying to push back. I don't care if other people think the tools make them faster, I did not sign up to be a guinea pig for my employer or their AI-corp partner.