← Back to context

Comment by ecshafer

5 days ago

What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.

The Python C api leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.

  • It's a shame that Python 2->3 transition was so painful, because Python could use a few more clean breaks with the past.

    This would be a potential case for a new major version number.

  • Ooo this makes sense it's like if the Linux had don't break users space AND a whole bunch of other purely internal APIs you also can't refactor.

For what it’s worth Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility and literally used some people’s PhD research. It wasn’t a trivial affair

Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...

  • That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.

    The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.

    To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.

    • > Python never really had that same level of investment, at least from a performance standpoint.

      Or lack of incentive?

      Alot of big python projects that does machine learning and data processing offloads the heavy data processing from pure python code to libraries like numpy and pandas that take advantage of C api binding to do native execution.

    • Google, Dropbox, and Microsoft from what I can recall all tried to make Python fast so I don’t buy the “hasn’t seen a huge amount of investment”. For a long time Guido was opposed to any changes and that ossified the ecosystem.

      But the main problem was actually that pypy was never adopted as “the JIT” mechanism. That would have made a huge difference a long time ago and made sure they evolved in lock step.

      1 reply →

  • The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.

    A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.

    • Agree re: different types of JITs producing wildly different results but don't agree about language semantics - even a Java JIT has to give up speed due to certain seemingly minor language and JVM issues. So both matter - no matter how good of a compiler engineer you are, some semantics are just not optimizable. Indeed, the use of a "trace JITs" is a proof of that.

      1 reply →

  • I think that it's just that python people took the problem different, they made working with c and other languages better, and just made bindings for python and offloaded the performant code to these libraries. Ex: numpy

I can't really talk about Ruby. But PHP is much more static and surface of things you have to care about at runtime is like magnitude smaller and there already was opache as a starting point. And speaking of something like JIT in V8 is of the most sophisticated and complicated ever built. There hasn't been near enough man hours and funding to cpython to make it fair comparison

For better or for worse they have been very consistent throughout the years that they don't want want to degrade existing performance. It is why the GIL existed for so long

Are you forgetting about PyPy, which has existed for almost 2 decades at this point?

  • That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.

    • Definitely agree that it’s better to have JIT in the mainline Python, but it’s not like there weren’t options if you needed higher performance before.

      Including simply implementing the slow parts in C, such as the high performance machine learning ecosystem that exists in Python.