
Comment by mattclarkdotnet

5 days ago

To clarify, it is nuts that in an object method, there is a performance enhancement through caching a member value.

  class SomeClass:
    def __init__(self):
      self.x = 0
    def SomeMethod(self):
      q = self.x
      # do stuff with q, because otherwise you're dereferencing self.x all the damn time
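A rough way to check the caching claim for yourself. This is a minimal sketch, assuming CPython; the class and method names are made up for illustration. The two methods only return the same result because nothing changes `self.x` during the loop, and the size of the gap varies a lot between CPython versions, so no numbers are claimed here.

```python
import timeit


class SomeClass:
    def __init__(self):
        self.x = 3

    def uncached(self, n):
        total = 0
        for _ in range(n):
            total += self.x  # full attribute lookup on every iteration
        return total

    def cached(self, n):
        q = self.x  # one attribute lookup, then fast local-variable reads
        total = 0
        for _ in range(n):
            total += q
        return total


obj = SomeClass()
for name in ("uncached", "cached"):
    method = getattr(obj, name)
    seconds = timeit.timeit(lambda: method(100_000), number=20)
    print(f"{name}: {seconds:.3f}s")
```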

This is not just a performance concern; it describes completely different behaviour. You forgot that self.x is really type(self).__getattribute__(self, 'x') (with __getattr__ as the fallback when normal lookup fails), and that you can implement those hooks however you like. There is no guaranteed object identity across the values they return.
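To make the identity point concrete, here is a minimal sketch with a hypothetical `Dynamic` class whose `x` is computed by `__getattr__` (which CPython only calls after normal lookup fails). Two reads of the "same" attribute hand back two distinct objects, so caching `d.x` in a local is not equivalent to re-reading it.

```python
import itertools


class Dynamic:
    """Hypothetical class whose attribute 'x' is computed on every access."""

    def __init__(self):
        self._counter = itertools.count()  # stored normally in the instance dict

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so 'x' always ends up here.
        if name == "x":
            return [next(self._counter)]  # a fresh list on every access
        raise AttributeError(name)


d = Dynamic()
a = d.x
b = d.x
print(a, b, a is b)  # [0] [1] False
```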

  • This level of dynamism is commonly forgotten or ignored because it is usually not needed at all. "There is no object identity across the values [retrieved by self.x]" strikes many people as a very curious design choice.

    • It's very Pythonic to expose, e.g., state via the existence of attributes. This also makes it possible to expose foreign-language interfaces dynamically. You can craft exactly the interface you like, because the code that exposes the interface is itself ordinary code that returns strings and objects.

      You are right that it is not needed often, but somewhere in the library stack there is often a part that does exactly this to expose a nice interface.
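A minimal sketch of "exposing state via the existence of attributes", using a hypothetical `RecordView` wrapper around a plain dict (a stand-in for whatever foreign object a real library would wrap). Which attributes exist is decided at lookup time by ordinary code.

```python
class RecordView:
    """Hypothetical wrapper exposing the keys of a backing dict as attributes."""

    def __init__(self, backing):
        self._backing = backing

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for the dict-backed names.
        try:
            return self._backing[name]
        except KeyError:
            raise AttributeError(name) from None


state = {"connected": True}
view = RecordView(state)
print(hasattr(view, "connected"), hasattr(view, "error"))  # True False

state["error"] = "timeout"  # the interface changes as the state changes
print(view.error)  # timeout
```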


Java also has a performance cost to accessing class fields, as shown by this (now-replaced) code in the JDK itself: https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/...

  • Any decent JIT compiler (and HotSpot's is world class) will optimize this out. It was likely done very early on in development, or just to reduce bytecode size so the method stays within the inlining heuristics that depend on it.

    • String is also a pretty damn fundamental class, and I'm sure trim() calls are extremely common too. I wouldn't be surprised if applying seemingly small optimizations like this in the interpreter, before the JIT kicks in, is not premature optimization in that context.

      There might be common scenarios where this had a real, significant performance impact, e.g. use cases where it's such a bottleneck in the interpreter that it measurably affects warm-up time. Also, string manipulation seems like the kind of thing you see in small scripts that end before a JIT even kicks in but that are also run very often (although I don't know how many people would reach for Java in that case).

      EDIT: also, if you're a commercial entity trying to get people to use your programming language, it's probably a good idea to make the language perform less badly on the most common terrible code. And accidentally quadratic (or worse) string manipulation involving excessive calls to trim() seems like a very likely scenario in that context.

  • That was a niche optimization, primarily targeting code running in the interpreter. Even the most basic optimizing compiler in HotSpot's tiered compilation chain at that time (the client compiler, C1) would be able to keep that value in a register. Since String is such an important class, even small things like this are done.

> it is nuts that in an object method, there is a performance enhancement through caching a member value

I don't understand what you think is nuts about this. It's an interpreted language, and the word `self` is not special in any way (it's just convention; you can call the first param of a method anything you want). So there's no way for the interpreter/compiler/runtime to know you're accessing a field of the object itself (let alone that the field isn't a computed property or something like that).
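A minimal sketch of both points, using a hypothetical `Temperature` class: the first parameter can be named anything, and a read that looks like a plain field (`t.fahrenheit`) can actually run code on every access.

```python
class Temperature:
    def __init__(this, celsius):  # 'this' instead of 'self' works fine
        this.celsius = celsius

    @property
    def fahrenheit(me):
        # Looks like a field read at the call site, but runs this code each time.
        return me.celsius * 9 / 5 + 32


t = Temperature(100)
print(t.fahrenheit)  # 212.0
t.celsius = 0
print(t.fahrenheit)  # 32.0
```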

Lots of hot takes that people have (like this one) are rooted in a fundamental misunderstanding of the language, and of programming languages in general <shrugs>.

  • If you dig into JS engine implementations, they deal with a lot of the same sorts of things. Simple objects with straightforward properties are tagged so that they skip the dynamic machinery, with fallback paths to handle dynamism when it is necessary.

    A common approach is hidden classes, which work much like classes in other languages. Reading a simple int property just reads bytes at an offset from the object pointer directly. Upon entry to the method, bits of the object are tested, and if the object is not known to be simple, execution escapes into the full dynamic machinery.

    I don't know if those exact techniques would work for Python but this is not an either-or situation.

    See also: modern Objective-C objc_msgSend, whose fast path is so fast on modern hardware that it is rarely a performance bottleneck, despite the ability to add dynamic subclasses or forward messages at runtime.
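The closest built-in Python analogue I know of to a fixed object layout is `__slots__`, sketched below with a hypothetical `Point` class. It is an analogy, not the hidden-class technique itself: slot values live at fixed offsets and are reached through member descriptors on the class, but `p.x` still goes through normal attribute lookup. The inline-cache-style fast paths in CPython live in the 3.11+ specializing adaptive interpreter (PEP 659).

```python
class Point:
    __slots__ = ("x", "y")  # fixed attribute layout, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(type(Point.__dict__["x"]).__name__)  # member_descriptor
print(hasattr(p, "__dict__"))  # False

try:
    p.z = 3  # there is no slot named "z", so this is rejected
except AttributeError as exc:
    print("rejected:", exc)
```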

  • What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed. It's dynamism taken to an unnecessary extreme. Nobody in the real world expects this behaviour. Making it just a bit less dynamic wouldn't change the fundamentals of the language but it would make it a lot more tractable.

    • > What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed.

      There is no such thing as 'successive references to the same member value' here. It's not that you look up the same object and it can change, it's that you are not referring to the same object at all.

      self.x is actually type(self).__getattribute__(self, 'x'), which can in fact return a different thing each time. `self.x` IS a string lookup, and that is not an implementation detail but a major design goal. This dynamism is one of the selling points of Python: it allows you to change and modify interfaces to reflect state. It's nice for some things, and it is what makes Python Python. If you don't want that, use another language.


    • In Python, attribute accesses aren't stable! `self.x` where `x` is a property is not guaranteed to refer to the same thing.

      And getting rid of descriptors would be a _fundamental change to the language_. An immense one. Loads of features are built on descriptors or descriptor-like things.

      And what you're complaining about isn't guaranteed in the JavaScript world either... I believe you can build descriptor-like things in JS now as well.

      _But_ if you want that, you can use stuff like mypyc + annotations to get it. There are tools that get you where you want to be, just not out of the box, because Python isn't that language.

      Remember, this is a scripting language, not a compiled language. Every optimization for the things you're talking about would be paid for at program load (there's .pyc caching, but still...).

      You've got to show up with proof that what you're saying is verifiable and works well. Up until ~6 or 7 years ago, CPython had a goal of being easy to onboard onto. Dataflow analyses make the codebase harder to deal with.

      Having said all of that... it would be nice to just inline RPython-y code and have it all work nicely. I don't need it on everything, and proving safety is probably non-trivial, but I feel like we've got to be closer to doing this than in the past.

      I ... think in theory the JIT can solve for that too. In theory.
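For readers who haven't written one, here is a minimal descriptor sketch (the `Counting` and `Widget` names are hypothetical). A class attribute implementing `__get__`/`__set__` takes over every read of `w.x`, so successive reads in the same expression can legitimately return different values. `property` is built on this same protocol.

```python
class Counting:
    """Hypothetical data descriptor that returns how many times it has been read."""

    def __set_name__(self, owner, name):
        self.private = "_" + name + "_reads"  # per-instance counter name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        reads = getattr(obj, self.private, 0) + 1
        setattr(obj, self.private, reads)
        return reads

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Widget:
    x = Counting()


w = Widget()
print(w.x, w.x, w.x)  # 1 2 3
```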


    • > What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable.

      The language supports multiple threads and doesn’t have private fields (https://docs.python.org/3/tutorial/classes.html#private-vari...), so the runtime cannot rule out that the value gets changed in-between.

      And yes, it often is obvious to humans that’s not intended to happen, and almost never what happens, but proving that is often hard or even impossible.
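A minimal sketch of the thread case (the `Shared` class and helper names are hypothetical). An `Event` makes the interleaving deterministic here; in real code the same change can happen at any point where the interpreter switches threads, which is exactly what the runtime can't rule out.

```python
import threading


class Shared:
    def __init__(self):
        self.x = 0

    def read_twice(self, in_between):
        first = self.x
        in_between()  # anything at all may happen here
        second = self.x
        return first, second


shared = Shared()
written = threading.Event()


def writer():
    shared.x = 99
    written.set()


def let_another_thread_run():
    threading.Thread(target=writer).start()
    written.wait()  # block until the other thread has written


result = shared.read_twice(let_another_thread_run)
print(result)  # (0, 99)
```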


    • > Nobody in the real world expects this behaviour.

      For example, numbers and strings are immutable objects in Python. If self.x is a number and its numeric value is changed by a method call, self.x will be a different object after that. I'd dare say people expect this to work.
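A minimal sketch of that, with a hypothetical `Account` class: because ints are immutable, `+=` on the attribute rebinds it to a new object rather than mutating the old one.

```python
class Account:
    def __init__(self):
        self.balance = 1000

    def deposit(self, amount):
        self.balance += amount  # ints are immutable: this rebinds to a new int


acct = Account()
before = acct.balance
acct.deposit(1)
print(before, acct.balance, before is acct.balance)  # 1000 1001 False
```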

    • Basically all object-oriented languages work like that. You access a member; you call a method that changes that member; you expect the change to be visible further down in the code, and there are no statically computable guarantees that the particular member is not touched in the called method (which is potentially overridden in a subclass). It's not dynamism; even C++ works the same. It's an inherent tax on OOP. All you can do is try to minimize the cost of that additional dereference. I'm not even touching threads here.

      Now, functional languages don't have this problem at all.
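A minimal sketch of that pattern (the `Counter` class is hypothetical): a local copy taken before a method call goes stale if the call changes the member, which is why caching in a local is only a safe optimization when you know nothing in between touches it.

```python
class Counter:
    def __init__(self):
        self.x = 0

    def bump(self):
        self.x += 1

    def demo(self):
        q = self.x  # cached copy
        self.bump()  # some other call mutates the attribute
        return q, self.x  # stale cached value vs. fresh lookup


print(Counter().demo())  # (0, 1)
```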


    • > same member value within the same function body are stable

      Did you miss the part where I explained to you there's no way to identify that it's a member variable?

      > Nobody in the real world expects this behaviour

      As has already been explained to you by a sibling comment, you are in fact wrong: there are plenty of people in the real world who do actually expect this behavior.

      So I'll repeat myself: lots of hot takes from just pure, unadulterated, possibly willful ignorance.


  • > the word `self` is not special in any way (it's just convention - you can call the first param to a method anything you want).

    The name `self` is a convention, yes, but interestingly, in Python methods the first parameter is special beyond the standard "bound method" stuff. See, for example, PEP 367 (New Super, later renumbered as PEP 3135) for how `super()` resolution works (TL;DR: `super` is a special builtin, and using it makes the compiler generate extra code referencing the first parameter and the lexically defining class).
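A minimal sketch of that special-casing (class and method names are hypothetical). The first parameter can have any name and zero-argument `super()` still finds it; the compiler adds an implicit `__class__` cell to methods that mention `super`; and without a first argument (a `staticmethod`), zero-argument `super()` raises `RuntimeError`.

```python
class Base:
    def greet(this):
        return "base"


class Child(Base):
    def greet(whatever):
        # Zero-argument super() uses the first parameter, whatever it is
        # called, plus the implicit __class__ cell for the defining class.
        return "child+" + super().greet()

    @staticmethod
    def no_first_param():
        return super().greet()  # no first argument to bind to


print(Child().greet())  # child+base
print(Child.greet.__code__.co_freevars)  # ('__class__',)

try:
    Child.no_first_param()
except RuntimeError as exc:
    print("RuntimeError:", exc)
```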

  • I don't think it's a hot take to say much of Python's design is nuts. It's a very strange language.