Comment by mattclarkdotnet

5 days ago

What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed. It's dynamism taken to an unnecessary extreme. Nobody in the real world expects this behaviour. Making it just a bit less dynamic wouldn't change the fundamentals of the language but it would make it a lot more tractable.

> What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable. You can look it up once, go off and do something else, and look it up again and it's changed.

There is no such thing as 'successive references to the same member value' here. It's not that you look up the same object and it can change, it's that you are not referring to the same object at all.

`self.x` is effectively `type(self).__getattribute__(self, 'x')` (with `__getattr__` as the fallback when normal lookup fails), which can in fact return a different thing each time. `self.x` IS a string lookup, and that is not an implementation detail but a major design goal. This dynamism is one of the selling points of Python: it allows you to change and modify interfaces to reflect state. It's nice for some things and it is what makes Python Python. If you don't want that, use another language.
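
A minimal sketch of the "string lookup" point, assuming a hypothetical class `Counter` whose attribute `x` is computed on demand. Strictly speaking, normal lookup goes through `__getattribute__` first, and `__getattr__` is only the fallback for names that lookup does not find:

```python
import itertools


class Counter:
    """Hypothetical class whose attribute `x` is computed on every lookup."""

    def __init__(self):
        self._ticks = itertools.count()

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for any name that is
        # not found on the instance or on its class.
        if name == "x":
            return next(self._ticks)
        raise AttributeError(name)

    def method(self):
        first = self.x   # lookup by the string "x"
        second = self.x  # a second, independent lookup
        return first, second


print(Counter().method())  # (0, 1)
```

Nothing in `method` tells the interpreter that the two `self.x` expressions must agree; each one is a fresh call into user code.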

  • ok, then it is nuts that `__getattr__` (itself a specially blessed function) is not required to be pure, at least from the caller's point of view.

    • If it was it wouldn't be Python. It can never be pure because __getattr__ is just another method that anyone can overwrite.

In Python, attribute accesses aren't stable! `self.x` where `x` is a property is not guaranteed to refer to the same thing.
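
A small sketch of the property case, using a hypothetical `Sensor` class. The getter runs on every access, so two reads of the same name hand back two different objects:

```python
class Sensor:
    """Hypothetical class: `x` looks like a plain attribute but is a property."""

    def __init__(self):
        self._reads = 0

    @property
    def x(self):
        # Runs on every access and builds a fresh list each time.
        self._reads += 1
        return [self._reads]


s = Sensor()
a = s.x
b = s.x
print(a, b)    # [1] [2]
print(a is b)  # False: two accesses, two distinct objects
```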

And getting rid of descriptors would be a _fundamental change to the language_. An immense one. Loads of features are built off of descriptors or descriptor-like things.
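
To make "built off of descriptors" concrete, here is a sketch with a hypothetical data descriptor `Logged` and a hypothetical class `Point`. It also shows that plain functions are descriptors: method binding happens on every attribute access.

```python
class Logged:
    """Hypothetical data descriptor that counts every read of the attribute."""

    def __set_name__(self, owner, name):
        self.name = name
        self.reads = 0

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        self.reads += 1
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class Point:
    x = Logged()

    def __init__(self, x):
        self.x = x          # goes through Logged.__set__

    def norm1(self):
        return abs(self.x)  # goes through Logged.__get__


p = Point(-3)
print(p.norm1())                  # 3
print(Point.__dict__["x"].reads)  # 1
# Functions are descriptors too: each access binds a new method object.
print(p.norm1 is p.norm1)         # False
```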

And what you're complaining about is also not true in the JavaScript world either... I believe you can build descriptor-like things in JS now as well.

_But_ if you want that you can use stuff like mypyc + annotations to get that for you. There are tools that let you get to where you want. Just not out of the box because Python isn't that language.

Remember, this is a scripting language, not a compiled language. Every optimization for things you talk about would be paid on program load (you have pyc stuff but still..)

Gotta show up with proof that what you're saying is verifiable and works well. Up until ~6 or 7 years ago CPython had a concept of being easy to onboard onto. Dataflow analyses make the codebase harder to deal with.

Having said all of that.... would be nice to just inline RPython-y code and have it all work nicely. I don't need it on everything and proving safety is probably non-trivial but I feel like we've got to be closer to doing this than in the past.

I ... think in theory the JIT can solve for that too. In theory

  • >Remember, this is a scripting language, not a compiled language

    This is the fundamental issue and "elephant in the room" that everyone seems to be overlooking and sweeping under the carpet.

    The extreme compiled, typed-language guys are going gung-ho with the very slow-to-compile and complicated Rust (more so than C++), while the rest of the world is gladly hacking its shiny ML/AI code in a scripting language, aka Python, "the glue/duct-tape language", with most if not all of the fast engine libraries (e.g. PyTorch) written in unsafe C/C++.

    The problem is that Python was meant for scripting, not for properly designed software systems engineering. After all, it's based on ABC, a language for beginners, with an asterisk attached: "intended for teaching or prototyping, but not as a systems-programming language" [1].

    In ten years' time people will most probably look in horror at the tech debt of the Python software stacks they have to maintain for business continuity. Or, for their own sanity, they will rewrite the entire thing in a much more stable, fast-to-develop, compiled modern language ecosystem like the D language, with native engine libraries and seamless integration with C and (to some extent) C++ if necessary.

    [1] ABC (programming language)

    https://en.wikipedia.org/wiki/ABC_(programming_language)

    • > In ten years time people will most probably look in horror at their python software stacks tech debt that they have to maintain for the business continuity.

      I regret to inform you that there are _loads_ of multi-decades-old Python stacks at this point.

      On the micro level I'll be like "ugh wish I wasn't paying the costs of Python" decently enough. But on the macro level I don't regret Python stacks. At least not when looking at the alternatives.

      Tho I will admit I'm a bit mystified at data science stuff in particular persisting in Python. Lots of CPU churn even if the underlying libs are all C extensions.

    • > The problem is that Python was meant for scripting not properly designed software system engineering.

      What something was meant to do has never, ever stopped people. People find creative ways to use tools in unintended ways all the time. It's what we do.

      We can call this dumb or get misanthropic about it, or we can try to understand why people all over the world choose to use Python in "weird" ways, and what this tells us about the way people relate to computing.

    • > In ten years time people will most probably look in horror at their python software stacks tech debt that they have to maintain for the business continuity

      Yes, like they did for JavaScript!

> What's nuts is that the language doesn't guarantee that successive references to the same member value within the same function body are stable.

The language supports multiple threads and doesn’t have private fields (https://docs.python.org/3/tutorial/classes.html#private-vari...), so the runtime cannot rule out that the value gets changed in-between.
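
A deterministic sketch of the threading point. The class `Box` and the helper functions are hypothetical, and the two `threading.Event` objects exist only to force the interleaving that a real race would produce by chance:

```python
import threading


class Box:
    def __init__(self):
        self.x = 1


def reader(box, first_read_done, write_done, seen):
    seen.append(box.x)     # first lookup
    first_read_done.set()
    write_done.wait()      # "go off and do something else"
    seen.append(box.x)     # second lookup of the same expression


def writer(box, first_read_done, write_done):
    first_read_done.wait()
    box.x = 2              # nothing stops another thread from rebinding it
    write_done.set()


box = Box()
seen = []
first_read_done = threading.Event()
write_done = threading.Event()
threads = [
    threading.Thread(target=reader, args=(box, first_read_done, write_done, seen)),
    threading.Thread(target=writer, args=(box, first_read_done, write_done)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)  # [1, 2]
```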

And yes, it often is obvious to humans that’s not intended to happen, and almost never what happens, but proving that is often hard or even impossible.

  • wouldn't a concurrent change without synchronization be UB anyway? Also parent wants to cache the address, not the value (but you have to cache the value if you want to optimize manually)

    • Why would it be UB? All objects are behind (thin) pointers, which can be overwritten atomically.

    • Not necessarily UB, but absolutely "spooky action" nondeterministic race conditions that make things difficult to understand.

> Nobody in the real world expects this behaviour.

For example, numbers and strings are immutable objects in Python. If self.x is a number and its numeric value is changed by a method call, self.x will be a different object after that. I'd dare say people expect this to work.
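
A short sketch of that expectation, with a hypothetical `Account` class. Because ints are immutable, the method cannot change the object in place; it rebinds `self.x` to a different object, and the caller expects to see it:

```python
class Account:
    def __init__(self):
        self.x = 1000

    def deposit(self, amount):
        # ints are immutable, so this rebinds self.x to a new int object
        self.x += amount

    def run(self):
        before = self.x
        self.deposit(1)
        after = self.x
        return before, after, before is after


print(Account().run())  # (1000, 1001, False)
```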

Basically all object-oriented languages work like that. You access a member; you call a method which changes that member; you expect that change to be visible lower in the code, and there are no statically computable guarantees that a particular member is not touched in the called method (which is potentially overridden in a subclass). It's not dynamism, even C++ works the same; it's an inherent tax on OOP. All you can do is try to minimize the cost of that additional dereference. I'm not even touching threads here.

Now, functional languages don't have this problem at all.

  • OOP has nothing to do with it. In your C++ example, foo(bar const&); is basically the same as bar.foo();. At the end of the day, whether passing it in as an argument or accessing this via the method call syntax it's just a pointer to a struct. Not to mention, a C++ compiler can, and often does, choose to put even references to member variables in registers and access them that way within the method call.

    This is a Python-specific problem caused by everything being boxed by default and the interpreter not even knowing what's in the box until it dereferences it, a problem that extends to the "self" object. In contrast, in C++ the compiler knows everything there is to know about the type of `this`, which avoids the issue.

    • That's not true. I mean: it's true that it has little to do with OOP, but most imperative languages (only exception I know is Rust) have the issue, it's not "Python specific". For example (https://godbolt.org/z/aobz9q7Y9):

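      #include <cstdio>

      struct S {
          const int x;
          int f() const;
      };

      int S::f() const {
          int a = x;          // first load of the member
          printf("hello\n");  // opaque call the compiler cannot see into
          int b = x;          // second load of the member
          return a - b;
      }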

      The compiler can't reuse 'x' unless it's able to prove that it definitely couldn't have changed during the `printf()` call, and it's unable to prove it. The member is loaded twice. C++ compilers can usually only prove it for trivial code with completely inlined functions that doesn't mutate any external state, or mutates in a definitely-not-aliasing way (strict aliasing). (And the `const` doesn't make any difference here at all.)

      In Python the difference is that it can basically never prove it at all.
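
A rough Python analogue of the C++ snippet above, as a sketch. It uses the standard `dis` module to count attribute loads in the compiled bytecode; the class `S` mirrors the C++ struct, and the bytecode details are specific to CPython:

```python
import dis


class S:
    def __init__(self, x):
        self.x = x

    def f(self):
        a = self.x
        print("hello")
        b = self.x
        return a - b


# Collect the attribute-load instructions the compiler emitted for f.
loads = [ins for ins in dis.get_instructions(S.f) if ins.opname == "LOAD_ATTR"]
print(len(loads))  # one load per `self.x` expression; nothing is reused
```

The bytecode compiler makes no attempt to prove that `print` leaves `self.x` alone, so each `self.x` in the source gets its own load.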

    • > This is a Python specific problem caused by everything being boxed

      I would say it is part Python being highly dynamic and part C++ being full of undefined behavior.

      A C++ compiler will only optimize member access if it can prove that the member isn't overwritten in the same thread. Compatible pointers, opaque method calls, ... the list of reasons why that optimization can fail is near endless. C even added the `restrict` keyword because just having write access to two pointers of compatible types can force the compiler to reload values constantly. In Python anything is a function call to some unknown code, and any function could get access to any variable on the stack (manipulating Python stack frames is fun).

      Then there is the fun thing the C++ compiler gets up to with variables that are modified by different threads: `while(!done)` turning into `while(true)` because you didn't tell the compiler that `done` needs to be thread-safe is always fun.
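
The stack-frame remark can be sketched as follows. This relies on `sys._getframe`, which is a CPython implementation detail, and the names `Widget` and `innocent_looking_helper` are hypothetical:

```python
import sys


def innocent_looking_helper():
    # CPython-specific: reach into the caller's frame and grab its `self`.
    caller_self = sys._getframe(1).f_locals["self"]
    caller_self.x = "changed behind your back"


class Widget:
    def __init__(self):
        self.x = "original"

    def method(self):
        first = self.x
        innocent_looking_helper()  # takes no arguments, touches no globals
        second = self.x
        return first, second


print(Widget().method())  # ('original', 'changed behind your back')
```

From the call site alone there is no hint that the helper can see `self`, which is why the interpreter cannot assume the second `self.x` matches the first.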

    • > This is a Python specific problem caused by everything being boxed by default and the interpreter does not even know what's in the box until it dereferences it

      That's not the whole story of what is going on. Every attribute access is a function call to `__getattribute__` (falling back to `__getattr__`), which can return whatever object it wants.

      `bar.foo(...)` is actually `type(bar).__getattribute__(bar, 'foo')(...)`

      This dynamism is what makes Python Python and it allows you to wrap domain state in interface structure.

> same member value within the same function body are stable

Did you miss the part where I explained to you there's no way to identify that it's a member variable?

> Nobody in the real world expects this behaviour

As has already been explained to you by a sibling comment you are in fact wrong and there are in fact plenty of people in the real world who do actually expect this behavior.

So I'll repeat myself: lots of hot takes from just pure, unadulterated, possibly willful ignorance.

  • The above is a very thick response that doesn't address the parent's points, just sweeps them under the rug with "that's just how it was designed/it works".

    "Did you miss the part where I explained to you there's no way to identify that it's a member variable?"

    No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.

    "this just how things are dunn around diz here parts" is not an argument.

    • > No, you did miss the case where that in itself can be considered nuts - or at least an unfortunate early decision.

      This is not a side implementation detail that they got wrong; this is a fundamental design goal of Python. You can find that nuts, but then just don't use Python, because that is (one of) the things that make Python Python.

    • > considered nuts - or at least an unfortunate early decision

      Please explain to us then how exactly you would infer a variable with an arbitrary name is actually a reference to the class instance in an interpreted language.
