Crimes with Python's pattern matching

3 years ago (hillelwayne.com)

Looking in the implementation of the Python version of ABC[0], you'll see that `__subclasshook__` doesn't do any language magic, it just registers the method to be called in `__instancecheck__`. So if you just implement `__instancecheck__` directly, you get the same behavior, but without the caching around it.

  class OneWayMeta(type):
      seen_classes = set()
  
      @classmethod
      def __instancecheck__(cls, instance):
          C = instance.__class__
          print(f"trying {C}")
          if C in cls.seen_classes:
              return False
          cls.seen_classes |= {C}
          return True
  
  
  class OneWay(metaclass=OneWayMeta):
      pass
  
  
  def f(x):
      match x:
          case OneWay():
              print(f"{x} is a new class")
          case _:
              print(f"we've seen {x}'s class before")
  
  
  if __name__ == "__main__":
      f("abc")
      f([1, 2, 3])
      f("efg")

When running:

  trying <class 'str'>
  abc is a new class
  trying <class 'list'>
  [1, 2, 3] is a new class
  trying <class 'str'>
  we've seen efg's class before

Am I missing a particular point the article is making, or did the author overlook this?

[0]: https://github.com/python/cpython/blob/main/Lib/_py_abc.py

  • > if you just implement `__instancecheck__` directly

    The point of the article is that to override what "isinstance(obj, ClassA)" means one doesn't need to touch ClassA, or any descendants of it, at all.

    • Can you help me understand what you mean here? The author manages to make something match `case NotIterable()` by...modifying the `NotIterable(ABC)` class. That's exactly what I would have expected. What do you mean by "one doesn't need to touch ClassA, or any descendants of it, at all"?

    • Not quite! The point of the article is that you don’t need to change the class of obj to override this instance check. If you have a look, ClassA always uses the hook implementation in the examples. And this can be shortened to just using instancecheck.

  • I think the point in the first section is that the way ABC instancecheck/subclasshook interacts with pattern matching is surprising for anyone not familiar with ABC. It allows you to check for a match with arbitrary functions, beyond simply checking if an object is an instance of a given type. In the final section, where he has issues with caching, I presume he hasn't read about instancecheck; your code would fix his issue.
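
A minimal sketch of the caching difference this sub-thread is about, assuming CPython's `ABCMeta` (the caching is an implementation detail, not a documented guarantee). The names `ViaHook`, `DirectMeta`, `ViaMeta`, `hook_calls` and `meta_calls` are made up for illustration and do not come from the article.

```python
from abc import ABC

hook_calls = []  # classes passed to the ABC's __subclasshook__
meta_calls = []  # classes seen by the metaclass __instancecheck__


class ViaHook(ABC):
    @classmethod
    def __subclasshook__(cls, C):
        hook_calls.append(C)
        return True


class DirectMeta(type):
    # Defined on the metaclass, so isinstance() reaches it directly
    # without going through ABCMeta's caches.
    def __instancecheck__(cls, instance):
        meta_calls.append(type(instance))
        return True


class ViaMeta(metaclass=DirectMeta):
    pass


for value in ("abc", "efg"):
    isinstance(value, ViaHook)
    isinstance(value, ViaMeta)

print(len(hook_calls), len(meta_calls))
```

On CPython the hook should run once for `str` and then be answered from the ABC's cache, while the metaclass check runs on every call. That is why a stateful check behaves differently depending on which of the two routes it takes.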

Reading this I can't help feeling that Python puts the "simplicity" in all the places that don't matter. Simplicity is the reason given for not having useful language features like pattern matches as expressions, or lambdas with multiple expressions, but I've never seen these features[1] cause problems in other languages. And then we have this... Surely semantic simplicity is the simplicity that actually matters, but the way the simplicity argument is used in Python is often to enforce arbitrary syntactic and semantic complexity.

[1] It's almost ridiculous to call these features as they're just a consequence of the underlying language model. E.g. if you have expression blocks you have lambdas with multiple expressions with no extra work.

  • Python has plenty of complexity. It tends to stick it in places where new programmers don't run into it. Python is one of the few languages I know that lets you just... poke at the internal machinery like this.

    • Have you ever messed around w/ Julia? It doesn't really have much "internal machinery" since it relies on a data-oriented/functional language structure. I find myself reading the Julia source code on a regular basis since it's very readable and succinct. I often find it more useful than the documentation itself.

      It can take a bit to fully understand/appreciate Julia's multiple dispatch, but once you do you pretty much understand the entirety of the machinery.

  • These are fair examples, but 'Python puts the "simplicity" in all the places that don't matter' is a bold statement that isn't well supported by them. I work primarily writing hit-and-run R&D code that has a shelf-life of about 6 months. Python is one of the best scripting languages for this, if not the best, precisely because it puts a lot of simplicity in places that DO matter.

    • Python is hands down the best ecosystem for "hacking something together to prove a point but won't be maintained".

      It's (a) extremely easy to learn (takes about 2hrs for a Java/C# dev to be productive) and (b) has a very deep ecosystem and wrappers for pretty much any native library you want. Then (c) it works great under Windows/OSX/Linux as long as you're on an x86/x64 platform. The clincher is (d) it's the de-facto beginners language so all the newbies can at least read it and hack away.

      The competition:

      * PHP is similarly easy but very limited.

      * Ruby is in my experience a slow and buggy mess with a community who are welcoming but suffer from a reality-distortion field (might be different now, my experiences were 15+ years ago).

      * Java has accidental complexity getting started.

      * C# is competitive but too hard for the low-skilled / newbies, and it still has an irritating NIH syndrome (e.g. pushing people to MS's half-baked crypto APIs instead of first-class ports/wrappers of libsodium / BouncyCastle).

      * Javascript/Typescript are probably the closest. They have better package management for the "hack it together" use-cases, but the language itself is poorly designed, what with all of the unintuitive "surprises".

      My kids are just about old enough to learn coding and I'm going to start them with Python before moving on to C and ASM, then, if they want to develop anything serious, C# / Java / Rust / TypeScript.

    • Indeed it is a bold statement, but if one can't make overly grand claims on the Internet then where? :-)

      I'm interested to know where you find the simplicity in Python. My guess:

      - the ecosystem

      - portions of Python that date back over a decade + perhaps some of the modern string handling and maybe data classes

      My overall point is that the Python community relentlessly beats the drum on simplicity, but modern Python is not a simple language for any reasonable definition. I believe they have increased the complexity of the language while claiming that these complexity-increasing changes are in service of simplicity. I further believe that mountains of this complexity could be avoided with better language design and a better implementation.

    • But if your code has a shelf life of 6 months, then it is probably not read and changed as many times as code that goes into production and might stay there for the next couple of years. So many things actually do not matter as much for such throwaway code.

  • Or performance! There's lots of low-hanging fruit in the Python interpreter that doesn't get improved to preserve the purity of the runtime, or whatever. (Well, at least this might see improvements now. But for a long time people would point at it and laugh.)

    • Or to enable the kind of extreme dynamism that is illustrated in TFA. How do you optimize code properly when even core relationships like "x is of type T" may be nondeterministic?

  • As someone who has followed Python since the 1.6 days and occasionally uses it for scripting: the language is simple only on the surface. It provides the same magic capabilities as languages like C++, but apparently not many people find their way to the rune tablets.

  > But surely Python clamps down on this chicanery, right?
  > 
  > $ py10 abc.py
  > 10 is not iterable
  > string is iterable
  > [1, 2, 3] is iterable
  > 
  > Oh.
  > 
  > Oh my.

I'm sure I'm being dense and missing the obvious but ... what is the author responding to here? What's wrong or bad?

  • In the context of this article, the result is not surprising, but in general it's probably not most people's expectation that you can define a class, make sure it doesn't subclass any ABCs, but then still have it "match" an ABC. (If you ask me, cases should only match when types are equal -- pattern matching is structural but (in Python) subtyping is anything but.)

    • While I wouldn't go as far as to say that this is "the point" of ABC, it's certainly relatively important, with __subclasshook__ being prominently placed near the top of the ABC documentation.

      Control over destructuring isn't entirely new territory for PLs, Scala has Extractor Objects[0], as an example.

      I think that it's a bit easy to say "it should just match the type!" when the reality is that even basic classes like list get overwritten in Python. Ultimately many language features have configurable features through dunder methods, and the fact that those get used by other language features is a feature, not a bug IMO.

      As usual, don't use libraries that do weird stuff... and every once in a while you'll have the nice DSL that does something useful in this space and it will work well.

      The thought experiment about a more restrictive version of this: how does Python tell that an object is a list? If it's through isinstance, then you're hooking into a bunch of tooling that have hooks that can be overwritten. If it's _not_ through isinstance, suddenly you have multiple ways to test if something is a list (which is a problem).

      [0]: https://docs.scala-lang.org/tour/extractor-objects.html

    • Sounds like you just don't know ABCs, and "people who don't know ABCs don't expect ABCs to behave like ABCs" doesn't say much. Let me quote https://docs.python.org/3/glossary.html#term-abstract-base-c... for you:

      > Abstract base classes complement duck-typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy or subtly wrong (for example with magic methods). ABCs introduce virtual subclasses, which are classes that don’t inherit from a class but are still recognized by isinstance() and issubclass().

      You simply don't "subclass ABCs" ever (except when defining an ABC); if you do it's no longer a virtual subclass and you're no longer implementing the ABC. As a concrete example, when did you last "subclass" collections.abc.Iterable? You did not, you implemented __iter__.

    • Python also has structural typing, often called duck typing - if you have a runtime-checkable protocol, an object will also match isinstance even when there is no inheritance.

    • > but in general it's probably not most people's expectation that you can define a class, make sure it doesn't subclass any ABCs, but then still have it "match" an ABC.

      Abstract Base Classes were an attempt to formalize python's duck typing. Matching things that don't inherit from them is their whole purpose.

  • Totally agree. That behaviour is exactly what I would expect.

    All in all, I really don't get the dramatic tone in this article. It turns out that in Python (as in most languages that give you access to the internals), if you mess with the internals, the results are, well, messy. But literally nothing in this article surprised me at all.

    • I think they're just taking delight from being able to hijack behaviour from elsewhere in the code, using indirect means.

  • I don't think the author is intending to say there is anything wrong in this particular example; he is, rather, anticipating some ways in which this might obfuscate code, either accidentally or deliberately. The rest of the article investigates some of these possibilities and demonstrates that you can, indeed, do so.

    Perhaps it would have been a bit clearer, and less easy to dismiss as a fuss over nothing, if the author had left the 'not' out of the definition of NotIterable.__subclasshook__(), or defined an IsIterable class with the 'not' in place?

  • Eh, I think they're just overdramatizing things. That's also exactly what I would have expected to happen.

    • The only thing that can sometimes bite you here is that str is iterable: if you expect a list of str and only get a str, you suddenly iterate over the chars.

      I am not sure if it wouldn't have been better to make the conversion explicit here.

  • He is just demonstrating that __subclasshook__ has control over what is counted as a match.

    As he explained in another article, this allows the author of the abstract class to hijack calls to isinstance for any instances created from subclasses.

    • The point though is that the tone of his article seems to suggest that this is some scary "gotcha" of the language, whereas some of us consider this to just be the expected behavior.

  • To put that in context.. Javascript - the closest competitor - can't even get comparison, numbers or truthiness right....

The real reason why you shouldn't use this is that Python is wrong about typing and classes. In OOP, classes are not types; interfaces are [0]. ABCs are a poor replica of this. Guido, having clearly avoided any relevant literature, states, in PEP-3119 [1]:

> ABCs are not intrinsically incompatible with Interfaces, but there is considerable overlap. For now, I’ll leave it to proponents of Interfaces to explain why Interfaces are better.

Let me try: interfaces are better because the protocol of an object isn't tied to its implementation, but in a properly encapsulated world an interface represents the information available about a class as a type [2]. A subclass may just be reusing an implementation without adhering to the same protocol, or two interchangeable classes might have no inheritance relationship.

Python is in a lot of ways a nice language, and I've certainly enjoyed programming in it, but many points of its design seem intentionally unobservant of prior work and research in programming languages, though perhaps it's equally an indictment of that research that the most popular languages ignore it so much. Typescript handles this much better, though neither it nor Java eschews using classes as types entirely.

[0] https://dl.acm.org/doi/10.1145/96709.96721

[1] https://peps.python.org/pep-3119/#abcs-vs-interfaces

[2] https://dl.acm.org/doi/10.1145/2509578.2514738

  • > In OOP, classes are not types; interfaces are [0].

    Depends on which OOP language we are talking about. Smalltalk definitely doesn't have interfaces, unless we are talking about later dialects like Pharo, which introduced traits into the language.

    The paper you linked to makes its point exactly by moving beyond Simula and Smalltalk into its own view of OOP.

    So, like anything else in the OOP ecosystem, it is only yet another view of what OOP should be like.

    • Yes, but ABCs and also method resolution are particularly hackish in Python.

      Generally, in Python one always has to understand the implementation and mentally execute the code, because everything is informally specified and nothing is declarative.

  • "Python is wrong about typing and classes. In OOP, classes are not types; interfaces are."

    That might be true if there was one single definition of OOP that everybody agreed on. Like with most other religions, this isn't the case.

  • There's certainly a way to consider classes as types coherently, with added subtyping. In this view, interfaces are collections of types. That is, they correspond most closely to type classes.

  • > Guido, having clearly avoided any relevant literature […] unobservant of prior work

    Same thing can be observed for his blog opinions on parsing. Sadly, this gives rise to a whole generation of programmers who believe (on account of perceiving him as an authority) that PEGs are actually good.

    • Agreed. Unintentional ambiguities are a problem of PEG parsers. Ironically, the old Python parser also failed to detect ambiguities.

      There is a strong aversion in the Python space for unambiguous formalisms. A parser that resolves ambiguities by earliest match first seems to satisfy the dynamic mindset.

Hell yeah, another reason to get people to switch to Python >= 3.10. Unfortunately a lot of libraries are still gonna be behind for now: https://pyreadiness.org/3.10/ - of the top 360 most downloaded on PyPI, 212 have explicit Python 3.10 support. This plus the walrus operator, X | Y -style union types, and the speedup possibilities of 3.11 all look great, imo

  • Python has the strongest presence in ML, and I think there are some issues with >3.9 versions with CUDA.

  • 41% of the top packages don't support the new version? And looking back, the numbers aren't even much better for previous versions, 3.8 is several versions old now and they've still got over 20% breakage.

    That is actually pathetic and I'm not blaming the package authors here - Python needs to stop making big breaking changes that rototill the codebase continuously.

    Breaking changes are pretty serious business in the Java world and there is an incredible amount of thought and research put into even something like modules let alone the JDK17 changes where reflection is being fundamentally changed. Python seems to have an absolutely carefree attitude to language breakage, and I guess why wouldn't they? It's always worked for them.

    • > is several versions old now and they've still got over 20% breakage.

      I don't think that's illustrating breakage, just the lack of an explicit declaration that the package supports a newer version of python (which may be newer than the latest release of a given package).

    • Explicit support, meaning that the package author has added the "3.X" tag in the package metadata.

I didn't see the need to frame everything in the article as being so bad. I thought all those examples were awesome. I'll probably never do anything like them so as not to confuse people but I still think they were all interesting uses of the feature.

Honestly, I think people are too hard on languages (and especially Python) for having new features that challenge the status quo. And then there's also too much drama when it turns out that a scripting language is, in fact, a scripting language! So you can do weird things with stateful ABCs and such. I mean yeah, it's strange. But it probably also has some perfect use case in a very specific circumstance. At the end of the day, if you understand how a feature really works, you can do creative things with it. I'm glad we have it!

  • It made me want to learn python.

    I've been thinking about programming language development, and some weird things from the typing front.

    This could be easily used to achieve unions.

    But another thing that I was thinking about was the return from unsafe land.

    You have some class or object that you want to do some quasi-illegal fucky bullshit to - send it off and do what you will.

    But what comes back might not have the guarantee that it's still the 'shape' of the thing that you sent away.

    This could be used to validate that what comes back is sane.

  • >I thought all those examples were awesome.

    They are great examples. Great examples of what not to do.

    The only use case I can think of is meta programming which most people don't need.

Python is not the first language to have customizable pattern matching. Scala can do the same with extractor objects.

  • Yep. Honestly this seems fine - pattern matching is ultimately just a function call, you can immediately see where it is (and hopefully even click through to the code from where the pattern is defined), there's no "magic" action-at-a-distance.

    • Customizable pattern matching is in fact unusual, although not unheard of. Traditional pattern matching is very much not a function call, because it is compiled with the assumption that it can't be customized and that no arbitrary user-defined code is executed. In particular, the usual compilation scheme guarantees that subterms are not matched more than once, but that can't be guaranteed with customizable patterns.

What a delight.

Still like pattern matching in python though. Seems super useful.

  • This hack also seems super useful if used judiciously... perhaps one could write a spicy DSL.

So, you can use duck typing and metaprogramming in Python, even with modern features?

How is that a crime?

Next, you will be surprised that you can use first class functions and late variable binding?

So this is mostly about __subclasshook__. How does this interact with mypy type checking? There must be some tension there too.

edit: I guess I found my answer [1]. That's kind of expected, but it's still ugly.

[1]: https://stackoverflow.com/questions/71811960/how-to-use-subc...

  • Even if someone would like Python to pretend to be statically typed, __subclasshook__ is real (and not even type checking in the traditional sense) while type declarations and their checkers are only quasi-executable documentation.

    Personally, I think this is a very good situation: extremely dynamic languages are a worthwhile tool, and the only issue with the pattern matching "exploits" in the article is that boolean operators and non-cached evaluation for subclass checks are not built-in.
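
A rough sketch of the tension between `__subclasshook__` and static checking raised in this thread. `Quacker`, `Duck` and `make_noise` are hypothetical, and the comment about the checker is an assumption about how tools like mypy treat nominal ABCs, not output from an actual run.

```python
from abc import ABC, abstractmethod


class Quacker(ABC):
    @abstractmethod
    def quack(self) -> str: ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Quacker:
            return hasattr(C, "quack")
        return NotImplemented


class Duck:  # no inheritance from Quacker
    def quack(self) -> str:
        return "quack"


def make_noise(q: Quacker) -> str:
    return q.quack()


# At runtime the hook makes Duck a virtual subclass of Quacker.
print(isinstance(Duck(), Quacker))

# Assumption: a static checker would likely flag this call, because it
# does not execute __subclasshook__ and Duck is not a nominal subclass.
print(make_noise(Duck()))
```

If static checkers need to agree with the runtime check, a `typing.Protocol` is the usual alternative to a hand-written hook.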

> That made me wonder if ABCs could “hijack” a pattern match

What's the hijack there?

I'm really not sure what the author's point is. He writes the code and gets the expected results back.

  • Based on other comments here, it seems like some people don't know the purpose of ABCs and are assuming they're more like a class definition for classes that have to be directly inherited.

    • I've been programming python for years, consider myself extremely knowledgeable in it, and...uhh... yeah TIL I've been using ABCs wrong.

__subclasshook__ is exactly the sort of nonsense that makes Python code a complete mess and impossible to optimise. I'm going to take a wild guess that none of the "fast" Python variants (Cython, Micropython, RPython etc.) support it.

  • > I'm going to take a wild guess that none of the "fast" Python variants (Cython, Micropython, RPython etc.) support it.

    Cython at least aims to be a superset of Python, so it will support it sooner or later. However I don't doubt that using it will make your cython code stop being "fast".

Does anybody have an example of when __subclasshook__ is truly useful? Something that can't be accomplished more coherently with a simple function?

Personally, I'd prefer that isinstance() and issubclass() have predictable behavior.

  • A lot of these dunder (double underscore) functions are useful for metaprogramming. Just kinda spitballing an idea but perhaps some code to generate a python object model based on the schema of a database would want to use this method. If your DB schema has some special way of defining subclass relationships (maybe a foreign key to another table) you might need to manually control when something is or isn't a subclass in python's object model based on the result of querying the DB schema.

    In general metaprogramming is the kind of thing you probably don't and shouldn't reach for first, in fact it's usually more for libraries and tools vs. your production business logic. It can get difficult to reason about and pass the maintenance of code that heavily uses metaprogramming to other people unfamiliar with it.

  • The most dominant use (these days, at least) is to implement structural typing (a la Protocol), i.e. conforming to a “shape” without actually inheriting anything. So yeah, it’s not particularly useful for day-to-day use, but still a hook needed to make certain nice things happen behind the scenes.

    • It allows fast unpacking and parameter retrieval, in line with the usual type checks.

      The other day, this allowed me to refactor 300 lines into 50 that are actually readable.

  • > Something that can't be accomplished more coherently with a simple function?

    I think an example of `Iterable` (sort of like the one in the article) is a very ham-fisted way of getting this sort of check into Python code.

    In the end, doesn't the difference just boil down to

        from abc import ABC

        class Iterable(ABC):

            @classmethod
            def __subclasshook__(cls, C):
                return hasattr(C, "__iter__")

    vs.

        def is_iterable(t: type):
            return hasattr(t, "__iter__")

    Where the first makes it harder to use `Iterable` incorrectly (i.e. supplying a non-type as parameter).

    I can imagine that a Java or C# programmer would call the first version more "coherent" because it gives the interface `Iterable` a name explicitly.

    Sort of like there's not reaaaaally a reason to use Extension Methods in C# (of course there are, but in a lot of simple scenarios there aren't) as opposed to static methods taking a Type as single parameter.

    • The difference is that if the first one is defined, it can be used like this without cluttering up the function body:

        def foo(bar: Iterable):
          pass

    • > Sort of like there's not reaaaaally a reason to use Extension Methods in C#

      “but it will be so cool!” worked for me :)
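
A short sketch comparing the two variants from this sub-thread in use. The definitions mirror the comment above, and the calls at the bottom are illustrative additions.

```python
from abc import ABC


class Iterable(ABC):
    @classmethod
    def __subclasshook__(cls, C):
        return hasattr(C, "__iter__")


def is_iterable(t: type) -> bool:
    return hasattr(t, "__iter__")


# Both agree when given a class.
print(issubclass(list, Iterable), is_iterable(list))

# The plain function silently accepts an instance instead of a type.
print(is_iterable([1, 2, 3]))

# The ABC route rejects a non-class argument outright.
try:
    issubclass(5, Iterable)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

This is the sense in which the ABC version is harder to misuse: `issubclass` insists on a class, whereas the function will answer for whatever object it is handed.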

Why the heck is

    case Not(DistanceMetric)():

a syntax error?

  • What follows the keyword “case” there is a pattern, not an expression. It’s not an imperative construct of code to be executed, but a declarative construct of an expected shape of an object, which may include name bindings.

    Consider things like “case DistanceMetric(distance=d):” earlier in the article: this checks “is the value an instance of DistanceMetric, and if so, take its distance attribute and bind it to the name d”.

    So in this case, what would it mean? If the value is an instance of Not, take and bind it to the DistanceMetric name (as is typical for a single positional subpattern), and… uh oh, more parentheses, what to do? There’s no obvious sensible meaning for it, so it’s a syntax error.

  • Because the right arm of the 'case' keyword is not actually a statement being executed, but its own syntax element to represent a pattern. It is not expecting two sets of brackets there.

> That made me wonder if ABCs could “hijack” a pattern match. Something like this:

I guess the word "hijack" is used loosely for rhetorical effect here because this seems to be working as intended, and it's not even remotely the most dangerous footgun in Python. The problem (if any) is with `isinstance`, and not the pattern matching. `isinstance` should probably explicitly work with ABCs (via flag or something) because I do agree it's a bit weird that it takes the ABC's `__subclasshook__` as gospel by default.

The same author would say something similar about C/C++ where by obtaining the address of a function and then writing a value to that address you can change the code of the program.

Yes, you are misusing the API. No, there ARE valid use-cases for it: for example a lot of testing/mocking facilities in Python are so easy to implement because of these features. No, the fact that you can't do it in statically typed languages does not imply Python must go the same way.
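
One example of the testing use-case mentioned above: `unittest.mock.Mock` accepts a `spec` argument so that the mock passes `isinstance()` checks for the class it stands in for. `PaymentGateway` is a hypothetical class used only for illustration.

```python
from unittest.mock import Mock


class PaymentGateway:
    def charge(self, amount):
        raise RuntimeError("would talk to the network")


# The mock reports PaymentGateway as its class without being one.
fake = Mock(spec=PaymentGateway)
fake.charge.return_value = "ok"

print(isinstance(fake, PaymentGateway))
print(type(fake) is PaymentGateway)
print(fake.charge(10))
```

Code under test that guards its inputs with `isinstance` keeps working with the fake, which is only possible because instance checks are open to this kind of customization.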

IDK, this is roughly how CLOS works and it's largely held to be a Good Thing: classes are simply arbitrary sets of predicates a value does or doesn't meet.

It seems like the main problem here is that Python does not enforce that the subclass hook method is pure and so allows you to create buggy implementations. The spec should mandate that the method is pure and causing side effects should be a runtime (or compile time) error.

Strong duck typing - you go to extreme lengths to ensure that the object can indeed quack

The usages he complains about, like in the second example, are wrong.

You are expected not to send an object, but something whose properties are to be matched.

This is why adding language features needs to be carefully thought out and explored to great lengths. Sadly C++ never got that memo. Here's hoping Python remains "simple."

  • I’m not sure I understand the “this is why” part. Python wants to let you override almost everything. It’s a feature and a powerful and occasionally helpful one. It’s also a feature you never ever have to know about or touch.

    • Agreed - I mostly use Python for small scripts and it makes that use case very easy. I know it has a bunch more features for more niche stuff as well but my throwaway script to download images from a webpage doesn't need pattern matching.

    • It's like python's metaclasses. You rarely need them but sometimes they really are just the best solution to the problem. Those times, you're really glad they're available.

    • > It’s also a feature you never ever have to know about or touch.

      The "never have to know or touch" argument applies only to the lone hacker working on a completely new project with no inherited legacy code.

  • It’s quite clear from the design of Python, both of recent features and of fundamentals like `+=`, that its designers never got that memo either.

    I’m baffled as to how it retains its reputation of being a simple language.

  • I like Python, but it really isn’t “simple” (at least not for more than a decade).

  • Is it better for Python to be simple than for Python to be simple for simple problems?

  • All languages got the memo, including C++.

    Either you get Scheme, or languages with features.

    Even C isn't as "simple" as people take it to be.