Python Type Checker Comparison: Empty Container Inference

4 days ago (pyrefly.org)

My favorite part about the type annotations in python is that it steers you into a sane subset of the language. I feel like it's kind of telling that python is this super dynamic language but the type annotations aren't powerful enough to denote all that craziness.

  • That's nice if you're starting from scratch, but if you have existing code to deal with, you don't have the privilege of ignoring the insane subset.

A more complicated version of this problem exists in TypeScript and Ruby, where there are only arrays. Python’s case is considerably simpler by also having tuples, whose length is fixed at the time of assignment.

In Python, `x = []` should always have a `list[…]` type inferred. In TypeScript and Ruby, the inferred type needs to account for the fact that `x` is valid to pass to a function which takes the empty tuple (empty array literal type) as well as a function that takes an array. So the Python strategy #1 in the article of defaulting to `list[Any]` does not work because it rejects passing `[]` to a function declared as taking `[]`.

Is there a compile-to-Python language with built-in type safety, similar to how TypeScript transpiles to JavaScript? I'm aware of Mojo and mypyc, but those compile to native code/binaries, not Python source.

  • Python does not need that, as it has built-in type annotation support. The annotation is any expression, so you can in theory express anything a custom type-only language would allow you (although you could make it less verbose and easier to read).

    However, the it IMHO just works much worse than TS because: * many libraries still lack decent annotations * other libraries are impossible to type because of too much dynamic stuff * Python semantics are multiple orders of magnitude more complex than JavaScript. Even just the simplest question: Is `1` allowed in parameter typed `float`? What about numpy float64?

    • Thanks for helping me understand. I wasn't aware of Python's type annotation support. I did some quick research and learned that type annotations don't cause compile errors even when there are type errors. Is that why type checkers like Pyrefly exist?

In the example given in the article i think the correct behavior would have been to infer the type backwards from the return type of the function. Is that not why mypy actually errors here?

FWIW, Typescript is using Strategy 2: https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

I'm a bit confused by the fact that the array starts out typed as `any[]` (e.g. if you hover over the declaration) but then, later on, the type gets refined to `(string | number)[]`. IMO it would be nicer if the declaration already showed the inferred type on hover.

  • I agree, it's always been unsettling to see any[] on hover, even though it gets typed in the end.

    I think one reason might be to allow the type to be refined differently in different code paths. For example:

        function x () {
            let arr = []
            if (Math.random() < 0.5) {
                arr.push(0)
                return arr
            } else {
                arr.push('0')
                return arr
            }
        }
    

    In each branch, arr is typed as number[] and string[], respectively, and x's return type is number[] | string[]. If it decided to retroactively infer the type of arr at declaration, then I'd imagine x's return type would be the less specific (number | string)[].

  • It depends on your tsconfig. An empty array could be typed as never[], forcing you to annotate it.

    • I don't believe this is correct. There's no settings that correspond to that AFAIK, and it'd actually be quite bad, because you could access the empty array and then get a `never` object, which you're not supposed to be able to do.

      https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

      `unknown[]` might be more appropriate as a default, but TypeScript does you one better: with OP's settings, although it's typed as `any[]`, it'll error out if you don't do anything to give it more information because of `noImplicitAny`.

In early typescript I was too lazy and just set an inital value and then zero the list

my wishlist for pyrefly: when using decorated functions, show the underlying type hints instead of the decorators

I think it would be worth mentioning that in normal use (strict mode) Pyright simply requires you to add type annotations to the declaration. Occasionally mildly annoying but IMO it's clearly the best option.

  • Requiring the annotations on empty containers is the only way to have type safety if the type checker cannot infer the type of the container, like Pyright.

    If the type checker can infer a type then the annotation would only be required if the inferred type doesn't match the user's intent, which means one would need to add fewer annotations to an arbitrary working-but-unannotated program to satisfy the type checker.

  • It's not "mildly annoying".

    I don't enable strict mode on multiple projects because people don't want to type anything outside of function signatures.

    Inferring the type from the first use is 100% the correct choice because this is what users want 99% of the time, for the rest you can provide type information.

I can't help but find type hints in python to be..goofy? I have a colleague who has a substantial C++ background and now working in python, the code is just littered with TypeAlias, Generic, cast, long Unions etc.. this can't be the way..

  • Typing is a relatively easy way for the human author and the machine to notice if they disagree about what's going on before problems arise. It is unfortunate that Python doesn't do a good job with types, I was reading earlier today about the mess they made of booleans - their bool type is actually just the integers again.

  • I strongly disagree. Python has actually done a decent job of adding type annotations into the language IMO.

    If you ignore the bit where they don't actually specify their semantics anyway.

    > this can't be the way..

    The alternative is fragile and unmaintainable code. I know which I prefer!

Enforcing explicit annotations in strict mode is a productivity multiplier. It prevents `list[Unknown]` from polluting the rest of the codebase, which is much harder to fix later.