Comment by kuratkull

3 months ago

> my_list = [1, 2, 3]

> pyrefly, mypy, and pyright all assume that my_list.append("foo") is a typing error, even though it is technically allowed (Python collections can have multiple types of objects!)

> If this is the intended behavior, ty is the only checker that implicitly allows this without requiring additional explicit typing on my_list.

EDIT: I didn't intend my comment to be this sharp, I am actually rooting for ty to succeed :)

ORIGINAL: I am strongly against ty behaviour here. In production code you almost always have single type lists and it is critical that the typechecker assumes this, especially if the list already has same-type _literal_ items.

The fact that Python allows this has no bearing at all. To me having list[int | str] implicitly allowed by the typechecker seems like optimizing for beginner-level code.

> I am strongly against ty behaviour here.

[ty developer here]

Please note that ty is not complete!

In this particular example, we are tripped up because ty does not do anything clever to infer the type of a list literal. We just infer `list[Unknown]` as a placeholder, regardless of what elements are present. `Unknown` is a gradual type (just like `Any`), and so the `append` call succeeds because every type is assignable to `Unknown`.
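
A rough way to reproduce the effect described above in any checker is to write the gradual type out by hand. This is only a sketch: it uses `Any` as a stand-in for `Unknown`, and the variable names are made up for illustration. It assumes Python 3.9+ for the `list[...]` syntax.

```python
from typing import Any

# Stand-in for the placeholder inference: Any plays the role of Unknown here.
gradual_list: list[Any] = [1, 2, 3]
gradual_list.append("foo")  # accepted: every type is assignable to Any

# Roughly what a checker that infers list[int] from the literal works with.
strict_list: list[int] = [1, 2, 3]
# strict_list.append("foo")  # a checker would reject this: str is not an int
```

At runtime both appends would work; the difference is only in what the checker reports.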

We do have plans for inferring a more precise type of the list. It will be more complex than you might anticipate, since it will require "bidirectional" typing to take into account what you're doing with the list in the surrounding context. We have a tracking issue for that here: https://github.com/astral-sh/ty/issues/168

  • I hope I didn't come off as angry or anything, I was just very surprised by the behaviour :)

    I am talking from some experience as I had to convert circa 40k lines of untyped code (dicts passed around etc) to fully typed. IIRC this behaviour would have masked a lot of bugs in my situation. (I relied on mypy at first, but migrated to pyright about 1/4 in).

    But otherwise it's good to hear that this is still in progress and I wish the project the best of luck.

    • > I hope I didn't come off as angry or anything, I was just very surprised by the behaviour

      Not at all! :-) Just wanted to clarify for anyone else reading along

  • So, how does that relate to this quote from the article?

      >ty, on the other hand, follows a different mantra: the gradual guarantee. The principal idea is that in a well-typed program, removing a type annotation should not cause a type error. In other words: you shouldn’t need to add new types to working code to resolve type errors.
    

    It seems like `ty`'s current behaviour is compatible with this, but changing it won't be (unless it will just be impossible to type a list of different types).

    • You could have a `list[int | str]` but then you need to check the type of the elements in the list on usage to see if they are `int` or `str` (if you are actually trying to put the elements into a place that requires an `int` or requires a `str` but wouldn't accept an `int | str`...).

      If your code doesn't do that then your program isn't well typed according to Python's typing semantics... I think.

      So you can have lists of multiple types, but then you get consequences from that in needing type guards.

      Of course you still have stuff like `tuple[int, int, int, str]` to get more of the way there. Maybe one day we'll get `FixedList[int, int, int, str]`....

    • There are ways to type invariant generics more precisely that still meet the gradual guarantee. E.g.:

        x = []  # list[Unknown]
        x.append(A())  # list[Unknown | A]
        takes_list_of_a_or_b(x)  # list[A | B]
      

      We haven't decided yet if this is what we want to do, though. It's also possible that we may decide to compromise on the gradual guarantee in this area. It's not an ironclad rule for us, just something we're considering as a factor.

  • Have you all looked at how Pyrefly does it, or are your methods incompatible?

    • Well ours is not yet implemented, so it's too early to say whether they're compatible. :-)

      But less snarkily, we do talk to them often (and the authors of other tools like mypy and pyright) to make sure we aren't introducing gross incompatibilities between the different type checkers. When there are inconsistencies, we want to make sure they are mindful rather than accidental; for good reasons; spec-compliant; and well documented.

I don't think it's optimizing for beginner-level code, I think it's optimizing for legacy code. Introducing a type checker to a large existing untyped codebase is a big lift, but becomes less of one if almost all existing code is accepted.

  • Well then support an option to enable that kind of behaviour? Make it an explicit decision by the devs. I think running into a type error and then adding an exception to your config is safer than silently passing and only learning about the mixed types from a production bug.

    • I think this should be handled by a type assisted linter not typechecker.

      Imo a type checker in a dynamic language is primarily there to avoid runtime errors. In a list with multiple types, the typechecker should instead force you to check the type before using an element in that list.

      If you want static types python is the wrong language

list[int | str] might usually be a mistake, but what about

my_list = [BarWidget(...), FooWidget(...)] ?

my_list.append(BazWidget(...))

my_list.append(7)

Wouldn't it be nice if the type checker could infer the type hint there, which is almost certainly intended to be list[Widget], and allow the first append and flag the second one?
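
Whether a checker infers the common base class on its own varies, but writing the annotation by hand gives the behaviour asked for here. A minimal sketch, with hypothetical empty widget classes standing in for the real ones:

```python
class Widget:
    """Hypothetical base class for the example."""


class BarWidget(Widget): ...
class FooWidget(Widget): ...
class BazWidget(Widget): ...


# Declaring the element type explicitly instead of relying on inference.
my_list: list[Widget] = [BarWidget(), FooWidget()]
my_list.append(BazWidget())  # accepted: BazWidget is a Widget
# my_list.append(7)  # a checker would reject this: int is not a Widget
```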

The problem with the pyrefly behavior is that if you have a large codebase that isn't using any sort of Python typechecking, you can't just adopt this tool incrementally. You have to go fix up all of these issues. So you need to get widespread support for this migration.

For an internal tool at Meta, this is fine. Just make all your engineers adopt the style guide.

For introducing a tool gradually at an organization where this sort of change isn't one of the top priorities of engineering leadership, being more accepting is great. So I prefer the way ty does this, even though in my own personal code I would like my tool to warn me if I mix types like this.

>The fact that Python allows this has no bearing at all. To me having list[int | str] implicitly allowed by the typechecker seems like optimizing for beginner-level code.

Yes, let's base our tooling on your opinion rather than what is allowed in Python.

I am strongly for ty's behaviour here. Working Python code should not raise type errors unless the user explicitly opts in to a more static subset of the language by adding type annotations.

> and it is critical that the typechecker assumes this

Why is it critical though? If having a `list[int]` was a requirement I would expect a type error where that's explicit.

  • Because to me this seems like a fantastic example of a highly possible mistake that a typechecker _should_ catch. Without defined types in this situation a couple of things could happen: 1) it gets printed or passed to some other Any method and the typechecker never yells at you and it crashes in production 2) the typechecker catches the error somewhere long down the line and you have to backtrack to find where you might be appending a str to a list[int].

    Instead it could mark it as an error (as all the other checkers do), and if that's what the user really intended they can declare the type as list[str | int] and everything down the line is checked correctly.
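
Case 1 above can be sketched as follows. The function `total` is hypothetical; its `Any` parameter is what hides the mixed list from the checker, so the mistake only shows up at runtime.

```python
from typing import Any


def total(values: Any) -> int:
    # Because values is Any, a checker has nothing to complain about here.
    return sum(values)


my_list = [1, 2, 3]
my_list.append("foo")  # goes unreported if the list's element type is gradual

try:
    total(my_list)
    error = None
except TypeError as exc:
    # sum() fails when it reaches the str element.
    error = str(exc)
```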

    So in short, this seems like a great place to start pushing the user towards actually (gradually) typing their code, not just pushing likely bugs under the rug.

It depends on what happens with the list after that. Are there int-specific operations applied, or is it just printed? What if it is fed into objects with a str attribute where the ints could be cast to str?

I don't know. I would argue that since type checking in python is optional, the type checkers shouldn't care unless the programmer cares. A more interesting case would be my_list.append(2.45) or my_list.append(Decimal("2.0")). Those cases would be "numbers" not just "ints".

In the real world, a row of CSV data is not type checked -- and the world hasn't pushed the spreadsheet industry to adopt typed CSV data.