Comment by jmduke

1 year ago

Having migrated my application's Python and JS codebases to their typed siblings respectively last year, my 2c is that Python typing feels good and worthwhile when you're in the standard lib, but _awful_ (and net-negative) once you leave "normal Python" for the shores of third-party packages, particularly ones that lean heavily on duck typing (Django and BeautifulSoup both come to mind.)

This is where some of the stuff in the TypeScript ecosystem really shines, IMHO — being able to have a completely typesafe ORM such as Drizzle (https://orm.drizzle.team/) feels like a Rubicon moment, and touching anything else feels like a significant step backwards.

My experience has been different: last year I started writing Python again after a long break, and I have yet to regret using types pervasively. If a library has no type definitions, I still prefer to have my typed code interact with its untyped code. That is better than having no types at all. You can sometimes get useful type safety just by annotating your own functions with the untyped library's classes.
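
A minimal sketch of that last point. `LegacyClient` is a hypothetical stand-in for a class from an untyped package, defined inline so the snippet runs on its own; `build_url` is likewise an illustrative name. The annotation on the parameter is enough for a checker to reject wrong arguments, even though the class itself has no types.

```python
# Hypothetical stand-in for a class from an untyped third-party package.
class LegacyClient:
    def __init__(self, base_url):
        self.base_url = base_url

    def get(self, path):
        # Unannotated: mypy treats the result as Any.
        return {"url": self.base_url + path, "status": 200}


def build_url(client: LegacyClient, path: str) -> str:
    """Typed code calling into the untyped class."""
    response = client.get(path)
    url = response["url"]
    # Validate the untyped value once, at the boundary.
    if not isinstance(url, str):
        raise TypeError(f"expected str, got {type(url).__name__}")
    return url


client = LegacyClient("https://example.invalid")
print(build_url(client, "/index.html"))
# A checker would reject build_url("https://example.invalid", "/index.html"),
# because a str is not a LegacyClient.
```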

Since then, I have used established libraries like Beautiful Soup, Jinja, Pillow, platformdirs, psutil, python-dateutil, redis-py, and xmltodict with either official or third-party types. I remember their types being useful to varying degrees and not a problem. I have replaced Requests with the very similar but typed and optionally async HTTPX. My most objectionable experience with types in Python so far has been having to write

    root = cast(
        lxml.etree._Element,  # noqa: SLF001
        html5.parse(html, return_root=True),
    )

when I used types-lxml with https://github.com/kovidgoyal/html5-parser. In return I have been able to catch some bugs early and to "fearlessly" refactor code with few or no unit tests, only integration tests. The style I have arrived at is close to https://kobzol.github.io/rust/python/2023/05/20/writing-pyth....

Admittedly, I don't use Django. Maybe I won't like typed Django if I do. My choice of type checker is Pyright in non-strict mode. It usually, though not always, seems to catch more type errors than mypy, and subtler ones. I understand that for Django, mypy with a Django plugin is preferred.

  • You can also use something like stubgen to generate function definition signatures for dependencies for mypy to validate, then make your own changes to those files with better types if you wish.

    I don’t think it’s very scalable, and having the library itself or a stubs package come with types is the only “good”-feeling route, but you at least have a workable path to reasonable types without any intervention on the library’s part. It may even be sufficient if (as in most situations) you only use a few functions from a library (which may in turn call others, but you only care about the ones your code directly touches), and therefore only need to type those.

I agree. Prior to the introduction of types in Python, I thought I wanted them. Now I hate them. It feels like a bunch of rigmarole for no benefit. I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me. Even strongly typed languages like Rust have ergonomics to help you avoid explicitly specifying types, like let x = 1. You see extraneous code like x: int = 1 in Python now. Third party libs have bonkers types. This function signature is ridiculous:

    sqlalchemy.orm.relationship(argument: _RelationshipArgumentType[Any] | None = None, secondary: _RelationshipSecondaryArgument | None = None, *, uselist: bool | None = None, collection_class: Type[Collection[Any]] | Callable[[], Collection[Any]] | None = None, primaryjoin: _RelationshipJoinConditionArgument | None = None, secondaryjoin: _RelationshipJoinConditionArgument | None = None, back_populates: str | None = None, order_by: _ORMOrderByArgument = False, backref: ORMBackrefArgument | None = None, overlaps: str | None = None, post_update: bool = False, cascade: str = 'save-update, merge', viewonly: bool = False, init: _NoArg | bool = _NoArg.NO_ARG, repr: _NoArg | bool = _NoArg.NO_ARG, default: _NoArg | _T = _NoArg.NO_ARG, default_factory: _NoArg | Callable[[], _T] = _NoArg.NO_ARG, compare: _NoArg | bool = _NoArg.NO_ARG, kw_only: _NoArg | bool = _NoArg.NO_ARG, lazy: _LazyLoadArgumentType = 'select', passive_deletes: Literal['all'] | bool = False, passive_updates: bool = True, active_history: bool = False, enable_typechecks: bool = True, foreign_keys: _ORMColCollectionArgument | None = None, remote_side: _ORMColCollectionArgument | None = None, join_depth: int | None = None, comparator_factory: Type[RelationshipProperty.Comparator[Any]] | None = None, single_parent: bool = False, innerjoin: bool = False, distinct_target_key: bool | None = None, load_on_pending: bool = False, query_class: Type[Query[Any]] | None = None, info: _InfoType | None = None, omit_join: Literal[None, False] = None, sync_backref: bool | None = None, **kw: Any) → Relationship[Any]

https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#...

  • > It feels like a bunch of rigmarole for no benefit. I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me.

    Maybe try using an IDE? Without one any language's type system will feel more frustrating than it's worth, since you won't get inline error messages either.

    > Even strongly typed languages like rust have ergonomics to help you avoid explicitly specifying types like let x = 1.

    This is called type inference, and as far as I can tell this level of basic type inference is supported by the major Python type checkers. If you're seeing people explicitly annotate types on local variables, that's a cultural problem with people who are unaccustomed to using types.

    As for that function signature, it would be bonkers with or without types. The types themselves look pretty straightforward, the problem is just that they formatted it all on one line and have a ridiculous number of keyword arguments.

    • Of course I’ve used an IDE before. I still prefer Vim to an IDE. And I enjoy writing typed languages in Vim because the compiler catches mistakes.

      I agree part of the problem is cultural. Maybe a bunch of Python coders are eager to use types, or maybe linters are pushing them to type every last variable because that is “right.” I don’t know.

      I don’t hate typed languages at all. In fact I love writing Rust. Even C++ is tolerable from a type perspective. I don’t agree that _RelationshipJoinConditionArgument is a meaningful type. It feels like bolting a type system onto the language after the fact is weird and necessitates crazy types like that to make some linter happy, maybe to make VS Code users happy, at the expense of readability.

    • > Without one any language's type system will feel more frustrating than it's worth, since you won't get inline error messages either.

      I disagree. For me, the integration with the editor mostly shortens feedback cycles and enables some more advanced features. The utility of identifying problems without running the code is still there.

  • > I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me.

    This is a reasonable take if you're a solo developer working without an IDE. Though I suspect you'd still find a few missing None checks with type checking.

    If you're working on a team, though, the idea is to put type-checking into your build server, alongside your tests, linting, and whatnot.
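
    A small sketch of the kind of missing None check a checker tends to surface. `extract_version` is a hypothetical helper; `re.search` is the real standard-library call and is typed as returning an optional match.

```python
import re
from typing import Optional


def extract_version(text: str) -> Optional[str]:
    match = re.search(r"\d+\.\d+", text)
    # re.search returns a match object or None. Without this check, a
    # type checker flags match.group(...) as a possible access on None.
    if match is None:
        return None
    return match.group(0)


print(extract_version("Python 3.12"))
print(extract_version("no version here"))
```

    Run as a build step, the checker reports the unguarded version of this function before it ever fails at runtime.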

    > You see extraneous code like x: int = 1 in Python now.

    This shouldn't be necessary in most cases; Python type checkers are fine with inferring types.
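
    For illustration, a sketch where only the function boundary and the dataclass fields are annotated. `Point` and `midpoint` are made-up names; the locals carry no annotations and a checker such as mypy or Pyright infers their types.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def midpoint(a: Point, b: Point) -> Point:
    # No annotations on the locals: a checker infers float for both
    # from the annotated fields of Point.
    mx = (a.x + b.x) / 2
    my = (a.y + b.y) / 2
    return Point(mx, my)


count = 1           # inferred as int, no "count: int = 1" needed
names = ["a", "b"]  # inferred as list[str]
m = midpoint(Point(0.0, 0.0), Point(2.0, 4.0))
print(m)
```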

    > Third party libs have bonkers types. This function signature is ridiculous:

    It is. Part of that is that core infrastructure libraries tend to have wonky signatures just by their nature. A bigger part, though, is that a lot of APIs in popular Python libraries are poorly designed, in that they're extremely permissive (like pandas APIs allowing dataframes, ndarrays, list of dicts, and whatever else) and use kwargs inappropriately. Type declarations just bring that to the surface.

    • What's wrong with being extremely permissive? I'd argue that's a strength of the Python ecosystem. It's true that very dense APIs are difficult to type, but I wouldn't say they're typically poorly designed because of it.

    • Even without an IDE, I use Mypy like a test suite. It catches real bugs that would be either hard to find in testing, or intrusive and annoying to test for.

  • That signature is ridiculous in any programming language. Types aren't the problem here.

    • Was about to say the same thing... that method takes like 20 arguments. Types are the only thing making it usable.

  • I wouldn't mind all of that if the SQLAlchemy documentation would hide all the types until I mouse over them.

    Ditto for vim!

  • I've literally never seen anyone put types on trivial variables like that. Maybe your team is just inexperienced with types and/or Python?

> being able to have a completely typesafe ORM such as Drizzle (https://orm.drizzle.team/) feels like a Rubicon moment, and touching anything else feels like a significant step backwards.

Alright, but there's nothing stopping you from having a completely typesafe ORM in Python, is there?

Sure, there isn't really one that everyone uses yet, but the Python community tends to be a bit more cautious and slower to adopt big changes like that.

  • I'm talking about practical limitations, not academic ones. You're not incorrect (and libraries like FastAPI and Pydantic make me confident that the benefits of type-safety will grow throughout the ecosystem) but I am talking about from the perspective of someone considering whether or not to adopt typing within their Python project today.

    • What harm do you think typing a function, for example, would do? I'm genuinely curious because I just can't see where the issue is.

If I remember correctly, TypeScript felt the same way for quite a long time.

  • It did, especially in late 2013 and early 2014. But then the type repositories quickly caught up. Python package authors usually shy away from such endeavours, especially those who use kwargs to configure large classes. pygann comes to mind.

    • Python has also seen a sizable movement towards using types as part of the design, such as Typer, FastAPI and Pydantic. Existing mainstream libraries are also slowly adopting types, such as pandas and numpy (including `numpy.typing`).

      For the latter cases, it's not easy because typed APIs require different principles than dynamic/duck-typed ones. Still, I think it's safe to say that the community is trending towards more typing over time, especially greenfield projects. Personally, all my new projects are 100% typed, with type-safe wrappers around untyped libraries.

      For what it's worth, since Python 3.12 (or with typing_extensions for earlier versions), it's also possible to use Unpack and TypedDict to type kwargs.
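
      A rough sketch of what that looks like. `PlotOptions` and `describe_plot` are hypothetical names used only for illustration; `TypedDict` and `Unpack` are the real typing constructs (PEP 692), and the fallback import assumes the typing_extensions package on older interpreters.

```python
from typing import TypedDict

try:
    from typing import Unpack  # importable from typing on Python 3.11+
except ImportError:  # older interpreters: needs typing_extensions installed
    from typing_extensions import Unpack


class PlotOptions(TypedDict, total=False):
    # Hypothetical option names; total=False makes every key optional.
    title: str
    width: int
    grid: bool


def describe_plot(**kwargs: Unpack[PlotOptions]) -> str:
    title = kwargs.get("title", "untitled")
    width = kwargs.get("width", 80)
    grid = kwargs.get("grid", False)
    return f"{title} ({width} cols, grid={'on' if grid else 'off'})"


print(describe_plot(title="latency", width=120))
# A checker that supports PEP 692 would flag describe_plot(width="wide")
# and describe_plot(colour="red"); nothing is enforced at runtime.
```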

    • Python was always much more dynamically typed than JS, and it also became the prevailing approach in the ecosystem.

Using types properly is always annoying. Seeing mypy report no problems makes it worth it.

I find myself doing a lot of isinstance() and raise TypeError, but that's still a huge win, protecting everything after I've asserted the duck type is what it should be.
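
Roughly this pattern, as a sketch. `total_length` is a made-up function; the input is typed as `Any` to stand in for a value arriving from untyped code.

```python
from typing import Any


def total_length(items: Any) -> int:
    # Boundary check: the value arrives from untyped code as Any.
    if not isinstance(items, list):
        raise TypeError(f"expected list, got {type(items).__name__}")
    total = 0
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"expected str items, got {type(item).__name__}")
        # Past the check, the checker treats item as str.
        total += len(item)
    return total


print(total_length(["ab", "c"]))
```

Everything after the isinstance() checks is both protected at runtime and narrowed for the type checker.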

I also use beartype for runtime protection.

TypeScript is pretty amazing though. I really like how integrated the ecosystem is.

Maybe we can use LLMs to automatically bring these third party libs up to par?

Could be a nice showcase project for Copilot.

  • > Maybe we can use LLMs to automatically bring these third party libs up to par?

    So, I actually tried this. I tried to use Copilot to help generate type stubs for a third party library, hoping to be pleasantly surprised.

    Copilot generated reasonable-looking type stubs that were not close enough to correct to be of any value. Even with the full source code in context, it failed to "reason" correctly about any of the hard stuff (unions, generics, overloads, variadics, quasi-structured mappings, weird internal proxy types, state-dependent responses, etc. etc.).

    In my experience, bolting types onto a duck-typed API always produces somewhat kludgy results that won't be as nice as a system designed around static typing. So _of course_ an LLM can't solve that problem any more than adding type stubs can.

    But really, the answer to "will LLMs fix $hard_problem for us?" is almost always "no", because $hard_problem can rarely be solved by just writing some code.