Comparing Python Type Checkers: Typing Spec Conformance

7 hours ago (pyrefly.org)

Wow, quite surprising results. I have been working on a personal project with the astral stack (uv, ruff, ty) that's using extremely strict lint/type checking settings; you could call it an experiment in setting up a Python codebase to work well with AI. I was not aware that ty's gaps were significant. I just tried with zuban + pyright. Both catch a half dozen issues that ty is ignoring. Zuban has one FP and one FN, pyright is 100% correct.

Looks like I will be converting to pyright. No disrespect to the astral team, I think they have been pretty careful to note that ty is still in early days. I'm sure I will return to it at some point - uv and ruff are excellent.

Using VSCodium I was having issues with Python type checkers for quite a while. I did the basedpyright thing for a while but that was painful. It's a bit too based for me, and I'm not sure I'd call it based. Right now I have uv, ruff, and ty and I'm happy with it. It's super easy to update and super fast. I didn't realize the coverage wasn't as good as some others but I still like it. I may have to try pyrefly. Never heard of it until this post, so thank you.

I've been using ty on some previously untyped codebases at work. It does a good job of being fast and easy to use while catching many issues without being overly draconian.

My teammates who were writing untyped Python previously don't seem to mind it. It's a good addition to the ecosystem!

  • And it makes it infinitely easier for them to get with the times and start typing their code!

  • I am worried about the false negatives/positive rate however. Hope it improves.

    • My understanding is that Astral's focus for ty has been on making a good experience for common issues; they plan for very high compliance, but difficult or rare edge cases aren't prioritized yet.

      Compliance suite numbers are biased towards edge cases and not the common path because that's where a lot of the tests need to be added.

      My advice is to see how each type checker runs against your own codebase and whether the output/performance is something you are happy with.
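One way to act on that advice is a small script that runs whichever checkers are installed against the same directory and records exit code, wall time, and how many lines of output each produced. This is only a sketch: the helper names (`run_checker`, `compare`, `report`) are made up for illustration, and the command lines in `CHECKERS` are assumed default invocations that you should adjust to your own configuration.

```python
import shutil
import subprocess
import time

# Assumed default invocations; adjust to your project and each tool's docs.
CHECKERS = {
    "mypy": ["mypy", "."],
    "pyright": ["pyright"],
    "ty": ["ty", "check"],
    "pyrefly": ["pyrefly", "check"],
}


def run_checker(cmd, cwd="."):
    """Run one checker command; return None if the executable is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    start = time.perf_counter()
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    combined = proc.stdout + proc.stderr
    return {
        "returncode": proc.returncode,
        "seconds": elapsed,
        "output_lines": len(combined.splitlines()),
    }


def compare(checkers=CHECKERS, cwd="."):
    """Run every configured checker against the same directory."""
    return {name: run_checker(cmd, cwd) for name, cmd in checkers.items()}


def report(cwd="."):
    """Print one summary line per checker."""
    for name, result in compare(cwd=cwd).items():
        if result is None:
            print(f"{name}: not installed, skipped")
        else:
            print(
                f"{name}: exit={result['returncode']} "
                f"time={result['seconds']:.2f}s lines={result['output_lines']}"
            )
```

Calling `report()` from the project root prints the summary. Output line counts are only a rough proxy for the number of diagnostics, so reading what each tool actually reports on your code still matters more than the totals.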


Are there any good static (i.e. not runtime) type checkers for arrays and tensors? E.g. "16x64x256 fp16" in numpy, pytorch, jax, cupy, or whatever framework. Would be pretty useful for ML work.
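To make the question concrete, here is a toy sketch of what statically tracked shapes could look like using only the standard `typing` module. It is not how numpy, pytorch, jax, or cupy type their arrays; `Matrix` and `matmul` are hypothetical names, and it only covers two dimensions. The intent is that the shared type variable `K` forces the inner dimensions to agree, so a checker should reject a mismatched call, though how precisely each checker handles `Literal` dimensions is an assumption here.

```python
from typing import Generic, Literal, TypeVar

R = TypeVar("R", bound=int)  # rows of the left operand
K = TypeVar("K", bound=int)  # shared inner dimension
C = TypeVar("C", bound=int)  # columns of the right operand


class Matrix(Generic[R, C]):
    """Toy 2-D array that carries its dimensions as type parameters."""

    def __init__(self, rows: int, cols: int) -> None:
        self.shape = (rows, cols)


def matmul(a: Matrix[R, K], b: Matrix[K, C]) -> Matrix[R, C]:
    # Runtime guard for callers that do not run a static checker.
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"shape mismatch: {a.shape} @ {b.shape}")
    return Matrix(a.shape[0], b.shape[1])


a: Matrix[Literal[16], Literal[64]] = Matrix(16, 64)
b: Matrix[Literal[64], Literal[256]] = Matrix(64, 256)

c = matmul(a, b)
print(c.shape)  # (16, 256)
```

Nothing ties the declared `Literal` dimensions to the integers passed to the constructor, which is one reason this stays a sketch rather than a usable design.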

I still can't get over the utter idiocy in Python's type hints being decorative. In what world does x: int = "thing" not give someone in the standardisation process pause?

  • Can you elaborate what you mean by decorative?

    If you run a type checker like ty or pyright they're not decorative: you'll get clear diagnostics for that particular example [1], and any other type errors you might have. You can set up CI so that it blocks PRs from being merged, for example, just like any other test failure.

    If you mean types not being checked at runtime, the consensus is that most users don't want to pay the cost of the checks every time the program is run. It's more cost-effective to do those checks at development/test/CI time using a type checker, as described above. But if you _do_ want that, you can opt in to that using something like beartype [2].

    [1] https://play.ty.dev/905db656-e271-4a3a-b27d-18a4dd45f5da

    [2] https://github.com/beartype/beartype/
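The two halves of that reply can be shown in a few lines. The assignment from the parent comment runs without complaint because the interpreter does not enforce annotations, while an opt-in decorator can check them on each call. The `check_types` decorator below is a hypothetical, standard-library-only sketch that handles plain classes such as `int` or `str`; it is not beartype's API and skips generics, unions, and return values.

```python
import functools
import inspect
from typing import get_type_hints

# Runs fine: the annotation is not enforced by the interpreter.
# A static type checker is expected to flag this line instead.
x: int = "thing"
print(type(x).__name__)  # str


def check_types(func):
    """Hypothetical decorator: isinstance-check simple annotations on every call."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; anything else is skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


def double_unchecked(n: int) -> int:
    return n * 2


@check_types
def double(n: int) -> int:
    return n * 2


print(double_unchecked("ab"))  # abab
print(double(21))  # 42
```

Calling `double("ab")` raises `TypeError`, which is the per-call cost most users prefer to pay once at development or CI time rather than on every run.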

  • It's a community that delayed progress for a decade while they waited for everyone to put parentheses on the print statement. Give 'em enough time and they'll figure out best practices.

  • In C-ish languages the statement

        int x = "thing"
    

    is perfectly valid. It means reserve a spot for a 32 bit int and then shove the pointer to the string "thing" at the address of x. It will do the wrong thing and also overflow memory but you could generate code for it. The type checker is what stops you. It's the same in Python, if you make type checking a build breaker then the annotations mean something. Types aren't checked at runtime but C doesn't check them either.

    • In C, int may be as small as 16 bits. You may get 32 bits (or more) but it's not guaranteed. I don't see how you get a memory overflow though?

      I'd be surprised if a compiler with -Wall -Werror agreed to compile this.

      Trying to cast back the int to a char* might work if the pointers are the same size as int on the target platform, but it's actually Undefined Behaviour IIRC.

  • It's the complete opposite. The objective of type hints is that they're optional precisely because type hints narrow the functionality of the language. And given that different type checkers use different heuristics for determining what is a validly typed program and what isn't, the decision seems correct.

    No type system will allow for the dynamism that Python supports. It's not a question of how you annotate types, it's about how you resolve types.

    • Optional on paper, sure. Once you publish shared libs or keep a nontrivial repo usable across teams, type hints stop feeling optional fast, because the minute mypy, pyright, and Pyre disagree on metaprogramming or runtime patching you get three incompatible stories about the same program and a pile of contradictions instead of signal. Python can stay dynamic, yet this setup mostly buys busywork for CI and false confidence for humans.
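A minimal illustration of the runtime patching both comments refer to: the program below is valid Python and runs correctly, but the `shout` method only exists after a `setattr` call, so a static checker that reads the class body will typically report it as an unknown attribute. `Greeter` and `shout` are made-up names, and whether and how a given checker reports the call is an assumption that varies by tool and settings.

```python
class Greeter:
    def __init__(self, word: str) -> None:
        self.word = word


def _shout(self: Greeter) -> str:
    return self.word.upper() + "!"


# Runtime patching: the method is attached after the class is defined,
# so it is invisible to anything that only reads the class body.
setattr(Greeter, "shout", _shout)

g = Greeter("hi")
print(g.shout())  # HI!
```

Silencing the diagnostic means either declaring `shout` on the class or adding an ignore comment, which is the kind of bookkeeping the last reply calls busywork.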