Comment by alfalfasprout
12 hours ago
> New typecheckers don't need to be perfect. They need to be good enough, easy to integrate and have low false positives. Sure, they will improve with time, but if feels like a pain then no one will pick it up. Python users are famously averse to tools that slow down their dev cycles, even if it means better long term stability
I really don't agree. Sure, they don't need to be perfect but also keep in mind many codebases have already standardized on something like (based)pyright or mypy. So there's a migration cost. If your analysis has a lot of false positives or misses a lot of what those type checkers miss then there's little incentive to migrate. Sure, ty and pyrefly are much faster, but at the end of the day speed is only one consideration for a type checker.
Think of it this way. There are 2 groups. Group 1 has has avoided typecheckers because they're a PITA and group 2 has configured mypy/pyright despite the devx pain. Group 1 is a lot bigger than group 2. Group 2 is more lucrative per unit than group 1.
With enough time, ty and pyrefly will approach perfectness. If they're easy enough to use today, group 1 should be able to adopt them without any extra pain. (some typechecker is better than no typechecker). This gives them momentum. In couple of years, ty/pyrefly may finally be better than mypy/pyright. Then, Group 2 can start their ports.
This way, no one misses out. Group 2 still gets their perfect typechecker, just not immediately. But in that time, Group 1 is getting familiar with using typecheckers and their sheer size helps build institutional momentum towards typecheckers as an essential piece of any python dev flow.
If A. 'certain class of python problems are permanently solved by typecheckers' and B. 'every python user has some typechecker' become true, then that opens a lot of doors. Today, B is a harder problem than A. I'm guessing that compiled/JIT python will be the next frontier once python typing is solved. Wide typechecker adoption is a blocking requirement for that door to be opened.
No, I'm pretty sure they need to be perfect. If the tool's telling me bad information about types, first I'm going to lose a stupid amount of time debugging to that wrong info, and then swear off the tool forever after I realize it was the tool that was wrong.
Type checking in Python involves guessing. Take a program like
x = [3]
Should the type checker guess that the type of x is list[Any], or list[int], or list[Literal[3]], or something else? Libraries you depend on will pose more difficult versions of this question.
So it's not possible to expect the tool to be "perfect", whatever that means - usually people think of it as meaning it allows all code that they think is idiomatic and reasonable, and disallows all code that could raise a TypeError at runtime. You can only expect the tool to be reasonably good and perhaps to have an opinionated design that matches the way you would like to write Python.
Arguably it's the job of a linter and not a type checker, but if your code just assigns random things to random variables that never get used again, I hope something would point that out to you before it passes code review and merged to main, ideally before it even gets sent for review.
Ignoring the question of why there's some random assignment to x in the first place, where are its type hints? Those were added starting with Python 3.5 via PEP 484 just over 10 years ago and have been added to since then. If our goal is maintainable code at scale, the first thing I'd expect of a type checker is for it to communicate with the user, stating that a) the variable in file foo.py on line 43 is missing a type hint and couldn't be guessed with confidence, and that it's just gonna guess it's a list[int]. For bonus points, the tool could run in interactive mode and ask the user directly. But even without a hint, assuming the variable actually is used later, how does it get used?
It seems like you'd just start with the most strict interpretation, list[Literal[3]], and relax it as it gets used. If it gets fed into a function that doesn't have hints and can't be introspected for whatever reason, relax it to list[int] and then further relax it to list[Any] as necessary. Then depending on the mode the user configured the tool to run in, print nothing, or a warning, or error out or ask the user what to do if running interactively. Some of the config options could be strict, tolerant, and loose. Or maybe enforcing, permissive, and loose. Whatever color we want this bike shed to be.
More advanced tools let the user configure individual issues and their corresponding levels, but that may be considered too many options to give the user, overwhelming the user who then doesn't use the tool.
As far as perfect typing goes, I mean, yeah, ideally, after satisfying the type checker in strict mode, which means no types need to be guessed at, as all variables and other locations without type annotations that need them were reported, or even annotated interactively, or if we're real fancy, automatically with a comment. IMO programs should feel free to edit code directly (as a non-default option) That would mean the program could not throw an an uncaught TypeError, unexpectedly. That's a lot to ask of a dynamically strongly typed language, but I'd settle for it never being utterly wrong.
What that means is if the type checker says the function is returning a string, but of the 200 calls to that function, the type checker throws an error and says ten of those callsites expect it to return a dict, all I'm saying is this hypothetical type checker had better be right about that. Nothing worse than losing hours rooting around in the code looking for something that isn't actually there (or an errant semicolon, but this isn't C)
Also, in the past, many tools were dumped by users for too many false positives. Tools like Astree and RV-Match got adoption by having no or low false positives.