Comment by zahlman

2 months ago

> the entire python ecosystem generally does not put much emphasis on startup time.

You'd think PyPy would be more popular, then.

> even modules in the standard library will pre-compile regular expressions at import time, even if they're never used, like the "email" module.

Hmm, that is slower than I realized (although still just a fraction of typical module import time):

  $ python -m timeit --setup 'import re' 're.compile("foo.*bar"); re.purge()'
  10000 loops, best of 5: 26.5 usec per loop

  $ python -m timeit --setup 'import sys' 'import re; del sys.modules["re"]'
  500 loops, best of 5: 428 usec per loop

I agree the email module is atrocious in general, which specifically matters because it's used by pip for parsing "compiled" metadata (PKG-INFO in sdists, when present, and METADATA in wheels). The format is intended to look like email headers and be parseable that way; but the RFC mandates all kinds of things that are irrelevant to package metadata, and despite the streaming interface it's hard to actually parse only the things you really need to know.
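To make the "parseable that way" part concrete, here is a rough sketch of reading a few fields from METADATA-style text with the standard library's `email.parser.HeaderParser`. The sample text and the helper name `read_core_metadata` are invented for illustration; this is not a claim about how pip does it internally.

```python
from email.parser import HeaderParser

# Hypothetical METADATA contents, made up for illustration.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-package
Version: 1.0.0
Requires-Dist: requests>=2.0
Requires-Dist: attrs

Long description goes here as the message body.
"""


def read_core_metadata(text):
    """Return (name, version, requires_dist) from METADATA-style text."""
    # HeaderParser stops interpreting structure after the header block.
    msg = HeaderParser().parsestr(text)
    # Repeated fields such as Requires-Dist need get_all(), not msg[...].
    return msg["Name"], msg["Version"], msg.get_all("Requires-Dist", [])


name, version, requires = read_core_metadata(SAMPLE_METADATA)
```

Note that even this still reads the entire header block, which is the complaint above: there is no obvious way to stop after the one or two fields you care about.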

> Because the python ecosystem doesn't generally optimize for speed (especially startup), the slowdowns end up being contagious. If you import a library that doesn't care about startup time, why should your library care about startup time? The same could maybe be said for memory usage.

I'm trying to fight this, by raising awareness and by choosing my dependencies carefully.

> you first have to have python+dependencies installed before you can install python and your dependencies

It's unusual that you actually need to install Python again after initially having "python+dependencies installed". And pip vendors all its own dependencies except for what's in the standard library. (Which is highly relevant to Debian getting away with the repackaging that it does.)

> I think it's possible to write a pretty fast implementation using python, but you'd need to "greenfield" it by rewriting all of the dependencies yourself so you can optimize startup time and bootstrapping.

This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)

> there are _some_ improvements that have happened in the standards/PEPs that should eventually make their way into pip

Most of them already have, along with other changes. The 2025 pip experience is, believe it or not, much better than the ~2018 pip experience, notwithstanding higher expectations for ecosystem complexity.

7 comments

optionalsquid 2 months ago

> You'd think PyPy would be more popular, then.

PyPy is hamstrung by a limited (previously, a lack of) compatibility with compiled Python modules. If it had been a drop-in replacement for the equivalent Python versions, then it'd probably have been much more popular

CamouflagedKiwi 2 months ago

> You'd think PyPy would be more popular, then.

PyPy doesn't do anything to help startup time. In fact, it's typically a bit slower to start up than CPython.

You reap the speed benefits from PyPy once it's been running for a little while and it can JIT compile the hot bits of code.

zahlman 2 months ago

> PyPy doesn't do anything to help startup time. In fact, it's typically a bit slower to start up than CPython.

Considerably slower on my machine. Yes, that was my point. If the community doesn't care about startup time, you'd expect more adoption of an implementation that sacrifices that startup time for later performance.

CamouflagedKiwi 2 months ago

Oh I see, sorry. I misread it as meaning that you'd expect the community to use PyPy more to alleviate the startup time issue.

collinmanderson 2 months ago

> I agree the email module is atrocious in general

Hah. Yes sounds like we are very much on the same page here. Python stdlib could really use a simple generic email/http header parser.
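As a rough idea of what "simple" could mean here, a minimal sketch of a generic `Key: value` header parser. The function name `parse_headers` and its behavior (folding continuation lines, stopping at the first blank line) are assumptions for illustration only, not an existing stdlib API and not RFC-complete:

```python
def parse_headers(text):
    """Parse 'Key: value' lines up to the first blank line.

    Returns a list of (key, value) pairs. Lines that start with
    whitespace are treated as continuations of the previous value.
    """
    headers = []
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line[0] in " \t" and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + " " + line.strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((key.strip(), value.strip()))
    return headers
```

Returning a list of pairs rather than a dict keeps repeated keys and their order, which both email-style and HTTP-style headers can rely on.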

> It's unusual that you actually need to install Python again after initially having "python+dependencies installed".

I’m thinking about 3rd party installers like poetry, pip-tools, pdm, etc, where your installer needs python+dependencies installed before it can start installing.

> “write a pretty fast implementation using python” This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)

Do you have anything public yet? I’m totally curious. I started doing this for flake8 and pip back in 2021/2022, but when ruff+uv came along I figured it wasn’t worth my time any more.

zahlman 2 months ago

> Do you have anything public yet?

The repo is https://github.com/zahlman/paper but it's not really usable and it's missing a bunch of local very unfinished stuff (and my README template definitely needs fixing). More of a "watch this space" but I would really like to push out a Show HN for the first chunk of functionality soon.