
Comment by woodruffw

19 hours ago

No, because Python itself will generate bytecode for packages once you actually import them. uv just defers that to first-import time, but the cost is amortized in any setting where imports are performed over multiple executions.

That sounds like yes? Instead of doing it once at install time, it's done once at first use. It's only once, so it's not persistently slower, but that is a perf hit.

My first cynical instinct is to say that this is uv making itself look better by deferring the cost to the application, but it's probably a good trade-off: if any significant percentage of the files being compiled might never be used, the overall cost is lower when you defer to run time.
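
For concreteness, a minimal sketch of the two timings being discussed (the site-packages path below is a made-up example):

```python
import compileall

# Install-time compilation (pip's default): eagerly walk site-packages
# and write a .pyc into __pycache__ for every .py file found.
# The path is hypothetical; point it at a real environment.
compileall.compile_dir(".venv/lib/python3.12/site-packages", quiet=1)

# Deferred compilation (uv's default): do nothing at install time.
# CPython compiles each module the first time it is imported and then
# reuses the cached .pyc in __pycache__ on every subsequent run.
```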

  • I think they are making the bet that most modules won't be imported. For example, if I install scipy, numpy, Pillow, or the like: what are the chances that I use only a subset of the modules vs. literally all of them?

    I would bet on a subset for pretty much any non-trivial package (i.e. larger than one or two user-facing modules). And for those trivial packages? Well, they are usually small, so the cost is small as well. I'm sure there are exceptions: maybe a single gargantuan module that consists of autogenerated FFI bindings for some C library, but that is likely the minority. (A rough way to check this bet is sketched after this list.)

  • > It's only once, so it's not persistently slower, but that is a perf hit.

    Sure, but you pay that hit either way. Real-world performance is always usage-based: the assumption uv makes is that people run (i.e. import) packages more often than they install them, so amortizing at the point of the import machinery is better for the mean user.

    (This assumption is not universal, naturally!)

  • Probably for any case where an actual human is doing it. On an image you obviously want to do it at bake time, so I feel that default-off with a flag would have been a better design decision for pip.

    I just read the thread and use Python; I can't comment on how much of the speedup attributed to uv comes from this optimization.

    • Images are a good example where doing it at install time is probably best, yeah, since every run of the image starts 'fresh' and loses the compilation that happened the last time the image was started.

      If it were an optional toggle, it would probably become best practice to enable compilation in Dockerfiles.

    • > On an image you obviously want to do it at bake time

      It seems like tons of people are creating container images with an installer tool and having it do a bunch of installations, rather than creating the image with the relevant Python packages already in place. Hard to understand why.

      For that matter, a pre-baked Python install could do much more interesting things to improve import times than just leaving a forest of `.pyc` files in `__pycache__` folders all over the place.

  • You can change it to compile the bytecode on install with a simple environment variable (UV_COMPILE_BYTECODE=1), which you should do when building Docker containers if you want to sacrifice some disk space to decrease initial startup time for your app.

  • You are right. It depends on how often this first start happens, and on whether it's bad or not. For most use cases I'd guess (total guess, I have limited professional experience with Python projects) it's not an issue.
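
A rough way to check the "subset of modules" bet from the first item above (assuming numpy is installed; the on-disk count includes test files, so this is only indicative):

```python
import pathlib
import sys

import numpy  # assumes numpy is installed; any large package works

# Python source files the package ships on disk (tests included).
pkg_dir = pathlib.Path(numpy.__file__).parent
total = sum(1 for _ in pkg_dir.rglob("*.py"))

# Modules a bare `import numpy` actually pulled into the process.
imported = sum(
    1 for name in sys.modules
    if name == "numpy" or name.startswith("numpy.")
)

print(f"imported {imported} of ~{total} numpy source files")
```

If the bet holds, the imported count comes out well below the total, which is exactly the case where deferring compilation wins.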

That’s actually a negative:

My Docker build generating the bytecode saves it to the image, paying the cost once at build time and sharing it across all image deployments; building at first execution instead means that each deployed image instance has to generate its own bytecode!

That’s a massive amplification, on the order of 10-100x.

“Well, just tell it to generate bytecode!”

Sure — but when is the default supposed to be better?

Because this sounds like a massive footgun for a system where requests >> deploys >> builds. That is, every service I’ve written in Python for the last decade.
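
A back-of-envelope sketch of that amplification, with entirely made-up numbers (the real ratio depends on fleet size and deploy cadence):

```python
# All figures below are hypothetical, purely for illustration.
compile_cost_s = 5.0      # one-time bytecode compilation for the whole env
builds = 10               # image builds over some period
replicas_per_build = 50   # container instances started from each build

# Compile at image build time: the cost is paid once per build.
baked_total_s = compile_cost_s * builds

# Compile at first import: every replica pays it again on cold start.
deferred_total_s = compile_cost_s * builds * replicas_per_build

print(f"amplification: {deferred_total_s / baked_total_s:.0f}x")  # -> 50x
```

The ratio is just the replica count per build, which is why fleets in the 10-100 instance range see the 10-100x figure above.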