
Comment by yjftsjthsd-h

19 hours ago

> No bytecode compilation by default. pip compiles .py files to .pyc during installation. uv skips this step, shaving time off every install. You can opt in if you want it.

Are we losing out on performance of the actual installed thing, then? (I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?)

No, because Python itself will generate bytecode for packages once you actually import them. uv just defers that to first-import time, but the cost is amortized in any setting where imports are performed over multiple executions.
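You can watch that deferral happen directly. Here's a minimal sketch (the module name `demo_mod` is made up; any `.py` file on `sys.path` behaves the same way):

```python
# Minimal sketch: CPython writes bytecode on first import, not at install.
import importlib.util
import pathlib

src = pathlib.Path("demo_mod.py")
src.write_text("x = 1\n")

# Where CPython will cache the compiled bytecode for this source file.
pyc = pathlib.Path(importlib.util.cache_from_source(str(src)))
print(pyc.exists())  # False if this module has never been imported

import demo_mod  # noqa: E402 -- first import compiles and writes the .pyc

print(pyc.exists())  # True: later runs reuse the cached bytecode
```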

  • That sounds like yes? Instead of doing it once at install time, it's done once at first use. It's only once so it's not persistently slower, but that is a perf hit.

    My first cynical instinct is to say that this is uv making itself look better by deferring the costs to the application, but it's probably a good trade-off: if any significant percentage of the files being compiled might never be used, the overall cost is lower when you defer to run time.

    • I think they are making the bet that most modules won't be imported. For example, if I install scipy, numpy, Pillow, or such: what are the chances that I use a subset of the modules vs. literally all of them?

      I would bet on a subset for pretty much any non-trivial package (i.e. larger than one or two user-facing modules). And for those trivial packages? Well, they are usually small, so the cost is small as well. I'm sure there are exceptions: maybe a single gargantuan module that consists of autogenerated FFI bindings for some C library or such, but that is likely the minority. (A rough way to measure the loaded-vs-installed ratio is sketched after this thread.)

    • > It's only once so it's not persistently slower, but that is a perf hit.

      Sure, but you pay that hit either way. Real-world performance is always usage-based: the assumption uv makes is that people run (i.e. import) packages more often than they install them, so amortizing at the point of the import machinery is better for the mean user.

      (This assumption is not universal, naturally!)


    • Probably a good trade-off for any case where an actual human is doing the install. On an image you obviously want to compile at bake time, so I feel default-off with an opt-in flag would have been a better design decision for pip.

      I've just read the thread and use Python; I can't say what share of the speedup attributed to uv comes from this optimization.


    • You can change it to compile the bytecode on install with a simple environment variable (which you should do when building Docker containers, if you want to sacrifice some disk space to decrease your app's initial startup time).

    • You are right. It depends on how often that first start happens and whether the hit matters. In most use cases, I'd guess (total guess, I have limited experience with Python projects professionally) it's not an issue.

  • That’s actually a negative:

    My Docker build generating the bytecode saves it to the image, sharing the cost at build time across all image deployments, whereas building at first execution means that each deployed image instance has to generate its own bytecode!

    That’s a massive amplification, on the order of 10-100x: a build that fans out to 50 container instances pays the compile cost 50 times instead of once.

    “Well just tell it to generate bytecode!”

    Sure — but when is the default supposed to be better?

    Because this sounds like a massive footgun for a system where requests >> deploys >> builds. That is, every service I’ve written in Python for the last decade.
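As for the subset-of-modules bet above, here is a rough sketch to measure it (numpy is just an example; substitute any package installed in your environment):

```python
# Rough sketch: compare how many of a package's modules actually load
# on import against how many source files the install put on disk.
import pathlib
import sys

import numpy  # example package; substitute anything installed

loaded = sum(1 for name in sys.modules if name.partition(".")[0] == "numpy")
on_disk = sum(1 for _ in pathlib.Path(numpy.__path__[0]).rglob("*.py"))
print(f"{loaded} modules loaded vs ~{on_disk} .py files installed")
```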

Yes, uv skipping this step is a one-time but significant hit to startup time. E.g., if you're building a Dockerfile, I'd recommend setting `--compile-bytecode` / `UV_COMPILE_BYTECODE`.
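A minimal sketch of that Dockerfile setup (the base image, uv binary location, and project layout are assumptions, following uv's documented Docker pattern):

```dockerfile
FROM python:3.12-slim

# uv's distroless image ships the binary at /uv.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Bake .pyc files into the image once, at build time, instead of
# paying the compile cost in every running container.
ENV UV_COMPILE_BYTECODE=1

COPY requirements.txt .
RUN uv pip install --system -r requirements.txt
```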

Historically, I believe the practice of producing .pyc files on install started with system-wide installed packages, where the user running the program might lack privileges to write them. If the installer can write the .py files, it can also write the .pyc files, while the user running them might not be able to write to that location.

This optimization hits serverless Python the worst. At Modal we make sure uv users set UV_COMPILE_BYTECODE to avoid the cold-start penalty. For large projects, .pyc compilation can take hundreds of milliseconds.
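If you want to gauge that cost for your own environment, here is a rough timing sketch (assuming `site.getsitepackages()` resolves for your interpreter; `force=True` recompiles the whole tree, so this is a ceiling, since at runtime only the files you actually import pay it):

```python
# Rough sketch: time bytecode compilation of this interpreter's
# site-packages, as a ceiling on the deferred first-import cost.
import compileall
import site
import time

target = site.getsitepackages()[0]

start = time.perf_counter()
compileall.compile_dir(target, force=True, quiet=1)  # quiet=1 hides per-file output
print(f"compiled {target} in {time.perf_counter() - start:.2f}s")
```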

> I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?

They do.

> Are we losing out on performance of the actual installed thing, then?

When you consciously precompile Python source files, you can parallelize that process. When you `import` from a `.py` file, you only get that benefit if you somehow coincidentally were already set up for `multiprocessing` and happened to have your workers trying to `import` different files at the same time.
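For reference, the stdlib exposes exactly this parallelism; a sketch (the path is a placeholder):

```python
# Sketch: precompile an installed tree across all CPU cores.
# workers=0 tells compileall to use os.cpu_count() processes.
import compileall

compileall.compile_dir(
    "/path/to/site-packages",  # placeholder: the environment you installed into
    workers=0,
    quiet=1,
)
```

The CLI equivalent is `python -m compileall -j 0 <dir>`.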

If you have a dependency graph large enough for this to be relevant, it almost certainly includes a large number of files which are never actually imported. At worst the hit to startup time will be equal to the install time saved, and in most cases it'll be a lot smaller.

  • > a large number of files which are never actually imported

    Unfortunately, it typically doesn't work out as well as you might expect, especially given the expectation of putting `import` statements at the top of the file.
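For code you control, the usual mitigation is deferring heavyweight imports into the function that needs them; an illustrative sketch (matplotlib stands in for any expensive dependency):

```python
# Sketch: a module-level import pays its compile/load cost on every
# process start, even if the feature is never used.
# import matplotlib.pyplot as plt  # eager: always paid

def export_plot(data, path):
    # Deferred: the cost (including any first-time .pyc compilation)
    # is only paid on the first call.
    import matplotlib.pyplot as plt

    plt.plot(data)
    plt.savefig(path)
```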