← Back to context

Comment by turminal

1 day ago

For lots of software projects, a release tarball is not just a gzipped repo checked out at a specific commit. So this would only work for some packages.

A simple version of this might be a repo with a single file of code in a language that needs compilation, versus, and the tarball with one compiled binary.

Just having a deterministic binary can be non-trivial, let alone a way to confirm "this output came from that source" without recompiling everything again from scratch.

For most well designed projects, a source tarball can be generated cleanly from the source tree. Sure, the canonical build process goes (source tarball) -> artifact, but there’s an alternative build process (source tree) -> artifact that uses the source tarball as an intermediate.

In Python, there is a somewhat clearly defined source tarball. uv build will happily built the source tarball and the wheel from the source tree, and uv build --from <appropriate parameter here> will build the wheel from the source tarball.

And I think it’s disappointing that one uploads source tarballs and wheels to PyPI instead of uploading an attested source tree and having PyPI do the build, at least in simple cases.

In traditional C projects, there’s often some script in the source tree that runs it into the source tarball tree (autogen.sh is pretty common). There is no fundamental reason that a package repository like Debian or Fedora’s couldn’t build from the source tree and even use properly pinned versions of autotools, etc. And it’s really disappointing that the closest widely used thing to a proper C/C++ hermetic build system is Dockerfile, and Dockerfile gets approximately none of the details right. Maybe Nix could do better? C and C++ really need something like Cargo.

  • The hacker in me is very excited by the prospect of pypi executing code from my packages in the system that builds everyone's wheels.

    • This seems no worse than GitHub Actions executing whatever random code people upload.

      It’s not so hard to do a pretty good job, and you can have layers of security. Start with a throwaway VM, which highly competent vendors like AWS will sell you at a somewhat reasonable price. Run as a locked-down unprivileged user inside the container. Then use a tool like gVisor.

      Also… most pure Python packages can, in theory, be built without executing any code. The artifacts just have some files globbed up as configured in pyproject.toml. Unfortunately, the spec defines the process in terms of installing a build backend and then running it, but one could pin a couple of trustworthy build backends versions and constraint them to configurations where they literally just copy things. I think uv-build might be in this category. At the very least I haven’t found any evidence that current uv-build versions can do anything nontrivial unless generation of .pyc files is enabled.

    • Launchpad does this for everything, as does sbuild/buildd in debian land. They generally make it work by both: running the build system in a neutered VM (network access generally not permitted during builds, or limited to only a debian/ubuntu/PPA package mirror), and going to some degree of invasive process/patching to make build systems work without just-in-time network access.

      SUSE and Fedora both do something similar I believe, but I'm not really familiar with the implementation details of those two systems.

      1 reply →

If it isn't at least a gzip of a subset of the files of a specific commit of a specific repo, someone's definition of "source" would appear to need work.

  • To get a specific commit from a repo you need to clone usually, which will involve a much bigger download than just downloading your tar file.

    • Shallow clones are a thing. And it’s fairly straightforward to create a tarball that includes enough hashes to verify the hash chain all the way to the commit hash. (In fact, I once kludged that up several years ago, and maybe I should dust it off. The tarball extracted just like a regular tarball but had all the git objects needed hiding inside in a way that tar would ignore.)

      4 replies →