Comment by amluto
1 day ago
> For example, currently most Debian git repositories base their work in "pristine-tar" branches built from upstream tarball releases
I really wish all the various open source packaging systems would get rid of the concept of source tarballs to the extent possible, especially when those tarballs are not sourced directly from upstream. For example:
- Fedora has a “lookaside cache”, and packagers upload tarballs to it. In theory they come from git as indicated by the source rpm, but I don’t think anything verifies this.
- Python packages build a source tarball. In theory, the new best practice is for a GitHub action to build the package and for a complex mess to attest that it really came from GitHub Actions.
- I’ve never made a Debian package, but AFAICT the maintainer kind of does whatever they want.
IMO this is all absurd. If a package hosted by Fedora or Debian or PyPI or crates.io, etc claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that. If it stores a copy of the source, that copy should be cryptographically traceable to the commit in question, which is straightforward: the commit hash is a hash over a bunch of data including the full source!
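To make the traceability point concrete, here is a minimal Python sketch of how Git derives object IDs from content, assuming the default SHA-1 object format. The function names are made up for illustration, and the tree helper only handles a flat directory of regular files (Git orders subdirectory entries slightly differently). A commit ID is computed the same way over a commit object that names the root tree ID, so a stored copy of the source can be checked by re-hashing it.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file with these contents (SHA-1 repos)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_tree_id(entries) -> str:
    """Object ID of a flat tree.

    entries: iterable of (mode, name, hex_object_id),
    e.g. ("100644", "hello.txt", git_blob_id(b"hello\n")).
    Assumption: regular files only, no subdirectories.
    """
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        # Each entry is "<mode> <name>\0" followed by the raw 20-byte ID.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


if __name__ == "__main__":
    blob = git_blob_id(b"hello\n")
    print(blob)
    print(git_tree_id([("100644", "hello.txt", blob)]))
```

Changing a single byte in any file changes its blob ID, which changes the tree ID, which changes the commit ID.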
For lots of software projects, a release tarball is not just a gzipped repo checked out at a specific commit. So this would only work for some packages.
A simple version of this might be a repo with a single file of code in a language that needs compilation, versus the tarball with one compiled binary.
Just having a deterministic binary can be non-trivial, let alone a way to confirm "this output came from that source" without recompiling everything again from scratch.
For most well designed projects, a source tarball can be generated cleanly from the source tree. Sure, the canonical build process goes (source tarball) -> artifact, but there’s an alternative build process (source tree) -> artifact that uses the source tarball as an intermediate.
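As a rough illustration of "generated cleanly", here is a sketch of building a byte-for-byte reproducible tarball from a set of files using Python's standard `tarfile` module. The function name and the in-memory `files` mapping are hypothetical stand-ins for a real source tree; the idea is only that fixing the entry order and the metadata (timestamps, owners, modes) makes the archive depend on nothing but the file contents.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict) -> bytes:
    """Build an uncompressed tar archive from {path: bytes}.

    Entries are written in sorted order with fixed metadata, so the
    same inputs always produce the same archive bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0           # no build-time timestamps
            info.mode = 0o644        # fixed permissions
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    tree = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"demo\n"}
    archive = deterministic_tar(tree)
    print(hashlib.sha256(archive).hexdigest())
```

Anyone with the same source tree can rebuild the archive and compare hashes. Compression is left out on purpose, since gzip headers carry their own timestamp that would also need pinning.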
In Python, there is a somewhat clearly defined source tarball. uv build will happily build the source tarball and the wheel from the source tree, and uv build --from <appropriate parameter here> will build the wheel from the source tarball.
And I think it’s disappointing that one uploads source tarballs and wheels to PyPI instead of uploading an attested source tree and having PyPI do the build, at least in simple cases.
In traditional C projects, there’s often some script in the source tree that turns it into the source tarball tree (autogen.sh is pretty common). There is no fundamental reason that a package repository like Debian’s or Fedora’s couldn’t build from the source tree and even use properly pinned versions of autotools, etc. And it’s really disappointing that the closest widely used thing to a proper C/C++ hermetic build system is Dockerfile, and Dockerfile gets approximately none of the details right. Maybe Nix could do better? C and C++ really need something like Cargo.
The hacker in me is very excited by the prospect of pypi executing code from my packages in the system that builds everyone's wheels.
If it isn't at least a gzip of a subset of the files of a specific commit of a specific repo, someone's definition of "source" would appear to need work.
To get a specific commit from a repo, you usually need to clone it, which involves a much bigger download than just fetching the tar file.
> If a package hosted by Fedora or Debian or PyPI or crates.io, etc claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that.
For Debian, that's what tag2upload is doing.
Shoutout to the AUR. I’m trying Arch for the first time (Omarchy) and wasn’t planning on using the AUR, but realized how useful it is when 3 of the tools I wanted to try were distributed differently. The AUR made it insanely easy… (namely, I had issues with Obsidian and Google Antigravity)