Comment by janjongboom
1 year ago
I've gone down the same path. I love deterministic builds, and I think Docker's biggest fault is that to the average developer a Dockerfile _looks_ deterministic - and it even is for a while (build a container twice in a row on the same machine => same output), but then packages get updated in the package manager, base images get updated w/ the same tag, and when you rebuild a month later you get something completely different. Do that times 40 (the number of containers my team manages) and now fixing containers is a significant part of your job.
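To make the failure mode concrete, here's a minimal (hypothetical) Dockerfile that looks pinned but isn't:

    # Nothing below is actually pinned:
    # - the "22.04" tag is mutable and can be re-pushed upstream
    # - apt installs whatever versions are current at build time
    FROM ubuntu:22.04
    RUN apt-get update && apt-get install -y curl ca-certificates
    # Two builds a month apart can produce different images
    # even though this file never changed.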
So in theory Nix would be perfect. But it's not, because it's so different. Get a tool from a vendor => won't work on Nix. Get an error => impossible to quickly find a solution on the web.
Anyway, out of that frustration I founded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine. Currently consists of an immutable Docker Hub pull-through cache, full daily copies of the Ubuntu/Debian/Alpine package registries, full daily copies of the most popular PPAs, daily copies of the PyPI index (we do a lot of ML), and an arbitrary immutable file/URL cache.
So far it's been the best of both worlds in my day job: easy to write, easy to debug, wide software compatibility, and we've seen 0 issues due to non-determinism in the containers we moved over to StableBuild.
I think this issue is not specific to containers.
I've worked many years on bare metal. We did (by requirement) acceptance tests, so we needed deterministic builds before such a thing even had a name, or at least before it was talked about as much as it is nowadays.
Red Hat has a lot of tooling around versioning of mirrors, channels, releases, updates, etc. But I'm so old that even Foreman and Spacewalk didn't exist yet, Red Hat Satellite was out of budget, and the project was migrating from the first versions of CentOS to Debian.
What I did was simply use DNS + vhosts (dev, stage, prod + versions) for our own package mirrors, and bash + rsync (and of course, RAID + backups), with both CentOS and Debian (and our own project packages).
So we had repos like prod/v1.1.0, stage/v1.1.0, dev/v1.1.0, dev/v2.0.0, dev/v2.0.1, etc., allowing us to rebuild things without praying, backport bug fixes with confidence, and so on.
Feels old and simple, but I think it's the same problem/issue that people run into now when (re)building containers.
If you need to be able to produce the same output from the same input, you need the same input.
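A rough sketch of that setup, with hostnames and paths invented for illustration:

    #!/usr/bin/env bash
    # Freeze an upstream package repo into a versioned directory;
    # each directory is then served as its own vhost,
    # e.g. dev-v1.1.0.mirror.example.com.
    set -euo pipefail

    CHANNEL="dev"        # dev | stage | prod
    VERSION="v1.1.0"
    UPSTREAM="rsync://mirror.example.com/debian/"
    DEST="/srv/mirrors/${CHANNEL}/${VERSION}"

    mkdir -p "${DEST}"
    rsync -a --delete "${UPSTREAM}" "${DEST}/"
    # Promoting dev -> stage -> prod is just a hard-linked copy:
    #   cp -al "/srv/mirrors/dev/${VERSION}" "/srv/mirrors/stage/${VERSION}"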
BTW about stablebuild: nice project!
But Nix also solves more problems than Docker does. For example, if you need to use different versions of software for different projects, Nix lets you pick and choose the software that is visible in your current environment without having to build a new Docker image for every combination - which would otherwise lead to a combinatorial explosion of images and is not practical.
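For example (a minimal sketch; nixpkgs attribute names vary by channel):

    # Drop into a shell where exactly these tools are visible,
    # without building an image per combination:
    nix-shell -p python311 nodejs_20 jq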
But I also agree with all the flaws of Nix people are pointing out here.
I don't have any experience with Nix, but regarding stable builds of Docker: we ship a Java application and have all dependencies pinned to fixed versions, so when doing a release, if someone is not doing anything fishy (like re-releasing a particular version, which is bad-bad-bad), you will get exactly the same binaries on top of the same image (again, assuming you are not using `:latest` or somesuch)...
Until someone overwrites or deletes the Docker base image (which regularly happens), or you depend on packages installed through apt - where you'll get the latest version (impossible to pin those).
I am convinced that any sort of free public service is fundamentally incompatible with long-term reproducible builds. It is simply unfair to expect a free service to maintain archives forever and never clean them up, rename itself, or go out of business.
If you want reproducibility, the first step is to copy everything to storage you control. Luckily, this is pretty cheap nowadays.
> Until someone overwrites or deletes the Docker base image (regularly happens)
Any source of that claim?
> or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).
Well... please re-read my previous comment - we do the Java thing, so we use a JDK base image and then we slap our distribution on top of it (which is mostly fixed-version jars).
Of course, if you are after perfection and require additional packages, then you can install them via dpkg or somesuch, but... do you really need that? What about the security implications?
> Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine.
Very nice project!
Another option for reproducible container images is https://github.com/reproducible-containers although you may need to cache package downloads yourself, depending on the distro you choose.
Yeah, very similar approach. We did this before, see e.g. https://www.stablebuild.com/blog/create-a-historic-ubuntu-pa... - but then figured everyone needs exactly the same packages cached, so why not set up a generic service for that.
For Debian, Ubuntu, and Arch Linux there are official snapshots available so you don't need to cache package downloads yourself. For example, https://snapshot.debian.org/.
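e.g., you can point apt at a frozen snapshot inside a Dockerfile. The timestamp below is illustrative, and check-valid-until=no is needed because snapshot Release files expire:

    FROM debian:bookworm
    RUN rm -f /etc/apt/sources.list.d/* \
     && echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240101T000000Z bookworm main' \
          > /etc/apt/sources.list \
     && apt-get update && apt-get install -y --no-install-recommends curl
    # For full reproducibility the base image itself should also
    # be pinned by digest, not by tag.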
Just pin the dependencies and you're mostly fine, right?
Yeah, but it's impossible to properly pin w/o running your own mirrors. Anything you install via apt is unpinnable, as old versions get removed when a new version is released; pinning multi-arch Docker base images is impossible because you can only pin on a tag, which is not immutable (pinning on hashes is architecture-dependent); Docker base images might get deleted (e.g. the nvidia-cuda base images); pinning Python dependencies, even with a tool like Poetry, is impossible, because people delete packages/versions from PyPI (e.g. jaxlib 0.4.1 this week); GitHub repos get deleted; the list goes on. So you need to mirror every dependency.
> Anything you install via apt is unpinnable, as old versions get removed when a new version is released
Huh, I have never had this issue with apt (Debian/Ubuntu) but frequently with apk/Alpine: The package's latest version this week gets deleted next week.
> apt is unpinnable, as old versions get removed
not necessarily - e.g. snapshot.debian.org
> pinning on hashes is architecture dependent
can't you pin the multi-arch manifest instead?
I still like StableBuild for protection against package deletion, and mirroring non-pinnable deps
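For reference, resolving a tag to its manifest-list digest looks like this (digest elided):

    # The manifest-list digest is the same no matter which
    # architecture pulls it:
    docker buildx imagetools inspect ubuntu:22.04
    # Name:   docker.io/library/ubuntu:22.04
    # Digest: sha256:...
    #
    # Pin that digest in the Dockerfile:
    #   FROM ubuntu@sha256:<manifest-list digest>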
The pricing page for StableBuild says
Free …
Number of Users 1
Number of Users 15GB
Is that a mistake, or if not, can you explain please?
https://www.stablebuild.com/pricing
Ah, yes, on mobile it shows the wrong pricing table... Copying here while I get it fixed:
Free => Access to all functionality, 1 user, 15GB traffic/month, 1GB of storage for files/URLs. $0
Pro => Unlimited users, 500GB traffic included (overage fees apply), 1TB of storage included. $199/mo
Enterprise => Unlimited users, 2,000GB traffic included (overage fees apply), 3TB of storage included, SAML/SSO. $499/mo
Are you associated with the project?
What is an efficient process for avoiding versions with known vulnerabilities for long periods of time when using a tool like StableBuild?