Comment by dijksterhuis
6 years ago
These reproducible instructions you speak of are already present in Dockerfiles.
It seems like you're arguing against using docker images, when docker builds solve the very issue you speak of.
Correct me if I'm wrong...?
A Dockerfile is not a reproducible set of build instructions in most cases. I'd guess that the vast majority of Dockerfiles are not reproducible.
Let's look at an example Dockerfile for redis, based on [0].
(Note: modified from upstream for this example; it won't actually build.)
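A trimmed-down sketch of such a Dockerfile (abbreviated and illustrative only; the package list, redis version, and build steps are placeholders standing in for the real file at [0]):

```dockerfile
# Illustrative sketch, not the actual upstream file; won't actually build.
FROM debian:buster-slim

# Pulls whatever gcc/libc/make versions the debian mirrors serve *today*.
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc libc6-dev make wget \
    && rm -rf /var/lib/apt/lists/*

# Version number here is a placeholder.
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-<version>.tar.gz" \
    && tar -xzf redis.tar.gz \
    && make -C redis-<version> \
    && make -C redis-<version> install

CMD ["redis-server"]
```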
The unreproducible bits are the following:
1. FROM debian:buster-slim -- unreproducible, the base image may change
2. apt-get update && apt-get install -- unreproducible, will give a different version of gcc and other apt packages at different times
Those two bits of unreproducibility are so core to the image that they result in every other step not being reproducible either.
As a result, when you 'docker build' that over time, it's very unlikely you'll get a bit-for-bit identical redis binary at the other end. Even a minor gcc version change will likely result in a different binary.
As a contrast to this, let's look at a reproducible build of redis using nix. In nixpkgs, it looks like so [1].
If I want a reproducible shell environment, I simply have to pin down its dependencies, which can be done by the following:
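For example, something like the following shell.nix (the revision and sha256 below are placeholders, not the actual nixpkgs commit from [1]):

```nix
# shell.nix -- illustrative; <rev> and <sha256> stand in for a real
# nixpkgs commit and its content hash.
let
  pkgs = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  }) {};
in
pkgs.mkShell {
  buildInputs = [ pkgs.redis ];
}
```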
If I distribute that nix expression, and say "I ran it with nix version 2.3", that is sufficient for anyone to get a bit-for-bit identical redis binary. Even if the binary cache (which lets me not compile it) were to go away, that nixpkgs revision expresses the build instructions, including the exact version of gcc. Sure, if the binary cache were deleted, it would take multiple hours for everything to compile, but I'd still end up with a bit-for-bit identical copy of redis.
This is true of the majority of nix packages. All commands are run in a sandbox with no access to most of the filesystem or network, encouraging reproducibility. Network access is mediated by special functions (like fetchTarball and fetchGit) which require including a sha256.
All network access going through those specially denoted means of network IO means it's very easy to back up all dependencies (i.e. the redis source code referenced in [1]), and the sha256 means it's easy to use mirrors without having to trust them to be unmodified.
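As a sketch of what that mediated fetch looks like in a package (version and hash are placeholders, not the values in [1]) -- the source can come from any mirror, and the sha256 guarantees it's the expected bytes:

```nix
# Illustrative; the sha256 pins the exact tarball contents, so the URL
# doesn't need to be trusted.
src = fetchurl {
  url = "http://download.redis.io/releases/redis-${version}.tar.gz";
  sha256 = "<sha256>";
};
```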
It's possible to make an unreproducible nix package, but it requires going out of your way to do so, and rarely happens in practice. Conversely, it's possible to make a reproducible dockerfile, but it requires going out of your way to do so, and rarely happens in practice.
Oh, and for bonus points, you can build reproducible docker images using nix. This post has a good intro to how to play with that [2].
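A minimal sketch of that approach, assuming `pkgs` is an imported (and ideally pinned) nixpkgs; `dockerTools.buildImage` is the nixpkgs helper for this:

```nix
# Illustrative: packages the nix-built redis into a docker image, without
# ever running apt-get or touching a mutable base image.
pkgs.dockerTools.buildImage {
  name = "redis";
  tag = "latest";
  config.Cmd = [ "${pkgs.redis}/bin/redis-server" ];
}
```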
[0]: https://github.com/docker-library/redis/blob/bfd904a808cf68d...
[1]: https://github.com/NixOS/nixpkgs/blob/a7832c42da266857e98516...
[2]: https://christine.website/blog/i-was-wrong-about-nix-2020-02...
Unless something changed in the months since I last used Nix, this will not get you bit-for-bit reproducible builds. Nix builds its hash tree from the source files of your package and the hashes of its dependencies. The build output is not considered at any step of the process.
I was under the impression that Nix also wants to provide bit-for-bit reproducible builds, but that that is a much longer term goal. The immediate value proposition of Nix is ensuring that your source and your dependencies' source are the same.
You're right that NixOS and nix packages as a whole aren't perfectly reproducible.
In practice, most of the packages in the nixos base system seem to be reproducible, as tested here: https://r13y.com/
Naturally, that doesn't prove they are perfectly reproducible, merely that we don't observe unreproducibility.
Nix has tooling, like `nix-build --check`, the sandbox, etc which make it much easier to make things likely to be reproducible.
I'm actually fairly confident that the redis package is reproducible (having run `nix-build --check` on it, and seen it have identical outputs across machines), which is part of why I picked it as my example above.
However, I think my point stands. Dockerfiles make no real attempt to enforce reproducibility, and rarely are reproducible.
Nix packages push you in the right direction, and from practical observation, usually are reproducible.
This is true, but the Nix sandbox does make it a little easier. If you're going for bit-for-bit reproducibility, it has some nice features that help, like normalizing the date, hostname, and so on. And optionally you can use a fixed output derivation where you lock the output to a specific hash.
The focus of nix in the build process is the ideal that if you have the same build inputs -- bash 4, gcc 4.8.<patch>, libc <whatever version> -- and the same source (hash-wise), the output is very much (for most cases) going to be the same. Nix itself (even on non-NixOS) uses very little of the host system: it won't use the system libc, gcc, bash, ncurses, etc., but its own copies, each locked to a version down to the hash. It follows a target (with exact spec) -> output model, whereas a Dockerfile more resembles an output-first build that isn't re-run very often. This is why Nix has its own CI/CD system, Hydra, to ensure daily or even hourly checking of reproducible builds.
Exactly. Basically, if your product needs network access during build, you don't have a reproducible build, and if you don't have a reproducible build, it's only a matter of time before something goes horribly wrong.