Comment by edude03

4 days ago

As a heavy container user myself - I've been using containers since I needed to build my own 3.x kernel to test them - docker doesn't solve the reproducibility problem nix solves - IE, I can make a Dockerfile that does `RUN curl foo.com/install.sh` and who knows if that'll work ever again. Nix on the other hand doesn't allow you to do IO during builds[^0] only describe the effect of doing the IO.

[0]: Though apparently darwin (mac) doesn't support sandboxing by default, so you can bypass that but anyway

>who knows if that'll work ever again

Unless you restrict your nix files to specific channel revisions, which when I had to deal with it was poorly documented, and involved searching through specific channel commit hashes in a particularly opaque way, you also can't know that your nix derivations will ever work again.

A number of people on my field used nix as a way to make their research code repositories reproducible, and everything broke within around three years.

  • Yeah that's a ux papercut - pre flakes nixpkgs was always the nixpkgs on your machine. There is docs but if you're not expecting that to happen you wouldn't think to look up the docs.

    • I think expecting everyone to enable a still-experimental feature is more than a papercut.

You can just store the actual container though. Which will reproduce the environment exactly, it's just not a guidebook on how it was built.

The value of most reproducibility at the Dockerfile is that we're actually agnostic to getting a byte-exact reproduction: what we want is the ability to record what was important and effect upgrades.

  • > Which will reproduce the environment exactly, it's just not a guidebook on how it was built.

    By that logic every binary artifact is a "reproducible build". The point of reproducibility isn't just to be able to reproduce the exact same artifact, it's to be able to make changes that have predictable effects.

    > The value of most reproducibility at the Dockerfile is that we're actually agnostic to getting a byte-exact reproduction: what we want is the ability to record what was important and effect upgrades.

    More or less true. But we don't have that, because of what grandparent said; if a Dockerfile used to work and now doesn't, and there's an apt-get update in it, who knows what version it was getting back when it was working, or how to fix the problem?

    • I do get the theoretical annoyance of how it’s technically not reproducible, but in practice most containers are pulled and not built from scratch. If you’re really concerned about that apt-get then besides a container registry you’re going to host a private package repository too, or install a versioned tarball from a public URL, but check the hash of whatever you’re downloading and put that hash in the dockerfile.

      So in practice.. if the build described in the dockerfile breaks, you notice when you’re changing / extending the dockerfile.. which is the time and place where you’d expect to need to know. My guess is that most people complaining about deterministic builds for containers are not using registries for storing images, and are not deploying to platforms like k8s. If your process is, say, shipping dockerfiles to EC2 and building them in situ with “compose up” or something, then of course it won’t be very deterministic and you’re at the mercy of many more network failures, etc

      3 replies →

    • None of this has anything to do with Dockerfile but the tools used within.

      Nix provides the tooling to do reproducible builds. Meanwhile docker is a wrapper around the tools you choose.

      Also just to note, docker does allow you disable network access during builds. Beyond Dockerfile, which is a high level DSL, the underlying tech can do this per build step (in buildkit LLB).

  • I'm not talking about a bit perfect reproduction though, just being able to understand dependencies. Take for example a simple Dockerfile like

    ``` FROM python:latest ADD . RUN pip install foo ```

    If I run this today, and I run this a year from now, I'm going to different versions of `python` and `foo` and there is no way (with just the Dockerfile) to know which version of `foo` and `python` were intended.

    Nix on the other hand, forces me to use a git sha[^0] of my dependency; there is no concept of a mutable input. So to your point it's hard to 'upgrade' from version a -> b in a controlled fashion if you don't know what `a` even was.

    [0]: or the sha256 of the depedency which yes, I understand that's not easy for humans to use.

    • Well, what about "FROM python:3.18" and using requirements.txt or something like that? I mean, running an arbitrary Python version will get you in trouble anyway.

      2 replies →