← Back to context

Comment by kaba0

2 years ago

Parallelism. There might be actions that are not order-independent, and the state of the CPU might result in slightly different binaries, but all are correct.

Why does this matter though? Why does order of compilation result in a different binary?

  • Just some random, made up example: say you want to compile an OOP PL that has interfaces and implementations of that. You discover reachable implementations through static analysis, which is multi-threaded. You might discover implementations A,B,C in any order — but they will get their methods placed in the jump table based on this order. This will trivially result in semantically equivalent, but not binary-equivalent executables.

    Of course there would have been better designs for this toy example, but binary reproducibility is/was usually not of the highest priority historically in most compiler infrastructures, and in some cases it might be a relatively big performance regression to fix, or simply just a too big refactor.

  • Because order of completion of the parallel tasks is not guaranteed, if all tasks write to the same file you might get a different result each time.

> There might be actions that are not order-independent, and the state of the CPU might result in slightly different binaries, but all are correct.

Well no: that's really the thing reproducible packages are showing: there's only one correct binary.

And it's the one that's 100% reproducible.

I'd even say that that's the whole point: there's only one correct binary.

I'll die on the hill that if different binaries are "all correct", then none are: for me they're all useless if they're not reproducible.

And it looks like people working on entire .iso being fully bit-for-bit reproducible are willing to die on that hill too.

  • "Correct" does not mean "reproducible" just because you think lowly of irreproducible builds.

    A binary consisting of foo.o and bar.o is correct whether foo.o was linked before bar.o or vice versa, provided that both foo.o and bar.o were compiled correctly.

  • See my reply to the sibling post — binary reproducibility is not the end goal. It is an important property, and I do agree that most compiler toolchains should strive for that, but e.g. it might not be a priority for, say, a JIT compiler.