← Back to context

Comment by ebiederm

19 hours ago

The big caveat in what I was saying is that it is only the pure graph coloring portion that has an optimal linear time algorithm.

Take code in static single assignment form, or another form where it is values that are tracked with live ranges. For a single basic block the interference graph of the live ranges is an interval graph. As the live ranges are just intervals from the instruction that produces the value to the last instruction that consumes the value.

It is annoying but totally doable to follow this into graph theory and see that interval graphs, are a subset of chordal graphs, and these graphs have an algorithm that colors the graph optimally in linear time.

Poke a little more and it turns out that algorithm is the same algorithm as linear-scan register allocation.

Add multiple basic blocks and it is trickier to see, but there are proofs in the academic literature that the interference graph remains an interval graph. Something about phi functions not counting.

Which is what I meant when I said a linear time algorithm.

Compare that to the O(N^2) heuristic from the paper register allocation by graph coloring.

The big change is going from a linear time optimal algorithm to a quadratic time heuristic and proclaiming the quadratic time heuristic is better. (What???)

In practice there are a lot of complications that are not accounted for by graph coloring. Instructions and function calls that take fixed registers. Spilling registers when you need more values alive then there are registers. Deliberately keeping a non-minimal number of values in registers to increase instruction level parallelism.

To the best of my knowledge all of those problems are very tractable if you start with a linear scan register allocator, as the code can be changed without requiring the register allocation to restart.

Code based on computing the interference graph has to reconpute the interference graph and restart whenever the code has to be changed. Making it much worse time wise. If for no other reason than computing the interference graph is O(N^2).

So yes more complications but only because the model described is overly simplified.

There is a wonderful paper "Register Allocation: What Does the NP-Completeness Proof of Chaitin et al. Really Prove?" (2006) by Bouchez, Darte, and Rastello [0]. I'll just quote the abstract in full because there is really nothing much I could add to it:

    Register allocation is one of the most studied problem in compilation. It is
    considered as an NP-complete problem since Chaitin, in 1981, showed that
    assigning temporary variables to k machine registers amounts to color, with
    k colors, the interference graph associated to variables and that this graph
    can be arbitrary, thereby proving the NP-completeness of the problem. However,
    this original proof does not really show where the complexity comes from.
    Recently, the re-discovery that interference graphs of SSA programs can be
    colored in polynomial time raised the question: Can we exploit SSA to perform
    register allocation in polynomial time, without contradicting Chaitin's NP-
    completeness result? To address such a question, we revisit Chaitin's proof
    to better identity the interactions between spilling (load/store insertion),
    coalescing/splitting (moves between registers), critical edges (a property of
    the control-flow graph), and coloring (assignment to registers). In particular,
    we show when it is easy to decide if temporary variables can be assigned to
    k registers or if some spilling is necessary. The real complexity comes from
    critical edges, spilling, and coalescing, which are addressed in our other
    reports.

[0] https://hal-lara.archives-ouvertes.fr/hal-02102286v1/documen...

  • It is worth noting that register coalescing gets a lot of attention in interference graph coloring allocators because they typically have a pre-pass that inserts copies everywhere (because adding copies or spills later is so hard). So instead of worrying about where to insert copies and spills, those allocators worry about what to coalesce instead.

    In my own compiler (where I don't add copies in a prepass) I have not yet found a case where any coalescing would be beneficial.