← Back to context

Comment by haqreu

2 days ago

Thank you for the reference, I'll check it.

Why not fully maximal SSA + DCE? I mean, other than timings consideration. Fully maximal does not even need any graph traversal, it is entirely local to each block.

I think it will introduce too many redundant phis, but I've never used it in practice - so I can only speculate. I'm not convinced DCE will clean maximal SSA up substantially. Even the classic Cytron et al algorithm must be combined with liveness analysis to avoid placing dead phis (that is, it does not produce pruned SSA by default). In the past, there was always a fear that SSA has the potential to cause quadratic blowup in the number of variables - this concern is mostly theoretical but probably influenced some of the design decision around algorithms for constructing SSA.

Braun et al's algorithm works backwards from uses (which generate liveness), so you get pruned SSA out. In the case of reducible control flow graphs, you also get minimal SSA. This is all without any liveness or dominance computation beforehand. Granted, you may want those things later, but it's nice that you can construct a decent quality SSA with a fairly intuitive algorithm. Also shown in the paper is that you can incorporate a few light optimisations during SSA construction (constant folding, for example).

  • I'll definitely test maximal SSA + DCE in tinyoptimizer when I'll get to it. For the moment I made [1] somewhat-pruned mem2reg pass, not even sure how to call it. But it indeed required to compute dominance frontiers.

    [1] https://ssloy.github.io/tinyoptimizer/mem2reg/

    • Nice.

      I'm always happy to see more accessible resources for compiler writers.

      ---

      As an aside: for displaying CFGs on the page, it would be very interesting to emit something somewhat dynamic. SVGs are always a good start, but there is a neat library for doing hierarchical graph layout (dagre, with d3-dagre handling rendering as well). In my own efforts at pedagogy, I've been interested in producing CFGs in-browser whose basic blocks comprise a "unified diff" view of the block (this being achieved in realtime by maintaining a subset of LLVM whose CFGs are persistent). Then it is more obvious what has changed: at least in the case of mem2reg which shouldn't introduce new blocks or move too much around (I forget if it hoists allocas to the entry block or not).

      It'd also be cool to distil what underlying ideas you have found to be most useful in your efforts. The constrained scope of them may be useful to me, as I've wanted to create a kind of "advent of compilers" for years (advent of code but with a heavy slant towards compiler tasks/algorithms).