Register Allocation and Spilling via Graph Coloring
G. J. Chaitin
IBM Research, 1982

Motivation
- Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose registers
- The symbolic registers must be mapped to real registers in a way that avoids conflicts
- Symbolic registers that cannot be mapped to real registers must be spilled to memory
- We need an algorithm to map registers with minimal spilling cost

Paper Overview
- Register allocation overview
- Subsumption algorithm
- Interference graph coloring algorithm
- Spilling algorithm

Register Allocation Steps
1. Determine which registers are live at any point in the intermediate language (IL) program
2. Build a register interference graph
   - Nodes represent symbolic registers
   - Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph
5. Decide which registers to spill if necessary

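Steps 1 and 2 can be sketched for straight-line code as a single backward scan. This is a minimal illustration, not Chaitin's implementation: the tuple layout `(dest, uses, is_copy)` and the function name `build_interference` are assumptions made for this example, and real IL has branches that require a full data-flow analysis. The rule used is that a register interferes with everything live at the point where it is defined, except that a copy does not make its source and destination interfere.

```python
from collections import defaultdict

def build_interference(instrs, live_out):
    """Build an interference graph for straight-line code.

    instrs: list of (dest, uses, is_copy) tuples in program order (hypothetical layout).
    live_out: registers still needed after the last instruction.
    """
    graph = defaultdict(set)
    live = set(live_out)
    # Walk backwards so `live` holds the registers live just after each instruction.
    for dest, uses, is_copy in reversed(instrs):
        graph[dest]  # make sure every defined register gets a node
        for other in live:
            if other == dest:
                continue
            # A copy does not make its source and destination interfere.
            if is_copy and other in uses:
                continue
            graph[dest].add(other)
            graph[other].add(dest)
        live.discard(dest)
        live.update(uses)
    return dict(graph)

# The instruction sequence from the subsumption example slide.
program = [
    ("A", [], False),     # A = 1
    ("B", ["A"], True),   # B = A
    ("B", ["B"], False),  # B = B + 1
    ("C", ["B"], True),   # C = B
    ("D", ["A"], True),   # D = A
]
graph = build_interference(program, live_out={"C", "D"})
```

Under these assumptions, A and D end up without an edge between them, and so do B and C, which is what allows the copies `D = A` and `C = B` to be subsumed.
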
Subsumption
- If the source and destination of a register copy do not interfere, they may be coalesced into a single node
- For each register copy in the IL, determine whether the registers interfere
- If not, coalesce the two nodes into one
- After the first pass, rewrite the IL code
- Repeat until no more coalescing is possible

Subsumption Example (before)
Instructions   Live   Dead
A = 1          A
B = A          B
B = B + 1
C = B          C      B
D = A          D      A
…              C, D
[Figure: interference graph with nodes A, B, C, D]

Subsumption Example (after)
Instructions   Live     Dead
AD = 1         AD
BC = AD        BC
BC = BC + 1
…              AD, BC
[Figure: interference graph with nodes AD, BC]

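The coalescing loop can be sketched as follows. This is an illustrative approximation under stated assumptions: the function name `coalesce`, the `alias` map, and the `(dst, src)` copy list are invented for this example, and the sketch merges adjacency sets directly instead of rewriting the IL and rebuilding the graph after each pass as the slides describe.

```python
def coalesce(graph, copies):
    """Merge copy-related registers that do not interfere.

    graph: dict mapping register -> set of interfering registers.
    copies: list of (dst, src) pairs, one per copy instruction.
    Returns the merged graph and an alias map (merged name -> surviving name).
    """
    graph = {n: set(adj) for n, adj in graph.items()}
    alias = {}

    def find(reg):
        # Follow aliases to the node that currently represents `reg`.
        while reg in alias:
            reg = alias[reg]
        return reg

    changed = True
    while changed:  # repeat until no more coalescing is possible
        changed = False
        for dst, src in copies:
            a, b = find(dst), find(src)
            if a == b or b in graph[a]:
                continue  # already merged, or the two registers interfere
            # Merge node b into node a, redirecting b's edges.
            for n in graph.pop(b):
                graph[n].discard(b)
                graph[n].add(a)
                graph[a].add(n)
            alias[b] = a
            changed = True
    return graph, alias

# Interference graph assumed for the example slide (A-B, A-C, C-D).
example = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
merged, alias = coalesce(example, [("B", "A"), ("C", "B"), ("D", "A")])
```

In this run B is folded into C and A into D, leaving two interfering nodes that correspond to BC and AD on the slide; the copy `B = A` survives because A and B interfere.
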
Finding a 32-Coloring
- Each symbolic register is assigned a color representing a real register
- If no two adjacent nodes have the same color, the coloring succeeds
- Assume that G has a node N with degree < 32
- Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable
- The algorithm removes nodes of degree < 32 until all nodes have been removed
- The algorithm fails if nodes remain and none of them has degree < 32

3-Coloring Example
Instructions   Live   Dead
A = 1          A
B = 2          B
C = 3          C
? = A                 A
D = 4          D
? = B                 B
? = C                 C
? = D                 D
[Figure: interference graph with nodes A, B, C, D]

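The remove-and-reinsert procedure can be sketched like this. The function name `color_graph` and the edge set used for the example are assumptions: the edges are what the instruction listing implies if A, B, C are live together and D is defined after A dies, which may differ in detail from the original figure.

```python
def color_graph(graph, k):
    """Try to k-color `graph` (dict: node -> set of neighbours).

    Returns a dict node -> color in range(k), or None if the
    simplification gets stuck (every remaining node has degree >= k).
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Pick any node of degree < k; sorted() only makes the run deterministic.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None
        for m in work.pop(node):
            work[m].discard(node)
        stack.append(node)
    colors = {}
    # Reinsert nodes in reverse order; a free color always exists because
    # each node had fewer than k neighbours when it was removed.
    for node in reversed(stack):
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = min(c for c in range(k) if c not in used)
    return colors

# Assumed interference graph for the 3-coloring example: every pair except A-D.
example = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
three = color_graph(example, 3)
two = color_graph(example, 2)
```

With three colors the sketch succeeds and gives A and D the same color, since they are never live at the same time. With two colors it returns None, which is the situation that triggers spilling.
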
Spilling
- If the 32-coloring fails, then nodes must be spilled to memory
- Spilled registers are stored to memory, then loaded momentarily when their values are needed
- Every time spill code is generated, the interference graph must be rebuilt
- Usually recoloring succeeds after spilling, but sometimes several passes are required

Spilling
- NP-complete problem
- Heuristic: spill the node that minimizes
  - (cost of spilling) / (degree of node)
- Cost of spilling
  - Count every definition point and every use point of the register, each weighted by the execution frequency of that point
- In some cases, a spilled node can be reloaded once and kept in a register for an extended interval

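The spill heuristic can be sketched as a ratio computed per node. The names `spill_cost` and `choose_spill` and all frequency values below are made up for illustration; the slides do not say how the frequencies are estimated.

```python
def spill_cost(point_frequencies):
    """Sum the estimated execution frequency over all def and use points."""
    return sum(point_frequencies)

def choose_spill(graph, point_frequencies):
    """Pick the node with the smallest cost / degree ratio.

    graph: dict node -> set of neighbours (all degrees are >= k when stuck).
    point_frequencies: dict node -> list of frequencies, one per def/use point.
    """
    def ratio(node):
        degree = len(graph[node])
        return spill_cost(point_frequencies[node]) / max(degree, 1)
    return min(sorted(graph), key=ratio)

# Hypothetical stuck graph: four mutually interfering registers, three colors.
stuck = {n: {m for m in "ABCD" if m != n} for n in "ABCD"}
freqs = {
    "A": [1, 1],     # one definition and one use outside any loop
    "B": [1, 10],    # defined once, used inside a loop
    "C": [10, 10],   # defined and used inside a loop
    "D": [1, 1, 1],
}
victim = choose_spill(stuck, freqs)
```

Here every node has degree 3, so the cheapest register (A, with cost 2) is chosen. When degrees differ, a high-degree node becomes more attractive because removing it frees up more of its neighbours.
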
Conclusion
- The graph coloring and spilling algorithms should produce faster code
- The register allocation algorithm is efficient
  - Graph coloring is O(N)
  - But uses O(N^2) space

Compile-time Copy Elimination
Peter Schnorf, Mahadevan Ganapathi, John Hennessy
Stanford, 1993

Motivation
- Single-assignment languages simplify dependency checking, which in turn simplifies automatic detection and exploitation of parallelism
- But single-assignment languages require a large number of copies
- Previous implementations eliminate copies at run time
- Efficiency increases if copies can be eliminated at compile time

Paper Overview
- Single-assignment languages
- Code generation
- Compile-time copy elimination techniques
  - Substitution
  - Pattern matching
  - Substructure sharing
  - Substructure targeting
- Results: success!
  - Eliminated all copies in bubble sort

Single-assignment languages
- Functional languages (LISP, Haskell, SISAL)
- Simpler dependency checking
  - True dependencies (write, then read): b = f(c), a = f(b)
  - Anti-dependencies (read, then write): a = f(b), b = f(c)
  - Output dependencies (write, then write): a = f(b), a = f(c)
  - Aliasing: caused by pointers, array indexes
- To avoid aliasing, all inputs and outputs are passed by value

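The three dependency kinds can be told apart from the names each statement writes and reads. This is a small sketch; the `(written, read_set)` statement encoding and the function name `dependencies` are assumptions for this example, and aliasing is ignored.

```python
def dependencies(first, second):
    """Classify the dependencies of `second` on `first`.

    Each statement is a (written_name, set_of_read_names) pair (hypothetical encoding).
    """
    written1, read1 = first
    written2, read2 = second
    kinds = set()
    if written1 in read2:
        kinds.add("true")    # write, then read
    if written2 in read1:
        kinds.add("anti")    # read, then write
    if written1 == written2:
        kinds.add("output")  # write, then write
    return kinds

# The three statement pairs from the slide.
true_dep = dependencies(("b", {"c"}), ("a", {"b"}))    # b = f(c); a = f(b)
anti_dep = dependencies(("a", {"b"}), ("b", {"c"}))    # a = f(b); b = f(c)
output_dep = dependencies(("a", {"b"}), ("a", {"c"}))  # a = f(b); a = f(c)
```

In a single-assignment program no name is written twice, so only the "true" case can occur, which is why dependency checking becomes simpler.
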
Example: Swap(A,i,j)
- Data flow diagram
  - Edges transport values
  - Simple nodes are operations
- Pick any feasible node evaluation order at random
- Naïve implementation
  - Each edge has its own memory
  - Swap uses 5 array copies!
- Optimized implementation
  - Swap array updates are done in place
[Figure: data-flow graph for Swap with nodes Input, AElement, AReplace]

Example: BubbleSort(A)
- Compound nodes represent control flow
- Loops are implemented using recursion to avoid multiple assignment of the iteration variable
- Naïve implementation
  - Bubble sort requires O(n^2) array copies
- Optimized implementation
  - All array updates are done in place
  - But parallelism is decreased

Code Generation Overview
- Input is from the compiler front end
  - IF1: intermediate data-flow graph representation
- Code generator eliminates copies
- Output is in C
  - Compiled into machine code using an optimizing C compiler

Vertical Substitution
- If input and output have the same type and size, they can share memory
  - Updates are done in place
[Figure: data-flow graph with nodes Input, AElement, AReplace]

Horizontal Substitution
- If an output has several destinations, the output edges can share memory
[Figure: data-flow graph with nodes Input, AElement, AReplace]

Horizontal and Vertical Substitution
- Horizontal and vertical substitution can interfere with each other
  - A node along the substitution chain modifies the shared object before its last use
- Edges can be marked as read-only if they are shared and this is not the last use

Horizontal and Vertical Substitution
[Figure: data-flow graphs with nodes Input, AElement, AReplace]

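The effect of substitution on Swap can be sketched with Python lists standing in for arrays. The helper names `a_element` and `a_replace` mirror the node names on the slides, but their signatures and the `in_place` flag are assumptions for this example. The sketch copies only at AReplace, so it does not reproduce the slide's count of 5 copies for the fully naïve scheme, and it assumes the caller no longer needs the input array when `optimized` is set.

```python
def a_element(arr, i):
    """AElement: read one element; never modifies the array."""
    return arr[i]

def a_replace(arr, i, value, in_place):
    """AReplace: produce an array with arr[i] replaced by value.

    With in_place=True the input and output share memory (vertical substitution).
    """
    target = arr if in_place else list(arr)
    target[i] = value
    return target

def swap(arr, i, j, optimized):
    # Both reads share the input array (horizontal substitution) and are
    # scheduled before the first write, so the shared object is not
    # modified before its last read.
    x = a_element(arr, i)
    y = a_element(arr, j)
    step = a_replace(arr, i, y, in_place=optimized)
    return a_replace(step, j, x, in_place=optimized)

data = [1, 2, 3]
copied = swap(data, 0, 2, optimized=False)   # input left untouched
untouched = list(data)
updated = swap(data, 0, 2, optimized=True)   # input updated in place
```

If either read were moved after the first `a_replace` in the optimized variant, it would see the modified array, which is the interference the read-only marking is meant to prevent.
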
Interprocedural Substitution
- Previous discussion concerned simple nodes that can be analyzed at compiler design time
- Information about a function is needed in order to use substitution
  - Does the function modify an input?
  - Will an input be chained to an output?

Intersubgraph Substitution
- Substitution analysis is done for each construct
- Same basic principles

Determining the Evaluation Order
- Evaluation order can impact the efficiency of substitution
- Naïve implementation selects the next node to evaluate at random
- Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible
- Hints are ad hoc?

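One way to picture such a hint is a list scheduler that, among the nodes ready to run, prefers nodes that only read over nodes that update in place. This is a guess at the flavor of the hints, not the paper's mechanism: the function `schedule`, the node names `n1`..`n4`, and the "readers first" rule are all assumptions, and the no-hint branch picks by name as a deterministic stand-in for a random choice.

```python
def schedule(deps, modifies, use_hints):
    """Return an evaluation order for a data-flow graph.

    deps: dict node -> set of nodes that must be evaluated first.
    modifies: set of nodes that update their array input in place.
    """
    done, order = set(), []
    while len(order) < len(deps):
        ready = sorted(n for n in deps if n not in done and deps[n] <= done)
        if not ready:
            raise ValueError("dependency cycle")
        if use_hints:
            # Hint: among ready nodes, evaluate readers before in-place writers.
            node = min(ready, key=lambda n: (n in modifies, n))
        else:
            node = ready[0]
        order.append(node)
        done.add(node)
    return order

# Swap as a data-flow graph (hypothetical node names):
#   n1 = AElement(A, j)        n2 = AReplace(A, i, n1)
#   n3 = AElement(A, i)        n4 = AReplace(n2, j, n3)
deps = {"n1": set(), "n2": {"n1"}, "n3": set(), "n4": {"n2", "n3"}}
writers = {"n2", "n4"}
plain = schedule(deps, writers, use_hints=False)
hinted = schedule(deps, writers, use_hints=True)
```

Without the hint, n2 runs before n3, so n3 would read A after it was updated in place unless a copy is made. With the hint, both reads come first and the in-place updates are safe.
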
Pattern Matching
- Replace hard-to-optimize pieces of code
- Patterns are language-specific
- Patterns are detected using “ad hoc” methods

Substructure Sharing
- Allow substructures to be referenced without copies
- AElement can be treated as a NoOp
- Happens after substitution analysis, so it is less important
- Same principles as substitution analysis

Substructure Targeting
- Allow structures to be built from substructures without copies
- Similar to substructure sharing

Results
- Compared the optimizations against the naïve implementation
- The optimizations eliminate all copies for bubble sort
- Informal comparison to a run-time optimizer shows improvements

Conclusions
- Substitution, pattern matching, and substructure sharing can eliminate nearly all unnecessary copies in a single-assignment language
- Copy elimination no longer has to be done at run time
- Single-assignment languages should be more efficient for parallel programs