CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz
8 December
Outline
- Motivation of the Study
- Register Allocation Problem
- Classical Methods (Chaitin & Briggs)
- Optimal Register Allocator
- Experimental Study
Motivation of the Study
- Challenges of compilers for embedded systems
  - Power consumption and memory space limitations
  - Small set of applications
- Can afford long compilation times to generate good-quality code in the various phases
  - instruction selection
  - instruction scheduling
  - register allocation
Motivation of the Study (2)
- Instruction Selection
  - selects target machine instructions to implement primitive IR (Intermediate Representation) instructions
  - changes the quality of the code
- Instruction Scheduling
  - orders the operations in the compiled code
  - decreases the running time of the compiled code
Register Allocation Problem
- Assigning program variables to the available registers
- Shapes the runtime performance of the compiled code
- Failure to provide an efficient register allocation causes
  - an increase in the number of memory accesses
  - an increase in code size (affects memory capacity and the overall form factor of the device)
  - an increase in power consumption (frequent memory accesses due to poor register allocation)
Register Allocation (2)
- NP-complete (Garey & Johnson, 1976)
- Approaches
  - Graph Coloring: Chaitin (1981)
  - Integer Programming: Goodwin and Wilken (1996)
Graph Coloring
- Traditional solution to the register allocation problem
- A graph models the candidates for registers: each node represents a symbolic register (a live range), and an edge connecting two nodes shows that they are live at the same point in the program
- Nodes joined by an edge must be given different colors, where each color stands for a real register
Graph Coloring (2)
- Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
- Spill cost: the runtime cost of loading a variable from memory and storing it to memory
  - address computation, memory operation, execution frequency
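The three spill-cost components can be combined in a short sketch. The `Reference` record, the cycle costs, and the `10 ** loop_depth` frequency estimate are assumptions made for illustration and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One definition or use of a variable (hypothetical record for this sketch)."""
    is_store: bool   # True for a definition (store when spilled), False for a use (load)
    loop_depth: int  # loop nesting depth, used to estimate execution frequency


# Assumed per-operation costs in cycles; real values depend on the target machine.
ADDRESS_COST = 1
LOAD_COST = 2
STORE_COST = 2


def spill_cost(refs):
    """Estimate the runtime cost of keeping a variable in memory."""
    total = 0
    for ref in refs:
        memory_op = STORE_COST if ref.is_store else LOAD_COST
        frequency = 10 ** ref.loop_depth  # static estimate, an assumption here
        total += (ADDRESS_COST + memory_op) * frequency
    return total


# One definition outside a loop and two uses inside a loop of depth 1.
refs = [Reference(True, 0), Reference(False, 1), Reference(False, 1)]
print(spill_cost(refs))
```

A variable referenced inside loops gets a much higher cost, so an allocator using such a metric would prefer to spill variables that are touched rarely.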
Live Ranges
- A variable V_i is live at a point p in the program if it is defined above p and has not yet been used for the last time
- Live range LR_i
  - begins with the definition of V_i
  - ends with the last use of V_i
- If LR_i and LR_j are simultaneously live at p, then LR_i interferes with LR_j, and the two cannot be stored in the same register
- Interference graph G_I = G(V, E)
  - V = set of individual live ranges
  - E = set of edges that represent interferences
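A minimal sketch of building an interference graph from live ranges. It assumes straight-line code, so each live range is a single interval of instruction indices; the function name and the interval encoding are illustrative, not from the slides.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Build G_I from live ranges.

    live_ranges maps a variable name to (definition_index, last_use_index).
    Returns a dict mapping each name to the set of names it interferes with.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        start_a, end_a = live_ranges[a]
        start_b, end_b = live_ranges[b]
        # Two ranges interfere when both are live at some common point.
        # A range ending exactly where another begins does not interfere.
        if start_a < end_b and start_b < end_a:
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = build_interference_graph({"a": (0, 3), "b": (1, 5), "c": (3, 6)})
print(graph)  # a and c do not interfere, so they could share a register
```

With control flow, liveness has to be computed per basic block by data-flow analysis; the interval test above is only the simplest case.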
Source Code

    int main(){
        int a;
        int b;
        int i;
        a=10;
        b=1;
        i=0;
        while (i<=a){
            b+=b*i;
            i++;
            if (b>=100)
                break;
        }
        return 0;
    }

GaS (GNU Assembler)

    main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        andl $-16, %esp
        movl $0, %eax
        subl %eax, %esp
        movl $10, -4(%ebp)
        movl $1, -8(%ebp)
        movl $0, -12(%ebp)
    .L2:
        movl -12(%ebp), %eax
        cmpl -4(%ebp), %eax
        jle .L4
        jmp .L
Extended Representation

    main:
        subl $4, t1 (t2)
        movl t3, t2
        movl t2, t3 (t4)
        subl $24, t2 (t5)
        andl $-16, t5 (t6)
        movl $0, t7
        subl t7, t6 (t8)
        movl $10, t4
        movl $1, t4
        movl $0, t4
    .L2:
        movl t4, t7 (t9)
        cmpl t4, t

Interference Graph (figure: nodes t1 to t15)
Classical Methods for Register Allocation
- Register allocator based on graph coloring
- Chaitin’s Heuristic (limitations for diamond graphs)
- Optimistic Coloring Heuristic (Briggs)
- Stack-based methods
Chaitin’s Heuristic
(v° denotes the degree of v in G_I; k is the number of colors, that is, real registers)

    Initialize stack S to empty.
    while (G_I ≠ ∅) do
        while ∃ v ∈ G_I such that v° < k do
            Pick any vertex v such that v° < k
            Remove v and its edges from G_I and put v on S.
        if (G_I ≠ ∅) then
            Pick a vertex v based on the given spill metric
            Spill the live range associated with v.
            Remove v and its edges from G_I
    while (S ≠ ∅) do
        v = pop(S)
        Color v with the lowest color not used by any neighbor of v.
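The heuristic can be sketched in a few lines of Python. The graph encoding (a dict of neighbor sets), the function name, and the spill metric (lowest cost divided by degree) are assumptions for this sketch; the slide only says "the given spill metric". It requires k >= 1.

```python
def chaitin_color(graph, k, spill_cost):
    """Simplify-and-select coloring in the style of Chaitin's heuristic.

    graph: dict mapping each node to the set of its neighbors.
    Returns (coloring, spilled), with colors numbered 0..k-1.
    """
    work = {v: set(ns) for v, ns in graph.items()}  # mutable copy of G_I
    stack, spilled = [], []

    while work:
        low = [v for v in work if len(work[v]) < k]
        if low:
            v = low[0]
            stack.append(v)  # guaranteed to be colorable later
        else:
            # Assumed spill metric: smallest cost/degree, ties by dict order.
            v = min(work, key=lambda n: spill_cost[n] / len(work[n]))
            spilled.append(v)  # spilled immediately, never colored
        for n in work.pop(v):
            work[n].discard(v)

    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in graph[v] if n in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring, spilled


# Assumed example: the 4-cycle A-B-C-D-A with two registers.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
coloring, spilled = chaitin_color(cycle, 2, {v: 1 for v in cycle})
print(coloring, spilled)  # A is spilled; B and D get color 0, C gets color 1
```

Every node of the 4-cycle has degree 2, so no node satisfies v° < k and the heuristic spills one even though the graph is 2-colorable.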
Chaitin-Briggs Heuristic (OCH)
(v° denotes the degree of v in G_I; k is the number of colors, that is, real registers)

    Initialize stack S to empty.
    while (G_I ≠ ∅) do
        while ∃ v ∈ G_I such that v° < k do
            Pick any vertex v such that v° < k
            Remove v and its edges from G_I and put v on S.
        if (G_I ≠ ∅) then
            Pick a vertex v based on the given spill metric
            Push v on the stack
            Remove v and its edges from G_I
    while (S ≠ ∅) do
        v = pop(S)
        Color v with the lowest color not used by any neighbor of v.
        If node v cannot be colored, then pick an uncolored node to spill, spill it, and restart at step 1
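A sketch of the optimistic variant, under the same assumptions as before (dict-of-sets graph, cost/degree spill metric, k >= 1). As one possible reading of "pick an uncolored node to spill", this sketch spills the node that failed to get a color and then restarts.

```python
def optimistic_color(graph, k, spill_cost):
    """Optimistic coloring: spill candidates are pushed, not spilled at once.

    graph: dict mapping each node to the set of its neighbors.
    Returns (coloring, spilled), with colors numbered 0..k-1.
    """
    graph = {v: set(ns) for v, ns in graph.items()}  # private copy
    spilled = []
    while True:
        work = {v: set(ns) for v, ns in graph.items()}
        stack = []
        while work:
            low = [v for v in work if len(work[v]) < k]
            if low:
                v = low[0]
            else:
                # Assumed spill metric: smallest cost/degree.
                v = min(work, key=lambda n: spill_cost[n] / len(work[n]))
            stack.append(v)  # pushed in both cases
            for n in work.pop(v):
                work[n].discard(v)

        coloring, failed = {}, None
        while stack:
            v = stack.pop()
            used = {coloring[n] for n in graph[v] if n in coloring}
            free = [c for c in range(k) if c not in used]
            if not free:
                failed = v
                break
            coloring[v] = free[0]
        if failed is None:
            return coloring, spilled
        # Spill the node that could not be colored, then restart.
        spilled.append(failed)
        for n in graph.pop(failed):
            graph[n].discard(failed)


# Assumed example: the 4-cycle A-B-C-D-A with two registers.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
print(optimistic_color(cycle, 2, {v: 1 for v in cycle}))  # no node is spilled
```

On the 4-cycle the pushed spill candidate still finds a free color, because its two neighbors end up sharing one color.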
Comparison of Chaitin’s Heuristic and OCH
- Task: find a 2-coloring of the graph with nodes A, B, C, D
- Chaitin: A spilled, B -> r1, C -> r2, D -> r1
- OCH: A -> r1, B -> r2, C -> r1, D -> r2
Integer Programming (IP)
- Compared with graph coloring, IP
  - increases program performance
  - reduces code size
- The time to solve a register allocation problem can be significant
- The IP formulation should be as simple as possible
Optimal Register Allocator (ORA)
- ORA uses IP to solve the register allocation problem
- Proposed by Goodwin and Wilken (1996)
- The IP model is very complex because it contains many redundancies
- Solving the problem is slow
A Faster Optimal Register Allocator
- “A Faster Optimal Register Allocator” (Fu, Wilken and Goodwin, 2005) uses IP to solve the register allocation problem
- The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
- Uses a simplified IP formulation
- Faster than the basic ORA
Basic ORA Model
Control Flow Graph and ORA Graphs
Basic ORA Model
- Models register allocation as a set of network graphs
  - Symbolic-register graphs
  - Memory graphs
- An optimal allocation is obtained by selecting a set of graph edges whose total cost is minimal
- Cost = allocation overhead of a decision
IP Formulation
Redundancy
Global Reduction
- Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
- 80% of the total decisions generated by the ORA model
Decision Placement
Diamond Region Reductions
There are 4 reduction techniques which can eliminate unnecessary load, store and deallocation decisions:
- Void region coupling
  - void region
  - coupled decision
  - paired decision
- Symmetric Decision Selection
- Jump-Edge Nullification
- Asymmetric Decision Elimination
Local Reduction
- Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions
Constraint Reduction
- Deallocation constraints
- Must-allocate constraint
- Single-symbolic constraint
- Liveness constraint
Deallocation Constraints
- Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location
- $X^{r}_{s,p-1} \ge X^{r}_{s,p}$
  - $X^{r}_{s,p-1}$ represents the allocation state of real register r to symbolic register s before the deallocation location p
  - $X^{r}_{s,p}$ represents the allocation state after p
Must-allocate Constraint
- Used to ensure a symbolic register is allocated to a real register at each definition and each use
- $\sum_{r} X^{r}_{s,p} \ge 1$
- For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant
Single-symbolic Constraint
- Used to ensure a real register is allocated to at most one symbolic register
- $\sum_{s} X^{r}_{s,p} \le 1$
- For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant
Liveness Constraint
- Used to ensure the liveness of a symbolic register
- $\sum_{r} X^{r}_{s,p} + X^{mem}_{s,p} \ge 1$
  - $X^{mem}_{s,p}$ represents the allocation state of symbolic register s to memory at the liveness constraint location p
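The four constraint types can be checked against a candidate 0-1 assignment with a small sketch. The dictionary encoding, the function names, and the summation indices (over real registers r or symbolic registers s) are assumptions inferred from the slide text, not the formulation from the paper.

```python
# x[(r, s, p)] is 1 if real register r holds symbolic register s at location p.
# mem[(s, p)] is 1 if symbolic register s resides in memory at location p.
# Missing keys are treated as 0.

def deallocation_ok(x, r, s, p):
    """x[r,s,p-1] >= x[r,s,p]: r cannot become allocated to s at a deallocation location."""
    return x.get((r, s, p - 1), 0) >= x.get((r, s, p), 0)


def must_allocate_ok(x, regs, s, p):
    """At a definition or use, s must sit in at least one real register."""
    return sum(x.get((r, s, p), 0) for r in regs) >= 1


def single_symbolic_ok(x, syms, r, p):
    """A real register holds at most one symbolic register."""
    return sum(x.get((r, s, p), 0) for s in syms) <= 1


def liveness_ok(x, mem, regs, s, p):
    """A live symbolic register is in some register or in memory."""
    return sum(x.get((r, s, p), 0) for r in regs) + mem.get((s, p), 0) >= 1


regs, syms = ["r1", "r2"], ["s1", "s2"]
x = {("r1", "s1", 0): 1, ("r1", "s1", 1): 0, ("r2", "s2", 1): 1}
mem = {("s1", 1): 1}  # s1 is spilled to memory at location 1

print(must_allocate_ok(x, regs, "s1", 0))   # s1 is in r1 at location 0
print(liveness_ok(x, mem, regs, "s1", 1))   # s1 stays live through memory
print(deallocation_ok(x, "r1", "s1", 1))    # r1 is released from s1
```

An IP solver would search over all such assignments for the one with minimal total cost; the checks above only test feasibility of one assignment.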
Experimental Study
- Compares graph coloring, ORA and faster ORA
- For ORA and faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used with a RISC processor
SPEC CPU92 Benchmark Functions
# decision variables and constraints produced by basic ORA and Faster ORA
Dynamic spill-code saved using Faster ORA
Dynamic spill code components for SPEC CPU 2000
Conclusion
- Two different solutions to the register allocation problem
  - Integer Programming
  - Graph Coloring
- The formulations and uses of these solutions were shown
- Faster ORA reduces the number of register allocation IP decision variables compared to the basic IP formulation
- IP gives better results than graph coloring
References
- G. Chaitin and M. Auslander, “Register allocation via coloring,” Computer Languages, 1981
- D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software Practice and Experience, 1996
- C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005
Thank You ANY QUESTIONS??