CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.

8 December 2005

2 Outline
- Motivation of the Study
- Register Allocation Problem
- Classical Methods (Chaitin & Briggs)
- Optimal Register Allocator
- Experimental Study

3 Motivation of the Study
- Challenges of compilers for embedded systems: power consumption and memory-space limitations.
- Embedded systems run a small set of applications, so the compiler can afford long compilation times to generate high-quality code in its various phases:
  - instruction selection
  - instruction scheduling
  - register allocation

4 Motivation of the Study (2)
- Instruction selection: selecting target-machine instructions to implement primitive IR (intermediate representation) instructions; it affects the quality of the generated code.
- Instruction scheduling: ordering the operations in the compiled code; it decreases the running time of the compiled program.

5 Register Allocation Problem
- Assigning program variables to the available registers; this shapes the runtime performance of the compiled code.
- Failure to provide an efficient register allocation leads to:
  - more memory accesses
  - larger code size (which affects the memory capacity and overall form factor of the device)
  - higher power consumption (frequent memory accesses caused by poor register allocation)

6 Register Allocation (2)
- NP-complete (Garey & Johnson, 1976)
- Approaches:
  - Graph coloring: Chaitin (1981)
  - Integer programming: Goodwin and Wilken (1996)

7 Graph Coloring
- The traditional solution to the register allocation problem.
- The problem is modeled as an interference graph: each node represents a live range (a variable or symbolic register), and an edge connects two nodes whose live ranges are alive at the same point in the program.
- Adjacent nodes must be colored with different colors, where each color corresponds to a physical register.

8 Graph Coloring (2)
- Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime.
- Spill cost: the runtime cost of loading the variable from memory and storing it back, which depends on address computation, the memory operations themselves, and execution frequency.

9 Live Ranges
- A variable $V_i$ is live at a point $p$ in the program if it has been defined above $p$ and has not yet been used for the last time.
- A live range $LR_i$ begins with the definition of $V_i$ and ends with the last use of $V_i$.
- If $LR_i$ and $LR_j$ are simultaneously live at $p$, then $LR_i$ interferes with $LR_j$, and the two cannot be stored in the same register.
- Interference graph $G_I = (V, E)$:
  - $V$ = set of individual live ranges
  - $E$ = set of edges that represent interferences

10 Source Code and GAS (GNU Assembler) Output
Source code:

    int main() {
        int a;
        int b;
        int i;
        a = 10;
        b = 1;
        i = 0;
        while (i <= a) {
            b += b * i;
            i++;
            if (b >= 100) break;
        }
        return 0;
    }

GAS (GNU Assembler):

    main:
        pushl %ebp
        movl  %esp, %ebp
        subl  $24, %esp
        andl  $-16, %esp
        movl  $0, %eax
        subl  %eax, %esp
        movl  $10, -4(%ebp)
        movl  $1, -8(%ebp)
        movl  $0, -12(%ebp)
    .L2:
        movl  -12(%ebp), %eax
        cmpl  -4(%ebp), %eax
        jle   .L4
        jmp   .L3
        .....

11 Extended Representation and Interference Graph
Extended representation:

    main:
        subl $4, t1        (t2)
        movl t3, t2
        movl t2, t3        (t4)
        subl $24, t2       (t5)
        andl $-16, t5      (t6)
        movl $0, t7
        subl t7, t6        (t8)
        movl $10, t4
        movl $1, t4
        movl $0, t4
    .L2:
        movl t4, t7        (t9)
        cmpl t4, t9
        .....

[Figure: interference graph over the symbolic registers t1 to t15]

12 Classical Methods for Register Allocation
Register allocators based on graph coloring:
- Chaitin's heuristic (can spill unnecessarily on diamond graphs)
- Optimistic coloring heuristic (Briggs)
- Stack-based methods

13 Chaitin's Heuristic

    Initialize stack S to empty.
    while (G_I ≠ ∅) do
        while (∃ v ∈ G_I such that v° < k) do
            Pick any vertex v such that v° < k.
            Remove v and its edges from G_I and push v on S.
        if (G_I ≠ ∅) then
            Pick a vertex v based on the given spill metric.
            Spill the live range associated with v.
            Remove v and its edges from G_I.
    while (S ≠ ∅) do
        v = pop(S)
        Color v with the lowest color not used by any neighbor of v.

Here v° denotes the degree of v in G_I and k is the number of available registers.

14 Chaitin-Briggs Heuristic (OCH)

    Initialize stack S to empty.
    while (G_I ≠ ∅) do
        while (∃ v ∈ G_I such that v° < k) do
            Pick any vertex v such that v° < k.
            Remove v and its edges from G_I and push v on S.
        if (G_I ≠ ∅) then
            Pick a vertex v based on the given spill metric.
            Push v on S.
            Remove v and its edges from G_I.
    while (S ≠ ∅) do
        v = pop(S)
        Color v with the lowest color not used by any neighbor of v.
        If v cannot be colored, pick an uncolored node to spill, spill it,
        and restart from the first step.

Unlike Chaitin's heuristic, a node chosen by the spill metric is not spilled immediately; it is spilled only if no color is left for it during the coloring phase.

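15 Comparison of Chaitin's Heuristic and OCH
Goal: find a 2-coloring (k = 2) of the diamond graph with nodes A, B, C, D.
- Chaitin: A spilled, B -> r1, C -> r2, D -> r1
- OCH: A -> r1, B -> r2, C -> r1, D -> r2
Every node of the diamond has degree 2, so Chaitin's simplification step is blocked at once and a node is spilled even though the graph is 2-colorable; OCH pushes the blocked node optimistically and still finds a color for it.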

16 Integer Programming (IP)
- Compared with graph coloring, IP increases program performance and reduces code size.
- However, the time to solve a register allocation problem with IP can be significant, so the IP formulation should be as simple as possible.

17 Optimal Register Allocator (ORA)
- ORA uses IP to solve the register allocation problem.
- Proposed by Goodwin and Wilken (1996).
- The IP model is very complex because it contains many redundancies, so solving it is slow.

18 A Faster Optimal Register Allocator
- "A Faster Optimal Register Allocator" also uses IP to solve the register allocation problem (Fu, Wilken and Goodwin, 2005).
- It uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary.
- It uses a simplified IP formulation, which is faster to solve.

19 Basic ORA Model

20 Control Flow Graph and ORA Graphs

21 Basic ORA Model
- Models register allocation as a set of network graphs:
  - symbolic-register graphs
  - memory graphs
- An optimal allocation is obtained by selecting the set of graph edges with minimal total cost, where the cost of an edge is the allocation overhead of the corresponding decision.
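In generic 0-1 form, the edge selection described on this slide can be written as below. This is a sketch: $E$ (all edges of the symbolic-register and memory graphs), $c_e$ (the allocation overhead of edge $e$) and $x_e$ (1 if edge $e$ is selected, 0 otherwise) are notation introduced here, and the concrete constraints are the deallocation, must-allocate, single-symbolic and liveness constraints of the ORA model.

```latex
\min \; \sum_{e \in E} c_e \, x_e
\qquad \text{subject to the ORA constraints,} \qquad
x_e \in \{0, 1\} \;\; \forall e \in E
```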

22 IP Formulation

23 Redundancy

24 Global Reduction
- Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges of the live-range graphs.
- These decisions make up 80% of the total decisions generated by the basic ORA model.

25 Decision Placement

26 Diamond Region Reductions
There are four reduction techniques that eliminate unnecessary load, store and deallocation decisions:
- Void region coupling (void region, coupled decision, paired decision)
- Symmetric decision selection
- Jump-edge nullification
- Asymmetric decision elimination

27 Local Reduction
Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions.

28 Constraint Reduction
- Deallocation constraints
- Must-allocate constraint
- Single-symbolic constraint
- Liveness constraint

29 Deallocation Constraints
- Allow a real register $r$ to be deallocated from a symbolic register $s$ at a deallocation decision location $p$:
  $X^{r}_{s,p-1} \ge X^{r}_{s,p}$
- $X^{r}_{s,p-1}$ is the allocation state of real register $r$ to symbolic register $s$ immediately before deallocation location $p$; $X^{r}_{s,p}$ is the allocation state after $p$.

30 Must-allocate Constraint
- Ensures that a symbolic register $s$ is allocated to a real register at each of its definitions and uses:
  $\sum_{r} X^{r}_{s,p} \ge 1$
- For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, the second must-allocate constraint is redundant.

31 Single-symbolic Constraint
- Ensures that a real register $r$ is allocated to at most one symbolic register:
  $\sum_{s} X^{r}_{s,p} \le 1$
- For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, the first single-symbolic constraint is redundant.

32 Liveness Constraint
- Ensures that a live symbolic register $s$ is held somewhere, either in a real register or in memory:
  $\sum_{r} X^{r}_{s,p} + X^{\mathrm{mem}}_{s,p} \ge 1$
- $X^{\mathrm{mem}}_{s,p}$ is the allocation state of symbolic register $s$ to memory at the liveness-constraint location $p$.

33 Experimental Study
- Compares graph coloring, ORA and Faster ORA.
- For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are compiled for a RISC processor.

34 SPEC CPU92 Benchmark Functions

35 Number of decision variables and constraints produced by basic ORA and Faster ORA

36 Dynamic spill code saved using Faster ORA

37 Dynamic spill-code components for SPEC CPU2000

38 Conclusion
- Two different solutions to the register allocation problem were presented, integer programming and graph coloring, together with their formulations and uses.
- Faster ORA reduces the number of IP decision variables compared with the basic ORA formulation.
- IP gives better results than graph coloring.

39 References
- G. Chaitin, M. Auslender et al., “Register allocation via coloring,” Computer Languages, 1981.
- D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software: Practice and Experience, 1996.
- C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005.

40 Thank you! Any questions?

