Optimal Polynomial Time Algorithms for Register Assignment. Presented at the Chinese University of Hong Kong - Fernando M. Q. Pereira - August 28th, 2007.


1 Optimal Polynomial Time Algorithms for Register Assignment. Presented at the Chinese University of Hong Kong - Fernando M. Q. Pereira - August 28th, 2007. University of California, Los Angeles

2 Background

3 Register Allocation Assign physical locations to the variables in a program. Registers are fast, but few. Memory is large, but slow. Constraints: variables simultaneously alive must be assigned to different physical locations. If there are not enough registers, some variables must be mapped into memory. These are called spilled variables.

4 Spill Free Register Allocation Instance: program P and K registers Problem: can each of the variables of P be mapped to one of the K registers such that variables simultaneously alive are given different registers?

5 Liveness? Live Range? A variable is alive if it can be used in the future. The live range of a variable is the collection of program points where it is alive. 1) a := 1 2) b := 2 3) c := a 4) d := b 5) e := c 6) a := d 7) ret a + e [Figure: live ranges of a, b, c, d and e drawn next to the program]
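The liveness definition above can be sketched as a backward scan over a straight-line program. This is a minimal illustration, not code from the talk: the `(defined, used)` encoding and the name `live_out_sets` are assumptions, and the statement order is taken from the register-annotated listing on the Quiz 1 slide.

```python
# Each instruction is (defined variable or None, set of used variables).
PROGRAM = [
    ("a", set()),        # 1) a := 1
    ("b", set()),        # 2) b := 2
    ("c", {"a"}),        # 3) c := a
    ("d", {"b"}),        # 4) d := b
    ("e", {"c"}),        # 5) e := c
    ("a", {"d"}),        # 6) a := d
    (None, {"a", "e"}),  # 7) ret a + e
]


def live_out_sets(program):
    """Return, for each instruction, the set of variables alive right after it."""
    live = set()
    result = [None] * len(program)
    for i in range(len(program) - 1, -1, -1):
        defined, used = program[i]
        result[i] = set(live)   # alive after instruction i
        live.discard(defined)   # a definition ends the range (scanning backward)
        live |= used            # a use makes the variable alive before it
    return result


LIVE_OUT = live_out_sets(PROGRAM)
for number, live in enumerate(LIVE_OUT, start=1):
    print(number, sorted(live))
```

At most two variables are alive at any point of this program, which is worth keeping in mind when counting registers.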

6 Quiz 1 How many registers? 1) a := 1 2) b := 2 3) c := a 4) d := b 5) e := c 6) a := d 7) ret a + e [Figure: live ranges of a, b, c, d and e] Is there a general algorithm? Is this problem in P or NP? Annotated with registers: a(R1) := 1 b(R2) := 2 c(R1) := a(R1) d(R2) := b(R2) e(R3) := c(R1) a(R1) := d(R2) ret a(R1) + e(R3)

7 Register Allocation and Graphs SFRA = Graph coloring [Chaitin81] 1) a := 1 2) b := 2 3) c := a 4) d := b 5) e := c 6) a := d 7) ret a + e [Figure: live ranges and interference graph over a, b, c, d, e] SFRA is NP-complete…
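To make the link between register allocation and graph coloring concrete, here is a small sketch that builds an interference graph from live sets and finds the fewest colors by brute force. The live sets are written by hand for the seven-statement example and are an assumption of this sketch, as are the names `interference_edges` and `min_registers`. The exhaustive search is exponential and only meant for tiny graphs.

```python
from itertools import combinations, product

# Variables alive right after each of the seven statements (hand-derived assumption).
LIVE_SETS = [{"a"}, {"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}, {"a", "e"}, set()]


def interference_edges(live_sets):
    """Two variables interfere if they are alive at the same program point."""
    edges = set()
    for live in live_sets:
        edges.update(combinations(sorted(live), 2))
    return edges


def min_registers(nodes, edges):
    """Smallest K admitting a proper K-coloring, found by exhaustive search."""
    nodes = sorted(nodes)
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            assignment = dict(zip(nodes, colors))
            if all(assignment[u] != assignment[v] for u, v in edges):
                return k, assignment
    return 0, {}


NODES = set().union(*LIVE_SETS)
EDGES = interference_edges(LIVE_SETS)
K, COLORING = min_registers(NODES, EDGES)
print(sorted(EDGES))
print(K, COLORING)
```

The edges form a cycle on five vertices, so the search reports three registers even though no program point has more than two live variables.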

8 Example 1) a := 1 2) b := 2 3) c := a 4) d := b 5) e := c 6) a := d 7) ret a + e [Figure: live ranges labeled a(R1), b(R2), c(R1), d(R3), e(R2)] Three registers: R1, R2 and R3. R1 := 1 R2 := 2 R1 := R1 R3 := R2 R2 := R1 R1 := R3 ret R1 + R2

9 Live Range Splitting Live ranges are split via copy instructions and/or renaming of variables. May reduce the degree of the interference graph. [Figure, panels (a)-(f): a program over a, b and c before and after splitting a into a1 and a2 via the copy a2 := a1, with the corresponding live ranges and interference graphs]

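10 Quiz 2 If I can split live ranges, how many registers? Original: 1) a := 1 2) b := 2 3) c := a 4) d := b 5) e := c 6) a := d 7) ret a + e. After splitting: 1) a1 := 1 2) b := 2 3) c := a1 4) d := b 5) e := c 6) a2 := d 7) ret a2 + e [Figure: live ranges of a, b, c, d, e before splitting and of a1, b, c, d, e, a2 after]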

11 Quiz 3 P or NP? Instance: program P, K registers. Problem: is there a way to split the live ranges of P so that all its variables can fit into K registers? This problem has a polynomial-time solution! Three independent proofs in 2005: Philip Brisk, WLS’05; Florent Bouchez, INRIA, Master’s thesis; Sebastian Hack, CC’06.

12 Quiz 4, and a bit of intuition… Is coloring of circular-arc graphs in P or NP? Is coloring of interval graphs in P or NP? [Figure: circular-arc and interval representations over a, b, c, d, e]

13 Intuition on Live Range Splitting [Figure: live ranges and interference graphs over a, b, c, d, e, before and after splitting a into a1 and a2]
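Once a is split into a1 and a2, every live range in the running example is a single interval, and interval graphs can be colored greedily. The sketch below is one way to do that, under assumptions of its own: live ranges are modeled as half-open intervals [definition, last use), and the name `color_intervals` is hypothetical.

```python
import heapq

# Half-open live ranges [definition, last use) after splitting a into a1 and a2.
INTERVALS = {
    "a1": (1, 3), "b": (2, 4), "c": (3, 5),
    "d": (4, 6), "e": (5, 7), "a2": (6, 7),
}


def color_intervals(intervals):
    """Greedy register assignment: visit intervals by start, reuse freed registers."""
    assignment = {}
    active = []         # heap of (end, register) for intervals still alive
    free = []           # heap of registers released by expired intervals
    next_register = 0
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        while active and active[0][0] <= start:
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_register
            next_register += 1
        assignment[name] = reg
        heapq.heappush(active, (end, reg))
    return assignment


ASSIGNMENT = color_intervals(INTERVALS)
REGISTERS_USED = max(ASSIGNMENT.values()) + 1
print(ASSIGNMENT, REGISTERS_USED)
```

On these intervals the greedy pass needs two registers, one fewer than the unsplit program.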

14 SSA-Form: the new hope. Static Single Assignment [CFR+91]. Intermediate program representation. Each variable is defined only once. 1) a1 := 1 2) b := 2 3) c := a1 4) d := b 5) e := c 6) a2 := d 7) ret a2 + e [Figure: live ranges and interference graph over a1, b, c, d, e, a2]

15 Polynomial time SFRA [Brisk05, Bouchez05, Hack06]: the interference graph of SSA-form programs is chordal. Chordal graphs can be colored in polynomial time. SFRA has a polynomial-time solution for SSA-form programs. Any program can be converted to SSA form. The SSA-form program never requires more registers than the original program.
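One standard way to color a chordal graph optimally is to order the vertices by maximum cardinality search and then color greedily in that order. The sketch below illustrates that textbook technique. It is not claimed to be the method used in the cited papers, and the example graph and function names are assumptions made for illustration.

```python
# A small chordal graph: triangles a-b-c and b-c-d, plus the edge d-e.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def maximum_cardinality_search(graph):
    """Visit vertices, always picking one with the most already-visited neighbors."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(sorted(weight), key=weight.get)  # ties broken alphabetically
        order.append(v)
        del weight[v]
        for w in graph[v]:
            if w in weight:
                weight[w] += 1
    return order


def greedy_color(graph, order):
    """Give each vertex the smallest color unused by its already-colored neighbors."""
    color = {}
    for v in order:
        used = {color[w] for w in graph[v] if w in color}
        color[v] = next(c for c in range(len(graph)) if c not in used)
    return color


ORDER = maximum_cardinality_search(GRAPH)
COLORS = greedy_color(GRAPH, ORDER)
print(ORDER, COLORS)
```

On a chordal graph, the already-visited neighbors of each vertex form a clique, so the greedy pass never uses more colors than the largest clique (three here).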

16 Quiz 5: RA in basic blocks A basic block is a sequence of instructions with no branches. What does the interference graph of an SSA-form basic block look like? Give a polynomial-time algorithm for register assignment in basic blocks.

17 Too good, but… … real computer architectures are a little too surreal…

18 There are more things in x86, Horatio… The polynomial time register assignment algorithm is too abstract. Some computer architectures are messy: Pre-colored registers Registers of different sizes. Testimony: no publicly available implementation for x86 after two years.

19 Pre-colored registers Some variables must be assigned to particular registers. Ex.: calling conventions, division, etc. Function call (PowerPC): a := 10; b := 2; R0 := a; R1 := b; call(R0, R1); Division (x86): a := 10; b := 2; AX := a; (AL,AH) := DIV AX, b; d := AL; // quotient r := AH; // remainder

20 Quiz 6: pre-coloring extension Is pre-coloring extension of interval graphs in P or NP? [Figure: two instances, labeled "easy :)" and "difficult :("] Pre-coloring extension is NP-complete for interval graphs [Biro92], and even for unit interval graphs [Marx06]…

21 Alias Register Allocation Aliased registers can be used independently, or in combination. Ex.: x86, Sun SPARC, MIPS floating point numbers, etc. Ex.: aliased registers in the Pentium: 32 bits: EAX, EBX, ECX, EDX. 16 bits: AX, BX, CX, DX. 8 bits: AH, AL, BH, BL, CH, CL, DH, DL.

22 Quiz 7: Weighted Coloring [Figure: two copies of a graph over a, b, c, d, e] Shipbuilding: a(23) b(0) c(12) d(3) e(1). Alias RA: a(01) b(2) c(01) d(4) e(3). What is the optimal 1-2-coloring of the graph on the left?

23 Alias Register Allocation Alias Register Allocation is similar to the shipbuilding problem [Gol04, p. 204]. Alias Register Allocation is NP-complete [LPP07] for interval graphs. And so is the shipbuilding problem...

24 What can SSA do? The SSA transformation is too weak to handle alias register allocation and programs with pre-colored variables.

25 Register Allocation by Puzzle Solving Polynomial time 1-2-coloring extension with live range splitting.

26 Aliased Register Allocation with Pre-coloring Instance: program P containing variables that are either short or long, 2K available registers, plus a partial function φ that associates variables with registers. Long variables are assigned two registers {2i, 2i+1}, 0 ≤ i < K, and short variables are assigned one register. Problem: is it possible to extend φ so that it constitutes a valid register allocation of P? The register allocator is allowed to split live ranges.
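The alignment rule in this problem statement can be sketched as a checker for a complete placement. This only verifies a candidate allocation; it does not extend a partial function or split live ranges. The dictionary encoding and the names `occupied_registers` and `is_valid_allocation` are assumptions for illustration.

```python
def occupied_registers(kind, base):
    """Registers covered by a variable placed at `base`, or None if misaligned."""
    if kind == "long":
        # A long variable must occupy an aligned pair {2i, 2i+1}.
        return {base, base + 1} if base % 2 == 0 else None
    return {base}


def is_valid_allocation(kinds, interferences, placement, k):
    """Check a total placement against 2*k registers and the interference pairs."""
    covered = {}
    for var, kind in kinds.items():
        regs = occupied_registers(kind, placement[var])
        if regs is None or not all(0 <= r < 2 * k for r in regs):
            return False
        covered[var] = regs
    # Interfering variables must not share any register.
    return all(covered[u].isdisjoint(covered[v]) for u, v in interferences)


KINDS = {"x": "long", "y": "short", "z": "short"}
INTERFERENCES = [("x", "y"), ("x", "z"), ("y", "z")]
GOOD = {"x": 0, "y": 2, "z": 3}
BAD_OVERLAP = {"x": 0, "y": 1, "z": 2}     # y sits inside x's pair
BAD_ALIGNMENT = {"x": 1, "y": 0, "z": 3}   # x does not start at an even register

print(is_valid_allocation(KINDS, INTERFERENCES, GOOD, k=2))
print(is_valid_allocation(KINDS, INTERFERENCES, BAD_OVERLAP, k=2))
print(is_valid_allocation(KINDS, INTERFERENCES, BAD_ALIGNMENT, k=2))
```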

27 In other words… Optimal spill free register allocation. x86, Ultra SPARC, MIPS, PowerPC, … as far as I know, any register based architecture. Heuristics for spilling.

28 Heuristics for spilling? Optimal solution for spill free register allocation. If it is not possible to find an optimal register assignment for program P, variables of P must be stored in memory. Finding the minimum number of variables that must be spilled is NP-complete. Finding the largest K-colorable induced subgraph of a chordal graph is NP-complete [Yannakakis87].

29 [PP07] - The Main Ideas Elementary Programs and Elementary graphs. Elementary programs have elementary interference graphs. Any well structured program can be converted to an elementary program. Each connected component of an elementary graph is a clique substitution of P3.

30 [PP07] - The Main Ideas

31 Elementary Programs P is an elementary program if: 1. P is strict. 2. P is in static single assignment form. 3. For any variable v of P, LR(v) contains at most one program point outside the basic block that contains def(v). 4. If two variables u, v of P interfere, then either def(u) = def(v), or kill(u) = kill(v). 5. If two variables u, v of P interfere, then either LR(u) ⊆ LR(v), or LR(v) ⊆ LR(u).

32 (a) Strict program (b) Elementary program

33 Interference graph

34 Clique Substitution of P3 P3 is a path with three vertices. [Figure: P3, K2, K3, and the clique substitution P3[K2, K2, K3] with its X clique, Y clique and Z clique]

35 Elementary Graphs Definition: G is an elementary graph if and only if every connected component of G is a clique substitution of P3. Theorem: An elementary program has an elementary interference graph.
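A clique substitution of P3 replaces the three vertices of the path by cliques X, Y and Z: every X vertex is adjacent to every Y vertex, every Y vertex to every Z vertex, and no X vertex to any Z vertex. The following sketch builds P3[Kx, Ky, Kz] under that standard reading. The function name and vertex labels are assumptions for illustration.

```python
from itertools import combinations, product


def clique_substitution_of_p3(x, y, z):
    """Build P3[K_x, K_y, K_z] and return (vertices, edges)."""
    xs = [f"x{i}" for i in range(x)]
    ys = [f"y{i}" for i in range(y)]
    zs = [f"z{i}" for i in range(z)]
    edges = set()
    for clique in (xs, ys, zs):
        edges.update(combinations(clique, 2))  # edges inside each clique
    edges.update(product(xs, ys))              # X is fully joined to Y
    edges.update(product(ys, zs))              # Y is fully joined to Z
    return xs + ys + zs, edges


VERTICES, EDGES = clique_substitution_of_p3(2, 2, 3)
print(len(VERTICES), len(EDGES))
```

For P3[K2, K2, K3] this gives 7 vertices and 15 edges: 5 inside the cliques, 4 between X and Y, and 6 between Y and Z.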

36 Aligned 1-2-coloring extension Instance: Graph G with nodes that are either short or long, 2K available colors, plus a partial function φ that associates nodes with colors. Long nodes are assigned two colors {2i, 2i+1}, 0 ≤ i < K, and short nodes are assigned one. Problem: is it possible to extend φ so that it constitutes a valid coloring of G?

37 Graph Hierarchy

38 The Puzzles The Board: The Pieces:

39 From graphs to puzzles Given a clique substitution of P3 with cliques X, Y, Z, we build a puzzle: vertex → piece; color → column; X-clique → upper row; Y-clique → both rows; Z-clique → lower row; pre-coloring → some pieces are already on the board. Theorem: Aligned 1-2-coloring extension for clique substitutions of P3 and puzzle solving are equivalent under linear-time reductions.

40

41 Rules, Patterns and matches [Figure: example patterns labeled "match" and "don't match"]

42 Example Program

43 Our Solution

44 Counter-example 1 Lesson: use a size-2 piece before two size-1 pieces

45 Counter-example 2 Lesson: statements 7-10 must come before statements 11-14

46 Counter-example 3 Lesson: statement 15 must come before statements 11-14

47 Counter-example 4 Lesson: the order in statements 11-14 is crucial

48 Running Complexity Theorem: a puzzle is solvable if, and only if, our program succeeds on the puzzle. Our puzzle solving program runs in linear time.

49 Spilling Visit each puzzle once. If the puzzle is not solvable, then remove some pieces and try to solve again. Each time we remove a piece, we also remove all other pieces that stem from the same variable in the original program. Spill farthest use first.
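The "spill farthest use first" rule can be sketched in isolation. This is a deliberately simplified stand-in: instead of re-running a puzzle solver, it treats a program point as over-subscribed when more than `k` variables are alive, which is an assumption of this sketch rather than the talk's solvability test. The name `spill_until_fits` and the `next_use` map are hypothetical.

```python
def spill_until_fits(live, next_use, k):
    """Drop variables from `live` until at most k remain, farthest next use first."""
    kept = set(live)
    spilled = []
    while len(kept) > k:
        # Variables with no recorded next use count as infinitely far away.
        victim = max(sorted(kept), key=lambda v: next_use.get(v, float("inf")))
        kept.remove(victim)
        spilled.append(victim)
    return kept, spilled


LIVE = {"a", "b", "c", "d"}
NEXT_USE = {"a": 3, "b": 10, "c": 5, "d": 7}  # statement index of each next use
KEPT, SPILLED = spill_until_fits(LIVE, NEXT_USE, k=2)
print(sorted(KEPT), SPILLED)
```

With two registers, b (next used at 10) is spilled first and d (next used at 7) second, leaving a and c in registers.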

50 Experimental Results Puzzle solver has been implemented in the LLVM [CV04] framework. Compile C programs to x86 target. Over one million lines of code compiled! We have compared our allocator with LLVM’s default algorithm, and with a well-known graph coloring heuristic.

51 Benchmarks
Benchmark               LoC       Asm       btcode
ASCI Purple:smg2000     74,875    73,039    303,037
SPEC2000:175.vpr        70,253    52,917    173,475
SPEC2000:188.ammp       54,335    35,567    149,245
MallocBench:expresso    52,853    45,041    250,770
SPEC2000:197.parser     49,388    32,849    163,025
SPEC2000:164.gzip       39,157    8,130     46,188
(six more)              …         …         …
Total                   409,540   286,900   1,345,898

52 Types of Puzzles

53 Number of Iterations
Benchmark               Puzzles   Avg    max   Once
ASCI Purple:smg2000     52,791    1.33   8     33,822
SPEC2000:175.vpr        47,276    1.10   10    45,575
SPEC2000:188.ammp       33,428    1.09   9     28,515
MallocBench:expresso    43,791    1.06   3     38,925
SPEC2000:197.parser     30,868    1.05   4     28,992
SPEC2000:164.gzip       7,840     1.06   3     6,718
(six more)              …         …      …     …
Total                   251,428   1.13   10    213,411

54 Execution Time of Generated Code Data normalized with respect to GCC -O2.

55 Conclusion If you want to do register allocation for the Pentium, your problem is to solve a collection of puzzles. Fast compilation time, competitive code quality. Many possible directions for future research.

