CMPUT 680 - Compiler Design and Optimization. CMPUT680 - Winter 2006. Topic 8: Minimum Register Instruction Sequence Problem. José Nelson Amaral


1 CMPUT 680 - Compiler Design and Optimization1 CMPUT680 - Winter 2006 Topic 8: Minimum Register Instruction Sequence Problem José Nelson Amaral http://www.cs.ualberta.ca/~amaral/courses/680

2 CMPUT 680 - Compiler Design and Optimization2 Minimum Register Instruction Sequence (MRIS) Problem Given the Data Dependence Graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.

4 CMPUT 680 - Compiler Design and Optimization4 Basic Block Example. Source code: begin k := 10; i := 1; do begin if(i=k) y = (x+4)*(x*8)*((x-4)-x/2) i = i +1 end while i <= 20 end … Three-address code: (1) k := 10; (2) i := 1; (3) if i ≠ k goto (13) (4) t1 := ld(x); (5) t2 := t1 + 4; (6) t3 := t1 * 8; (7) t4 := t1 - 4; (8) t5 := t1 / 2; (9) t6 := t2 * t3; (10) t7 := t4 - t5; (11) t8 := t6 * t7; (12) st(y,t8); (13) i := i + 1; (14) if i ≤ 20 goto (3)

5 CMPUT 680 - Compiler Design and Optimization5 Basic Block Example. Source code: begin k := 10; i := 1; do begin if(i=k) y = (x+4)*(x*8)*((x-4)-x/2) i = i +1 end while i <= 20 end … Control Flow Graph: B1: (1) k := 10 (2) i := 1; B2: (3) if i ≠ k goto (13) B3: (4) t1 := ld(x); (5) t2 := t1 + 4; (6) t3 := t1 * 8; (7) t4 := t1 - 4; (8) t5 := t1 / 2; (9) t6 := t2 * t3; (10) t7 := t4 - t5; (11) t8 := t6 * t7; (12) st(y,t8); B4: (13) i := i + 1; (14) if i ≤ 20 goto (3) B5: (15)...
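The split into blocks B1 to B5 follows from the usual leader rule: the first instruction, every jump target, and every instruction that follows a jump each start a new block. The sketch below is only an illustration of that rule. The function name `basic_blocks` and its arguments are made up for this example, and instruction (15) stands in for whatever code follows the loop.

```python
def basic_blocks(n_instrs, jumps):
    """Split instructions 1..n_instrs into basic blocks.

    jumps maps the number of each jump instruction to its target
    (hypothetical encoding chosen for this sketch).
    """
    leaders = {1}                        # the first instruction is a leader
    for source, target in jumps.items():
        leaders.add(target)              # a jump target is a leader
        if source + 1 <= n_instrs:
            leaders.add(source + 1)      # so is the instruction after a jump
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n_instrs]
    return list(zip(starts, ends))

# (3) jumps to (13) and (14) jumps back to (3), as in the three-address code.
blocks = basic_blocks(15, {3: 13, 14: 3})
for name, (first, last) in zip(["B1", "B2", "B3", "B4", "B5"], blocks):
    print(name, first, last)
```

Block B3, instructions (4) to (12), is the straight-line region used in the rest of the slides.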

7 CMPUT 680 - Compiler Design and Optimization7 Data Dependency Graph (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8); B3 a

8 CMPUT 680 - Compiler Design and Optimization8 Data Dependency Graph (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8); B3 a bcde

9 CMPUT 680 - Compiler Design and Optimization9 Data Dependency Graph B3 a bcde f (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8);

10 CMPUT 680 - Compiler Design and Optimization10 Data Dependency Graph B3 a bcdefg (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8);

11 CMPUT 680 - Compiler Design and Optimization11 Data Dependency Graph B3 a bcdefg h (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8);

12 CMPUT 680 - Compiler Design and Optimization12 Data Dependency Graph B3 a bcdefg h i (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8);
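The graph built up over the last slides can be derived mechanically: each instruction gets an edge from the instruction that produced each of its operands. The following sketch assumes every temporary is defined exactly once, so only true (flow) dependences appear. The tuple encoding of block B3 and the name `build_ddg` are illustrative choices, not course code.

```python
# Block B3 as (label, destination, sources); the store has no destination.
BLOCK = [
    ("a", "t1", []),            # t1 := ld(x)
    ("b", "t2", ["t1"]),        # t2 := t1 + 4
    ("c", "t3", ["t1"]),        # t3 := t1 * 8
    ("d", "t4", ["t1"]),        # t4 := t1 - 4
    ("e", "t5", ["t1"]),        # t5 := t1 / 2
    ("f", "t6", ["t2", "t3"]),  # t6 := t2 * t3
    ("g", "t7", ["t4", "t5"]),  # t7 := t4 - t5
    ("h", "t8", ["t6", "t7"]),  # t8 := t6 * t7
    ("i", None, ["t8"]),        # st(y, t8)
]

def build_ddg(block):
    """Return a successor map: producer label -> list of consumer labels."""
    producer = {}                              # temporary -> defining label
    succs = {label: [] for label, _, _ in block}
    for label, dest, sources in block:
        for temp in sources:
            if temp in producer:               # operand defined in this block
                succs[producer[temp]].append(label)
        if dest is not None:
            producer[dest] = label
    return succs

ddg = build_ddg(BLOCK)
print(ddg)
```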

14 CMPUT 680 - Compiler Design and Optimization14 Instruction Sequence a bcdefg h i (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8); B3 (a) t1 := ld(x); (d) t4 := t1 - 4; (e) t5 := t1 / 2; (g) t7 := t4 - t5; (b) t2 := t1 + 4; (c) t3 := t1 * 8; (f) t6 := t2 * t3; (h) t8 := t6 * t7; (i) st(y,t8); B3’ (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (f) t6 := t2 * t3; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8); B3’’ Instr. Sequence 1 Instr. Sequence 2 Instr. Sequence 3
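All three sequences are legal because each one is a topological order of the DDG: every instruction appears after the instructions that produce its operands. A small check, written as a sketch with the DDG entered by hand as a successor map (the name `is_legal` is illustrative):

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def is_legal(sequence, succs):
    """True if sequence lists every node once and respects every DDG edge."""
    if sorted(sequence) != sorted(succs):
        return False
    position = {node: k for k, node in enumerate(sequence)}
    return all(position[u] < position[v] for u in succs for v in succs[u])

sequence_1 = "abcdefghi"   # B3
sequence_2 = "adegbcfhi"   # B3'
sequence_3 = "abcfdeghi"   # B3''
print([is_legal(s, DDG) for s in (sequence_1, sequence_2, sequence_3)])
print(is_legal("abfcdeghi", DDG))   # f is listed before its operand c
```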

16 CMPUT 680 - Compiler Design and Optimization16 Register Requirement. Instruction Sequence 1: (a) t1 := ld(x); (b) t2 := t1 + 4; (c) t3 := t1 * 8; (d) t4 := t1 - 4; (e) t5 := t1 / 2; (f) t6 := t2 * t3; (g) t7 := t4 - t5; (h) t8 := t6 * t7; (i) st(y,t8); Register Requirement = 4. Instruction Sequence 2: (a) t1 := ld(x); (d) t4 := t1 - 4; (e) t5 := t1 / 2; (g) t7 := t4 - t5; (c) t3 := t1 * 8; (b) t2 := t1 + 4; (f) t6 := t2 * t3; (h) t8 := t6 * t7; (i) st(y,t8); Register Requirement = 3. [Figure: live ranges of t1 to t8 under each sequence]
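The two register requirements can be recomputed by sweeping over a sequence and tracking which values are live. The sketch below assumes a value occupies a register from its definition until its last use, and that an instruction may write its result into the register of an operand that dies at that instruction. The name `register_requirement` is illustrative.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def register_requirement(sequence, succs):
    """Peak number of simultaneously live values for a fixed sequence."""
    position = {node: k for k, node in enumerate(sequence)}
    # Position of the last consumer of each value (None if never consumed).
    last_use = {
        node: max((position[s] for s in succs[node]), default=None)
        for node in sequence
    }
    live, peak = set(), 0
    for k, node in enumerate(sequence):
        # Values whose last consumer is this instruction free their register.
        live = {v for v in live if last_use[v] > k}
        if succs[node]:              # the store (i) produces no value
            live.add(node)
        peak = max(peak, len(live))
    return peak

print(register_requirement("abcdefghi", DDG))   # Instruction Sequence 1
print(register_requirement("adegcbfhi", DDG))   # Instruction Sequence 2
```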

18 CMPUT 680 - Compiler Design and Optimization18 MRIS Heuristic Solution It is easy to find the minimum number of registers required by a fixed instruction sequence. Even for a fixed instruction sequence, finding the assignment with minimal spilling for k ≥ 3 registers is NP-hard. MRIS can be reduced to the Optimal Code Generation (OCG) or Evaluation Order Determination problems, which have been known to be NP-hard since 1975. Thus we will use a heuristic-based algorithm, and will be happy with a “good” solution.
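For a graph as small as the running example, the exact answer can still be found by enumerating every legal sequence, which makes it clear why this does not scale: the number of topological orders grows very quickly with the size of the block. The brute-force sketch below is not part of the lineage-based method; it only provides a reference value for the example, under the same live-range assumptions as before.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def topological_orders(succs):
    """List every legal instruction sequence of the DDG (exponential)."""
    indegree = {n: 0 for n in succs}
    for n in succs:
        for s in succs[n]:
            indegree[s] += 1
    order, placed = [], set()

    def extend():
        if len(order) == len(succs):
            yield "".join(order)
            return
        for n in succs:
            if n not in placed and indegree[n] == 0:
                placed.add(n)
                order.append(n)
                for s in succs[n]:
                    indegree[s] -= 1
                yield from extend()
                for s in succs[n]:      # undo the choice and try the next node
                    indegree[s] += 1
                order.pop()
                placed.discard(n)

    return list(extend())

def register_requirement(sequence, succs):
    """Peak number of live values; a value dies at its last consumer."""
    position = {n: k for k, n in enumerate(sequence)}
    last_use = {n: max((position[s] for s in succs[n]), default=None)
                for n in sequence}
    live, peak = set(), 0
    for k, n in enumerate(sequence):
        live = {v for v in live if last_use[v] > k}
        if succs[n]:
            live.add(n)
        peak = max(peak, len(live))
    return peak

orders = topological_orders(DDG)
best = min(register_requirement(s, DDG) for s in orders)
print(len(orders), best)
```

For this DDG the search visits 80 legal sequences and the smallest requirement it finds is 3, the value reached by Instruction Sequence 2.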

19 CMPUT 680 - Compiler Design and Optimization19 Researchers
• R. Govindarajan (Indian Inst. of Science)
• Hongbo Yang (Univ. of Delaware)
• Chihong Zhang (Conexant)
• Jose Nelson Amaral (Univ. of Alberta)
• Guang R. Gao (Univ. of Delaware)

20 CMPUT 680 - Compiler Design and Optimization20 Motivation
• IA-64 style processors
  - Reduce spills in local register allocation phase
  - Reduce Local Register Allocation (LRA) requests in Global Register Allocation (GRA) phase
  - Reduce overall register pressure on a per procedure basis
• Out-of-order issue processor
  - Instruction reordering buffer
  - Register renaming

21 CMPUT 680 - Compiler Design and Optimization21 Intuition for Our Solution a bcdefg h i Which nodes can/cannot share registers in this DDG? Could we assign the same register for the value produced by b and f? Yes: in any legal sequence, b and f can share the same register because f is the only use of b’s result. How about e and g? Data Dependence Graph

22 CMPUT 680 - Compiler Design and Optimization22 Intuition for Our Solution a bcdefg h i How about nodes f and g? Can nodes f and g share the same register? No, there is no legal sequence in which f and g can share a register because node h needs both values at the same time. Data Dependence Graph

23 CMPUT 680 - Compiler Design and Optimization23 Intuition for Our Solution a bcdefg h i How about nodes c and d? Can nodes c and d share the same register? It depends on the sequence: for c and d to share a register we must either list f before d, or list g before c. Data Dependence Graph
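For a fixed sequence, the three sharing questions reduce to a live-range test: two values can share a register when one is dead by the time the other is defined. The sketch below treats a value as live from its definition up to its last consumer and lets the consumer reuse the register. The helper names are illustrative.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def live_range(node, sequence, succs):
    """(definition position, position of the last consumer) of node's value."""
    position = {n: k for k, n in enumerate(sequence)}
    return position[node], max(position[s] for s in succs[node])

def can_share(u, v, sequence, succs):
    """True if the values of u and v never need a register at the same time."""
    def_u, last_u = live_range(u, sequence, succs)
    def_v, last_v = live_range(v, sequence, succs)
    return last_u <= def_v or last_v <= def_u

print(can_share("b", "f", "abcdefghi", DDG))   # f is the only use of b
print(can_share("f", "g", "abcdefghi", DDG))   # h needs both values
print(can_share("c", "d", "abcdefghi", DDG))   # neither f before d nor g before c
print(can_share("c", "d", "adegcbfhi", DDG))   # g is listed before c
```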

24 CMPUT 680 - Compiler Design and Optimization24 Intuition for Our Solution a bcdefg h i Our intuition is to find sub-sets of nodes that can definitely share a register to inform the instruction sequencing algorithm. Data Dependence Graph

25 CMPUT 680 - Compiler Design and Optimization25 Instruction Lineages a bc de fg h i An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last). How can we ensure that instructions a, b, f, and h will be able to share the same register? L1 = [a, b, f, h, i) a b f h Data Dependence Graph

26 CMPUT 680 - Compiler Design and Optimization26 Sequencing Edges a bcde fg h i The lineage formation imposed a scheduling restriction in the DDG: the selected heir of a node must be the last node listed among its siblings. L1 = [a, b, f, h, i) Thus the lineage formation inserts sequencing edges in the DDG. Augmented Data Dependence Graph

27 CMPUT 680 - Compiler Design and Optimization27 Node Height a bcde fg h i L1 = [a, b, f, h, i) If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence. Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily. Augmented Data Dependence Graph
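Height is useful here because a path from x to y forces height(x) > height(y), so an edge added from a node to another node of no greater height can never close a cycle. The sketch below counts height in nodes, so the sink i has height 1, and compares the DDG before and after the sequencing edges c→b, d→b, e→b that L1 introduces. Counting in nodes rather than edges is an assumption made to match the value quoted for c, d, and e on the lineage formation slide.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def heights(succs):
    """Number of nodes on the longest path from each node to a sink."""
    memo = {}

    def height(node):
        if node not in memo:
            memo[node] = 1 + max((height(s) for s in succs[node]), default=0)
        return memo[node]

    return {node: height(node) for node in succs}

# Augmented DDG after forming L1 = [a, b, f, h, i): b is a's heir, so its
# siblings c, d, e must all be listed before b.
AUGMENTED = {node: list(s) for node, s in DDG.items()}
for sibling in ("c", "d", "e"):
    AUGMENTED[sibling].append("b")

before = heights(DDG)
after = heights(AUGMENTED)
print(before)
print(after)
```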

28 CMPUT 680 - Compiler Design and Optimization28 Lineage Formation a bcde fg h i L1 = [a, b, f, h, i) For the next lineage, the highest nodes not in a lineage are c, d, e, all with a height of 5. L2 = [c, f) c L3 = [e, g, h) e g L4 = [d, g) d Augmented Data Dependence Graph
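The whole lineage formation pass can be sketched in a few lines. This is a simplified reading of the slides, not the authors' implementation, and it rests on three assumptions: the heir is the data successor with the smallest height, heights are recomputed before every heir selection (more often than the slides require), and ties are resolved by a caller-supplied `priority` string. The priority used below was chosen only so that the output matches L1 to L4 as drawn.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

def heights(succs):
    """Number of nodes on the longest path from each node to a sink."""
    memo = {}

    def height(node):
        if node not in memo:
            memo[node] = 1 + max((height(s) for s in succs[node]), default=0)
        return memo[node]

    return {node: height(node) for node in succs}

def form_lineages(ddg, priority):
    """Return (lineages, augmented DDG with sequencing edges)."""
    rank = {node: k for k, node in enumerate(priority)}
    aug = {node: list(s) for node, s in ddg.items()}
    taken = set()                     # nodes whose value already has a lineage
    lineages = []
    while True:
        free = [n for n in ddg if ddg[n] and n not in taken]
        if not free:
            return lineages, aug
        ht = heights(aug)
        node = min(free, key=lambda n: (-ht[n], rank[n]))   # highest free node
        lineage = [node]
        while True:
            taken.add(node)
            ht = heights(aug)
            heir = min(ddg[node], key=lambda n: (ht[n], rank[n]))
            for sibling in ddg[node]:
                # The heir must be listed after all its siblings.
                if sibling != heir and heir not in aug[sibling]:
                    aug[sibling].append(heir)
            lineage.append(heir)
            if heir in taken or not ddg[heir]:
                break                 # heir only reads the value: lineage ends
            node = heir
        lineages.append(lineage)

lineages, augmented = form_lineages(DDG, priority="abcedfghi")
for k, lineage in enumerate(lineages, start=1):
    print(f"L{k} = [{', '.join(lineage)})")
```

With a different tie-break, for example plain alphabetical order, d would start the third lineage instead of e, which gives a mirror image of the lineages shown on the slide.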

29 CMPUT 680 - Compiler Design and Optimization29 Lineage Interference L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Two lineages $L_u = [u_1, u_2, \ldots, u_m)$ and $L_v = [v_1, v_2, \ldots, v_n)$ definitely overlap if: (i) $u_1$ reaches $v_n$, and (ii) $v_1$ reaches $u_m$. a bcde fg h i Augmented Data Dependence Graph
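The overlap condition is a pair of reachability queries on the augmented DDG, so it can be checked directly. The sketch below enters the augmented graph by hand, including the sequencing edges c→b, d→b, e→b, and reports which pairs of lineages definitely overlap. The function names are illustrative.

```python
from itertools import combinations

# Augmented DDG: data edges plus sequencing edges c->b, d->b, e->b.
AUG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f", "b"],
    "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
LINEAGES = {"L1": "abfhi", "L2": "cf", "L3": "egh", "L4": "dg"}

def reaches(source, target, succs):
    """True if there is a path from source to target (depth-first search)."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succs[node])
    return False

def definitely_overlap(lu, lv, succs):
    """First node of each lineage reaches the last node of the other."""
    return reaches(lu[0], lv[-1], succs) and reaches(lv[0], lu[-1], succs)

interference = {
    (p, q)
    for p, q in combinations(LINEAGES, 2)
    if definitely_overlap(LINEAGES[p], LINEAGES[q], AUG)
}
print(sorted(interference))
```

The only pair that does not interfere is L2 and L4, because c does not reach g.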

30 CMPUT 680 - Compiler Design and Optimization30 Lineage Interference Graph L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i L1 L3L2 L4 Lineage Interference Graph Augmented Data Dependence Graph Which lineages does lineage L1 definitely overlap with? Which lineages interfere with L1?

31 CMPUT 680 - Compiler Design and Optimization31 Lineage Interference Graph L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i L1 L3L2 L4 Lineage Interference Graph Augmented Data Dependence Graph Which lineages does lineage L1 definitely overlap with? How about lineages L2 and L4? There is no path from c to g.

32 CMPUT 680 - Compiler Design and Optimization32 Lineage Interference Graph L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i L1 L3L2 L4 Lineage Interference Graph Augmented Data Dependence Graph Which lineages does lineage L1 definitely overlap with? Do lineages L3 and L4 interfere? Do lineages L2 and L3 interfere?

33 CMPUT 680 - Compiler Design and Optimization33 Lineage Interference Graph L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Lineages Under the sequencing constraints that we imposed, all the instructions in a lineage can share the same register. Would it be possible for multiple lineages to share the same register? How can we find the minimum number of registers if we consider all legal sequences for the augmented DDG?

34 CMPUT 680 - Compiler Design and Optimization34 Lineage Interference Graph L1 = [a, b, f, h, i) L2 = [c,f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Lineages We color the lineage interference graph to find the Heuristic Register Bound for the DDG.

35 CMPUT 680 - Compiler Design and Optimization35 Coloring the Lineage Interference Graph a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Lineages We color the lineage interference graph to find the Heuristic Register Bound for the DDG. We managed to color the graph with three colors, thus we should be able to find an instruction sequence that uses at most three registers. L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g)
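Coloring the lineage interference graph can be done with any graph-coloring heuristic. The sketch below uses a plain greedy pass that visits lineages in order of decreasing degree; this particular heuristic is an assumption made for illustration, and the slides do not say which one the authors use. The edge set is the one implied by the overlap condition: every pair interferes except L2 and L4.

```python
# Lineage interference graph as an adjacency map.
LIG = {
    "L1": {"L2", "L3", "L4"},
    "L2": {"L1", "L3"},
    "L3": {"L1", "L2", "L4"},
    "L4": {"L1", "L3"},
}

def greedy_coloring(graph):
    """Give each node the smallest color unused by its colored neighbours."""
    colors = {}
    for node in sorted(graph, key=lambda n: -len(graph[n])):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

colors = greedy_coloring(LIG)
registers = {node: "R" + "ABC"[c] for node, c in colors.items()}
print(registers)
```

Three colors are also a lower bound for this graph, since L1, L2, and L3 interfere pairwise.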

36 CMPUT 680 - Compiler Design and Optimization36 Sequencing by List Scheduling a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Lineages RA RB RC Registers Sequence

37 CMPUT 680 - Compiler Design and Optimization37 Sequencing by List Scheduling a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 RA RB RC Registers L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Lineages a Sequence

38 CMPUT 680 - Compiler Design and Optimization38 Sequencing by List Scheduling a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 RA RB RC Registers L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Lineages ac Sequence

39 CMPUT 680 - Compiler Design and Optimization39 Sequencing by List Scheduling a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 RA RB RC Registers Deadlock!! No Register to list node d!! L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Lineages ace Sequence

40 CMPUT 680 - Compiler Design and Optimization40 Lineage Fusion Condition a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Two lineages $L_u = [u_1, u_2, \ldots, u_m)$ and $L_v = [v_1, v_2, \ldots, v_n)$ can be fused into a single lineage if: (i) $u_1$ reaches $v_n$, and (ii) $v_1$ does not reach $u_m$. L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) Lineages

41 CMPUT 680 - Compiler Design and Optimization41 Lineage Fusion Condition L1 = [a, b, f, h, i) L2 = [c, f) L3 = [e, g, h) L4 = [d, g) a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Lineages Which lineages can be fused in the example? d reaches f, and c does not reach g. Thus L4 can be fused with L2 to form L5 = [d, g) ∪ [c, f)

42 CMPUT 680 - Compiler Design and Optimization42 Lineage Fusion L1 = {a, b, f, h, i} L2 = {c, f} L3 = {e, g, h} L4 = {d, g} a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L2 L4 Lineages When $L_u = [u_1, u_2, \ldots, u_m)$ and $L_v = [v_1, v_2, \ldots, v_n)$ are fused: (1) a scheduling edge from $u_m$ to $v_1$ is introduced in the augmented DDG; (2) $L_u$ and $L_v$ are removed from the LIG; (3) a new lineage $L_w = L_u \cup L_v$ is inserted in the LIG.
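The fusion condition and step (1) of the fusion can be tried out on the example. The sketch below tests every ordered pair of lineages against the condition and then adds the scheduling edge for the one pair that qualifies. The bookkeeping on the LIG, steps (2) and (3), is left out, and the function names are illustrative.

```python
from itertools import permutations

# Augmented DDG: data edges plus sequencing edges c->b, d->b, e->b.
AUG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f", "b"],
    "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
LINEAGES = {"L1": "abfhi", "L2": "cf", "L3": "egh", "L4": "dg"}

def reaches(source, target, succs):
    """True if there is a path from source to target (depth-first search)."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succs[node])
    return False

def can_fuse(lu, lv, succs):
    """u_1 reaches v_n, and v_1 does not reach u_m."""
    return reaches(lu[0], lv[-1], succs) and not reaches(lv[0], lu[-1], succs)

def fuse(lu, lv, succs):
    """Return a copy of the graph with the scheduling edge u_m -> v_1 added."""
    fused = {node: list(s) for node, s in succs.items()}
    fused[lu[-1]].append(lv[0])
    return fused

fusable = [
    (p, q)
    for p, q in permutations(LINEAGES, 2)
    if can_fuse(LINEAGES[p], LINEAGES[q], AUG)
]
print(fusable)

fused_graph = fuse(LINEAGES["L4"], LINEAGES["L2"], AUG)
print(fused_graph["g"])
```

The new edge runs from g to c. Because c does not reach g, adding it cannot create a cycle.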

43 CMPUT 680 - Compiler Design and Optimization43 Lineage Fusion Condition L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L5 Lineages Thus the fusion of L4 with L2 forms L5 = [d, g) ∪ [c, f). How many colors do we need now to color the LIG?

44 CMPUT 680 - Compiler Design and Optimization44 Lineage Fusion Condition L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) a bcde fg h i Augmented Data Dependence Graph Lineage Interference Graph L1 L3L5 Lineages We still need three colors. Can we find an instruction sequence now?

45 CMPUT 680 - Compiler Design and Optimization45 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph Sequence

46 CMPUT 680 - Compiler Design and Optimization46 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph a Sequence

47 CMPUT 680 - Compiler Design and Optimization47 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph ad Sequence

48 CMPUT 680 - Compiler Design and Optimization48 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph ade Sequence

49 CMPUT 680 - Compiler Design and Optimization49 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adeg Sequence

50 CMPUT 680 - Compiler Design and Optimization50 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adegc Sequence

51 CMPUT 680 - Compiler Design and Optimization51 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adegcb Sequence

52 CMPUT 680 - Compiler Design and Optimization52 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adegcbf Sequence

53 CMPUT 680 - Compiler Design and Optimization53 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adegcbfh Sequence

54 CMPUT 680 - Compiler Design and Optimization54 Sequencing by List Scheduling Lineage Interference Graph RA RB RC Registers L1 L3L5 L1 = [a, b, f, h, i) L3 = [e, g, h) L5 = [d, g) ∪ [c, f) Lineages a bcde fg h i Augmented Data Dependence Graph adegcbfhi Sequence

55 CMPUT 680 - Compiler Design and Optimization55 Sequencing by List Scheduling Augmented Data Dependence Graph (a) t1 := ld(x); (d) t4 := t1 - 4; (e) t5 := t1 / 2; (g) t7 := t4 - t5; (c) t3 := t1 * 8; (b) t2 := t1 + 4; (f) t6 := t2 * t3; (h) t8 := t6 * t7; (i) st(y,t8); Instruction Sequence 2 t1 t4 t5 t7 t3 t2 t6 t8 a bcde fg h i RA RB RC Registers adegcbfhi Sequence
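Both the earlier deadlock and the successful schedule can be reproduced with a small register-aware list scheduler. The sketch below is a simplified model, not the authors' scheduler: a lineage acquires its register when its first node is listed and releases it when its last node is listed, and among the ready nodes the alphabetically first one is always chosen. Fusion is modelled only by the extra scheduling edge g→c, while L2 and L4 keep sharing register RC.

```python
# Augmented DDG before fusion: data edges plus c->b, d->b, e->b.
AUG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f", "b"],
    "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
LINEAGES = {"L1": "abfhi", "L2": "cf", "L3": "egh", "L4": "dg"}
REGISTER = {"L1": "RA", "L3": "RB", "L2": "RC", "L4": "RC"}

def list_schedule(succs, lineages, register):
    """Return (sequence, finished); finished is False on deadlock."""
    preds = {n: set() for n in succs}
    for u in succs:
        for v in succs[u]:
            preds[v].add(u)
    head = {nodes[0]: name for name, nodes in lineages.items()}
    ends = {}
    for name, nodes in lineages.items():
        ends.setdefault(nodes[-1], []).append(name)
    busy = {}                                   # register -> lineage holding it
    listed = []
    while len(listed) < len(succs):
        for node in sorted(succs):
            if node in listed or not preds[node] <= set(listed):
                continue                        # already listed or not ready
            freed = {register[name] for name in ends.get(node, [])}
            need = register[head[node]] if node in head else None
            if need is not None and need in busy and need not in freed:
                continue                        # register of its lineage is taken
            for reg in freed:
                busy.pop(reg, None)
            if need is not None:
                busy[need] = head[node]
            listed.append(node)
            break
        else:
            return "".join(listed), False       # nothing could be listed
    return "".join(listed), True

before_fusion = list_schedule(AUG, LINEAGES, REGISTER)

FUSED = {node: list(s) for node, s in AUG.items()}
FUSED["g"].append("c")                          # scheduling edge from fusion
after_fusion = list_schedule(FUSED, LINEAGES, REGISTER)

print(before_fusion)
print(after_fusion)
```

Without the fusion edge the scheduler lists a, c, e and then stalls, because d needs the register that c still holds. With the edge in place it produces the sequence a, d, e, g, c, b, f, h, i.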

56 CMPUT 680 - Compiler Design and Optimization56 Anti- and Output Dependencies a bcde fg h i Augmented Data Dependence Graph (a) RA := ld(x); (d) RC := RA - 4; (e) RB := RA / 2; (g) RC := RC - RB; (c) RB := RA * 8; (b) RA := RA + 4; (f) RA := RA * RB; (h) RA := RA * RC; (i) st(y,RA); Instruction Sequence 2 RA RB RC Registers adegcbfhi Sequence
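Once registers are reused, the order of instructions is constrained by more than the original data dependences. In the allocated code, (c) overwrites RB, which (g) must read first, so the two can no longer be swapped even though c and g are independent in the DDG. The sketch below transcribes the allocated sequence into Python, modelling ld(x) as reading a parameter and st(y, ...) as the return value, and compares it with the source expression.

```python
def original(x):
    """The source-level expression computed by block B3."""
    return (x + 4) * (x * 8) * ((x - 4) - x / 2)

def allocated(x):
    """Instruction Sequence 2 after register assignment."""
    RA = x            # (a) RA := ld(x)
    RC = RA - 4       # (d)
    RB = RA / 2       # (e)
    RC = RC - RB      # (g)
    RB = RA * 8       # (c) reuses RB after (g) has read it
    RA = RA + 4       # (b) reuses RA, the last use of x
    RA = RA * RB      # (f)
    RA = RA * RC      # (h)
    return RA         # (i) st(y, RA)

def reordered(x):
    """Same code with (c) moved before (g): violates an anti-dependence."""
    RA = x
    RC = RA - 4       # (d)
    RB = RA / 2       # (e)
    RB = RA * 8       # (c) overwrites RB before (g) reads it
    RC = RC - RB      # (g) now reads the wrong value
    RA = RA + 4       # (b)
    RA = RA * RB      # (f)
    RA = RA * RC      # (h)
    return RA

print(original(10), allocated(10), reordered(10))
```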

57 CMPUT 680 - Compiler Design and Optimization57 Summary of Our Solution Method
• A “good” construction algorithm for LIG (dynamic)
• An effective heuristic method to calculate the HRB
• An efficient scheduling method (does not backtrack)
Flow: DDG → Form Lineage Interference Graph (LIG) → Derive HRB → Extended list-scheduling guided by HRB → A good instruction sequence

58 Performance: Static Measurements 14 integer registers and 8 FP registers

59 CMPUT 680 - Compiler Design and Optimization59 Performance: Static Measurements 14 integer registers and 8 FP registers

60 CMPUT 680 - Compiler Design and Optimization60 Performance: Dynamic Measurements 14 integer registers and 8 FP registers

61 CMPUT 680 - Compiler Design and Optimization61 Evaluation
• For the MIPS R10K (SPEC95fp), with reduced register sets, the lineage-based algorithm inserted up to 50% fewer spill operations in the code than the baseline compiler.
• Under the same conditions, the algorithm generated code up to 6% faster than the baseline (and slightly faster than MIPSPro).

62 CMPUT 680 - Compiler Design and Optimization62 Implementation
• DDG construction: use native service routines, e.g. CG_DEP_Compute_Graph
• LIG coloring: use native support for set package (e.g. bitset.c)
• Scheduler implementation: use native support for vector package (e.g. cg_vector.cxx)
• Access dependence graph using native service functions ARC_succs, ARC_preds, ARC_kind

63 CMPUT 680 - Compiler Design and Optimization63 Debugging and Validation
• Trace file
  - tt54:0x1. General trace of LRA
  - tt45:0x4. Dependence graph building
  - tr53. Target Operations (TOP) before LRA
  - tr54. TOP after LRA

64 CMPUT 680 - Compiler Design and Optimization64 Register Allocation GRA Assign_Registers Fix_LRA_Blues Fail? reschedule local code motion spill global or local registers Succ? LRA: At block level

