Presentation is loading. Please wait.

Presentation is loading. Please wait.

Diverge-Merge Processor (DMP) Hyesoon Kim José A. Joao Onur Mutlu* Yale N. Patt HPS Research Group *Microsoft Research University of Texas at Austin.

Similar presentations


Presentation on theme: "Diverge-Merge Processor (DMP) Hyesoon Kim José A. Joao Onur Mutlu* Yale N. Patt HPS Research Group *Microsoft Research University of Texas at Austin."— Presentation transcript:

1 Diverge-Merge Processor (DMP) Hyesoon Kim José A. Joao Onur Mutlu* Yale N. Patt HPS Research Group *Microsoft Research University of Texas at Austin

2 2 Outline  Predicated Execution  Diverge-Merge Processor (DMP)  Implementation of DMP  Experimental Evaluation  Conclusion

3 3 Predicated Execution Convert control flow dependence to data dependence (normal branch code) CB D A T N p1 = (cond) branch p1, TARGET mov b, 1 jmp JOIN TARGET: mov b, 0 A B C B C D A (predicated code) A B C if (cond) { b = 0; } else { b = 1; } p1 = (cond) (!p1) mov b, 1 (p1) mov b, 0

4 4 Fetch Decode Rename Schedule RegisterRead Execute Benefit of Predicated Execution  Predicated Execution can be high performance and energy-efficient. A B C D A E F Predicated Execution Branch Prediction Pipeline flush!! EDBF nop Fetch Decode Rename Schedule RegisterRead Execute A B A C BA CB D A DCBEAEDCFB A FEDC BAAFBCDE F EDABCFEABCD FED CBA FE DCAB EDC BAFAFBCDE

5 5 Limitations/Problems of Predication  ISA: Predicate registers and predicated instructions Dynamic-Hammock Predication[Klauser ’ 98] can solve this problem but it is only applicable to simple hammocks.  Adaptivity: Static predication is not adaptive to run-time branch behavior. Branch behavior changes based on input set, phase, control-flow path. Wish Branches[Kim ’ 05]  Complex CFG: A large subset of control-flow graphs is not converted to predicated code. Function calls, loops, many instructions inside a region, and complex CFGs Hyperblock[Mahlke ’ 92] cannot adapt to frequently-executed paths dynamically.

6 6 Outline  Predicated Execution  Diverge-Merge Processor (DMP)  Implementation of DMP  Experimental Evaluation  Conclusion

7 7 Diverge-Merge Processor (DMP)  DMP can dynamically predicate complex branches (in addition to simple hammocks).  The compiler identifies Diverge branches Control-flow merge (CFM) points  The microarchitecture decides when and what to predicate dynamically.

8 8 select-µops (φ-nodes in SSA) Dynamic Predication A B C H Klauser et al.[PACT’98]: Dynamic-hammock predication CB H A T N mov R1, 1 jmp JOIN TARGET: mov R1, 0 A B C p1 = (cond) branch p1, TARGET (mov R1, 1) PR10 = 1 (mov R1, 0) PR11 = 0 PR12 = (cond) ? PR11 : PR10 Low-confidence H JOIN: add R5, R1, 1

9 9 Diverge-Merge Processor CB E D F G Frequently executed path Not frequently executed path A C E B H Insert select-µops Diverge Branch CFM point A H

10 10 diverge-branch executed block CFM point Diverge-Merge Processor CB E D F G Frequently executed path Not frequently executed path AAA AAA A H

11 11 Control-Flow Graphs A simple hammock A nested hammock A frequently-hammock A loop A........... non-merging DMP Dynamic Hammock SW pred Wish br. Dual-path

12 12 Dual-path Execution vs. DMP Low-confidence C D E F B D E F A B C D E F path 1path 2 C D E F B path 1path 2 Dual-pathDMP CFM

13 13 Control-Flow Graphs A simple hammock A nested hammock A frequently-hammock A loop A........... non-merging DMP Dynamic- hammock SW pred Wish br. Dual-path sometimes

14 14 Distribution of Mispredicted Branches  66% of mispredicted branches can be dynamically predicated in DMP.

15 15 Distribution of Mispredicted Branches  66% of mispredicted branches can be dynamically predicated in DMP.

16 16 Outline  Predicated Execution  Diverge-Merge Processor (DMP)  Implementation of DMP  Experimental Evaluation  Conclusion

17 17 Fetch Mechanism CB E D F G predicted path A C E B H Diverge Branch CFM point A H Low Confidence Round-robin fetch

18 18 PR21 PR11 PR41 add pr21  pr13, #1 (p1) Dynamic Predication Arch.Phy.M R1 R2PR12 R3PR13 A C E B H branch r0, C add r1  r3, #1 add r4  r1, r3 add r1  r2, # -1 branch pr10,C p1 = pr10 add pr24  pr41, pr13add pr31  pr12, # -1(!p1) Arch.Phy.M R1 R2PR12 R3PR13 PR31 1 1 select-µop pr41 = p1? pr21 : pr31 RAT2 RAT1 Forks RAT, RAS, and GHR PR11

19 19 DMP Support  ISA Support Mark diverge branches/CFM points.  Compiler Support [CGO’07] The compiler identifies diverge branches and the corresponding CFM points.  Hardware Support Confidence estimator Fetch mechanisms Load/store processing Instruction retirement Dynamic predication

20 20 Hardware Complexity Analysis ST-LD Forwarding SW pred. Dual path Select-Uop Gen. Rename Support Front-End Check Flush/no Flush Predicate Registers Confidence Estimator Wish br. Multi path Dyn. ham. DMP

21 21 Outline  Predicated Execution  Diverge-Merge Processor (DMP)  Implementation of DMP  Experimental Evaluation  Conclusion

22 22 Simulation Methodology  12 SPEC 2000 INT, 5 SPEC 95 INT Different input sets for profiling and evaluation  Alpha ISA execution driven simulator  Baseline processor configuration 64KB perceptron predictor/O-GEHL (paper) Minimum 30-cycle branch misprediction penalty 8-wide, 512-entry instruction window 2 KB 12-bit history enhanced JRS confidence estimator  Less aggressive processor (paper)  Power model using Wattch

23 23 Different CFG types

24 24 Performance Improvement

25 25 Energy Consumption

26 26 Outline  Predicated Execution  Diverge-Merge Processor (DMP)  Implementation of DMP  Experimental Evaluation  Conclusion

27 27 Conclusion  DMP introduces the concept of frequently-hammocks and it dynamically predicates complex CFGs.  DMP can overcome the three major limitations of software predication: ISA support, adaptivity, complex CFG.  DMP reduces branch mispredictions energy efficiently 19% performance improvement, 9% less energy  DMP divides the work between the compiler and the microarchitecture: The compiler analyzes the control-flow graphs. The microarchitecture decides when and what to predicate dynamically.

28 Thank You!!

29 Questions?


Download ppt "Diverge-Merge Processor (DMP) Hyesoon Kim José A. Joao Onur Mutlu* Yale N. Patt HPS Research Group *Microsoft Research University of Texas at Austin."

Similar presentations


Ads by Google