2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set. Hyesoon Kim, M. Aater Suleman, Onur Mutlu, Yale N. Patt. HPS Research Group, The University of Texas at Austin.

Presentation transcript:

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set. Hyesoon Kim, M. Aater Suleman, Onur Mutlu, Yale N. Patt. HPS Research Group, The University of Texas at Austin.

2 Motivation: Profile-guided code optimization has become essential for achieving good performance. Run-time behavior = profile-time behavior: Good! Run-time behavior ≠ profile-time behavior: Bad!

3 Motivation: Profiling with one input set is not enough, because a program can show different behavior with different input data sets. Example: the performance of predicated execution is highly dependent on the input data set, because some branches behave differently with different input sets.

4 Input-dependent Branches
Definition: a branch is input-dependent if its misprediction rate differs by more than some Δ over different input data sets.

                                   Inp. 1    Inp. 2    Inp. 1 - Inp. 2
    Misprediction rate of Br. X     30%       29%            1%
    Misprediction rate of Br. Y     30%        5%           25%   <- input-dependent branch

Note: input-dependent branch ≠ hard-to-predict branch.
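As a minimal sketch of this labeling rule (the function name and the Δ = 5% value are illustrative; the 5% threshold appears later on the methodology slide):

    #include <math.h>
    #include <stdbool.h>

    #define DELTA 0.05   /* Δ = 5%, the threshold used later in the methodology */

    /* Ground-truth labeling: a branch is input-dependent if its misprediction
     * rate changes by more than DELTA between two input data sets.            */
    static bool is_input_dependent(double misp_rate_input1, double misp_rate_input2)
    {
        return fabs(misp_rate_input1 - misp_rate_input2) > DELTA;
    }

    /* Br. X: |0.30 - 0.29| = 0.01 -> not input-dependent
     * Br. Y: |0.30 - 0.05| = 0.25 -> input-dependent        */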

5 An Example Input-dependent Branch. Example from Gap (SPEC2K): a type-checking branch.

    TypHandle Sum(TypHandle A, TypHandle B)
    {
        if ((type(A) == INT) && (type(B) == INT)) {   /* type-checking branch */
            TypHandle Result = A + B;                 /* fast path: both operands are integers */
            return Result;
        }
        return SUM(A, B);                             /* general (slow) path */
    }

Train input set: A and B are integers 90% of the time → misprediction rate: 10%
Reference input set: A and B are integers 42% of the time → misprediction rate: 30%
Since (30% - 10%) > Δ, this is an input-dependent branch.

6 Predicated Execution: eliminates hard-to-predict branches, but fetches blocks B and C all the time.

[Control-flow graph: block A branches (T/N) to B and C, which merge at block D; the predicated version fetches A, B, C, and D on every execution.]

Source code:

    if (cond) {
        b = 0;
    } else {
        b = 1;
    }

Normal branch code:

        p1 = (cond)
        branch p1, TARGET
        mov b, 1
        jmp JOIN
    TARGET:
        mov b, 0
    JOIN:

Predicated code:

        p1 = (cond)
        (!p1) mov b, 1
        (p1)  mov b, 0

7 Predicated Code Performance vs. Branch Misprediction Rate
[Chart: regions where normal branch code performs better vs. where predicated code performs better, with the branch's profile-time (input A) and run-time (input B) misprediction rates marked.]
Converting a branch to predicated code could hurt performance if the run-time misprediction rate is lower than the profile-time misprediction rate.
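The break-even point can be illustrated with a simplified first-order cost model (this model and its parameter names are my own illustration, not from the talk):

    /* Simplified first-order cost model (illustrative assumption):
     * normal branch code pays a penalty only on mispredictions, while
     * predicated code always pays for executing both paths.           */
    static double branch_code_cost(double correct_path_cost,
                                   double misp_rate, double misp_penalty)
    {
        return correct_path_cost + misp_rate * misp_penalty;
    }

    static double predicated_code_cost(double both_paths_cost)
    {
        return both_paths_cost;   /* independent of the misprediction rate */
    }

    /* Under this model, predication wins only when
     *   misp_rate > (both_paths_cost - correct_path_cost) / misp_penalty.
     * A branch whose misprediction rate is high at profile time (input A)
     * but low at run time (input B) can therefore end up on the wrong side
     * of the break-even point.                                             */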

8 Predicated Code Performance vs. Input Set
[Chart: performance of predicated vs. non-predicated code across input sets, measured on an Itanium-II machine; values shown: 9%, -4%, -16%, 1%.]
Predicated execution loses performance because of input-dependent branches.

9 If We Know a Branch is Input-Dependent
- We may not convert it to predicated code.
- We may convert it to a wish branch. [Kim et al., MICRO '05]
- We may not perform other compiler optimizations, or may perform them less aggressively: hot-path/trace/superblock-based optimizations [Fisher '81, Pettis '90, Hwu '93, Merten '99]

10 Our Goal: Identify input-dependent branches by using a single input set for profiling.

11 Talk Outline
- Motivation
- 2D-profiling Mechanism
- Experimental Results
- Conclusion

12 Key Insight of 2D-profiling: Phase behavior in prediction accuracy is a good indicator of input dependence.
[Chart: prediction accuracy over time for an input-dependent branch and an input-independent branch, with phases 1-3 marked.]

13 Traditional Profiling
[Chart: prediction accuracy (pr. Acc) of brA and brB over time, each with its mean marked.]
Traditional profiling summarizes each branch by its average accuracy: MEAN pr.Acc(brA) ≈ MEAN pr.Acc(brB), so the behavior of brA is taken to be the same as the behavior of brB.

14 2D-profiling
[Chart: prediction accuracy (pr. Acc) of brA and brB over time, each annotated with its MEAN and STD.]
MEAN pr.Acc(brA) ≈ MEAN pr.Acc(brB), but STD pr.Acc(brA) ≠ STD pr.Acc(brB), so the behavior of brA ≠ the behavior of brB: A is an input-dependent branch, B is an input-independent branch.

15 2D-profiling Mechanism
The profiler collects branch prediction accuracy information for every static branch over time. Execution is divided into slices (slice 1, slice 2, ..., slice N), each of M instructions, and for every slice the profiler records the mean prediction accuracy of each branch: mean Pr.Acc(brA, s1), mean Pr.Acc(brA, s2), ..., mean Pr.Acc(brA, sN), and likewise for brB.
From these per-slice values it calculates, per branch: MEAN, standard deviation (STD), and PAM (Points Above Mean).
[Chart: per-slice accuracy of brA (PAM: 50%) and brB (PAM: 0%) over time, each with its mean marked; slice size = M instructions.]
A sketch of this computation is given below.
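A sketch of the per-branch statistics (the function and variable names are mine; PAM here follows the slide's usage, i.e., the fraction of slices whose accuracy lies above the branch's own mean):

    #include <math.h>
    #include <stddef.h>

    /* Per-branch summary of the per-slice prediction accuracies
     * collected during a single profiling run.                  */
    typedef struct {
        double mean;   /* MEAN: average prediction accuracy over all slices     */
        double std;    /* STD : standard deviation of the per-slice accuracy    */
        double pam;    /* PAM : fraction of slices with accuracy above the mean */
    } branch_stats;

    /* acc[i] = prediction accuracy of one static branch in slice i,
     * where each slice covers M retired instructions.              */
    static branch_stats summarize_branch(const double *acc, size_t num_slices)
    {
        branch_stats s = {0.0, 0.0, 0.0};
        for (size_t i = 0; i < num_slices; i++)
            s.mean += acc[i];
        s.mean /= (double)num_slices;

        double var = 0.0;
        size_t above = 0;
        for (size_t i = 0; i < num_slices; i++) {
            double d = acc[i] - s.mean;
            var += d * d;
            if (acc[i] > s.mean)
                above++;
        }
        s.std = sqrt(var / (double)num_slices);
        s.pam = (double)above / (double)num_slices;
        return s;
    }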

16 Input-dependence Tests
- STD&PAM-test: identifies branches that have large variations in accuracy over time (phase behavior).
  - STD-test (STD > threshold): identifies branches with large variations in prediction accuracy over time.
  - PAM-test (PAM > threshold): filters out branches that pass the STD-test due to a few outlier samples.
- MEAN&PAM-test: identifies branches that have low prediction accuracy and some time-variation in accuracy.
  - MEAN-test (MEAN < threshold): identifies branches with low prediction accuracy.
  - PAM-test (PAM > threshold): identifies branches with some variation in prediction accuracy over time.
A branch is classified as input-dependent if it passes either the STD&PAM-test or the MEAN&PAM-test (see the sketch below).
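A sketch of the combined classification (the threshold constants are placeholders, not the tuned values from the paper):

    #include <stdbool.h>

    /* Placeholder thresholds -- the actual values are tuned empirically. */
    #define STD_THRESHOLD   0.05
    #define PAM_THRESHOLD   0.20
    #define MEAN_THRESHOLD  0.95

    /* mean, std, and pam are the per-branch statistics computed by the
     * profiler (see the summarize_branch sketch above).                */
    static bool classify_input_dependent(double mean, double std, double pam)
    {
        bool std_pam_test  = (std  > STD_THRESHOLD)  && (pam > PAM_THRESHOLD);
        bool mean_pam_test = (mean < MEAN_THRESHOLD) && (pam > PAM_THRESHOLD);
        return std_pam_test || mean_pam_test;
    }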

17 Talk Outline
- Motivation
- 2D-profiling Mechanism
- Experimental Results
- Conclusion

18 Experimental Methodology
- Profiler: Pin binary instrumentation tool
- Benchmarks: SPEC 2K INT
- Input sets:
  - Profiler: train input set
  - Input-dependent branches: reference input set and train/other extra input sets
- Input-dependent branch: the misprediction rate of the branch changes by more than Δ = 5% when the input data set changes (different Δ values are examined in our technical report [reference 11])
- Branch predictors (a sketch of the modeled gshare predictor follows below):
  - Profiler: 4KB gshare; machine: 4KB gshare
  - Profiler: 4KB gshare; machine: 16KB perceptron (in paper)
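For reference, a minimal sketch of the kind of gshare predictor the profiler models in software (this code is my own illustration, not the Pin tool used in the paper; a 4KB table of 2-bit counters has 16K entries):

    #include <stdbool.h>
    #include <stdint.h>

    #define GSHARE_ENTRIES (16 * 1024)        /* 16K 2-bit counters = 4KB of state */
    #define INDEX_MASK     (GSHARE_ENTRIES - 1)

    static uint8_t  counters[GSHARE_ENTRIES]; /* 2-bit saturating counters, 0..3 */
    static uint32_t ghr;                      /* global branch history register  */

    /* Predict and update for one dynamic branch; returns true if the
     * modeled prediction matched the actual outcome.                  */
    static bool gshare_predict_and_update(uint64_t pc, bool taken)
    {
        uint32_t index = ((uint32_t)(pc >> 2) ^ ghr) & INDEX_MASK;
        bool predicted_taken = counters[index] >= 2;

        if (taken  && counters[index] < 3) counters[index]++;
        if (!taken && counters[index] > 0) counters[index]--;
        ghr = (ghr << 1) | (taken ? 1u : 0u);

        return predicted_taken == taken;
    }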

19 Evaluation Metrics: coverage and accuracy for input-dependent branches.
[Venn diagram: set A = actual input-dependent branches; set B = branches predicted input-dependent by the 2D-profiler; their intersection = correctly predicted input-dependent branches.]
Coverage = correctly predicted input-dependent branches / actual input-dependent branches; Accuracy = correctly predicted input-dependent branches / predicted input-dependent branches.
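A literal reading of these two metrics (the argument names are illustrative):

    /* coverage = fraction of actual input-dependent branches that the
     * 2D-profiler correctly identifies; accuracy = fraction of branches
     * the 2D-profiler flags that really are input-dependent.            */
    static double coverage(unsigned correctly_predicted, unsigned actual_input_dependent)
    {
        return (double)correctly_predicted / (double)actual_input_dependent;
    }

    static double accuracy(unsigned correctly_predicted, unsigned predicted_input_dependent)
    {
        return (double)correctly_predicted / (double)predicted_input_dependent;
    }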

20 Input-dependent Branches

21 2D-profiling Results
[Chart: results for input-dependent branches and input-independent branches; reported values include 63%, 88%, 93%, and 76%.]
Phase behavior and input-dependence are strongly correlated!

22 The Cost of 2D-profiling
- 2D-profiling adds only 1% overhead over modeling the branch predictor in software.
- The software modeling cost itself could be avoided by using a H/W branch predictor [Conte '96].
[Chart: overhead values shown include 54% and 1%.]

23 Conclusion
- 2D-profiling is a new profiling technique to find input-dependent characteristics by using a single input data set for profiling.
- 2D-profiling uses time-varying information instead of just average data.
- Phase behavior in prediction accuracy in a profile run → input-dependent.
- 2D-profiling accurately identifies input-dependent branches with very little overhead (1% more than modeling the branch predictor in the profiler).
- Applications of 2D-profiling are an open research topic:
  - Better predicated code/wish branch generation algorithms
  - Detecting other input-dependent program characteristics

Questions?