Tor Aamodt, Andreas Moshovos, Paul Chow
Electrical and Computer Engineering, University of Toronto, Canada
The Predictability of Computations that Produce Unpredictable Outcomes

Aamodt, Moshovos, Chow, University of Toronto. The Predictability of Computations that Produce Unpredictable Outcomes

Outcome-Based Prediction
History of outcomes leading up to branch X: TNTTNTT ... NTN ... TNTTNTT
Why this works: there is locality in the outcome stream. The next time we encounter X after the history "TNTTNT", we can predict "T".
[Figure: history of outcomes mapped to the next outcome of branch X]
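The following minimal Python sketch makes the outcome-locality idea concrete. It assumes a single branch and a table that simply remembers which outcome last followed each k-outcome history. The class and method names are hypothetical. Real outcome-based predictors such as gshare use saturating counters rather than a last-outcome table.

```python
from collections import deque


class LastOutcomePredictor:
    """Hypothetical sketch: predict the outcome that followed the same
    k-outcome history the last time that history was seen."""

    def __init__(self, history_length: int):
        self.k = history_length
        self.history = deque(maxlen=history_length)  # recent outcomes, oldest first
        self.table = {}  # history tuple -> outcome that followed it last time

    def predict(self, default: str = "T") -> str:
        # No entry for the current history means there is no locality to exploit yet.
        return self.table.get(tuple(self.history), default)

    def update(self, outcome: str) -> None:
        # Only a full-length history is used as a table key.
        if len(self.history) == self.k:
            self.table[tuple(self.history)] = outcome
        self.history.append(outcome)


# "TNTTNT" was followed by "T" once; when that history recurs, "T" is predicted.
p = LastOutcomePredictor(6)
for outcome in "TNTTNTT" + "NNNN" + "TNTTNT":
    p.update(outcome)
print(p.predict(default="N"))  # T
```

For a branch without outcome locality, the same history is followed by different outcomes, so entries like this one keep being overwritten.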

Problem: Unpredictable Branches
THE problem: these branches show no outcome locality.

Operation-Based Prediction
Find locality in the computations that produce the outcome.
[Figure: a branch (bne) and the instructions (slt, ld, add) that compute its outcome]

This Work
The first work to look at the fundamental program behaviour that would facilitate operation-based prediction.
Related work:
– Characterization of slices
– Prefetching loads / pre-execution of branches

Ideally...
– The slice (i.e., the slice trace) will always be the same.
– The slice will contain very few operations while spanning a large portion of the original program.
– The slice will be easy (fast) to pre-compute.

Terminology
Lead: the earliest instruction in the slice.
Target: the branch we want to pre-compute.
[Figure: example slice with instructions ld, add, slt, bne]
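The Python sketch below illustrates what pre-computing a target from its slice means. It runs a hypothetical four-instruction slice on a private copy of the register state: the lead is an ld, followed by add and slt, and the target is a bne. The register names, memory address, and tiny interpreter are assumptions chosen for the example. They are not the paper's mechanism.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Op:
    opcode: str
    dst: Optional[str]
    srcs: Tuple[str, ...]
    imm: int = 0


# Hypothetical slice: the lead is the ld, the target is the bne.
SLICE = (
    Op("ld", "r2", ("r1",), imm=0),   # r2 = mem[r1 + 0]
    Op("add", "r3", ("r2", "r4")),    # r3 = r2 + r4
    Op("slt", "r5", ("r3", "r6")),    # r5 = 1 if r3 < r6 else 0
    Op("bne", None, ("r5", "r0")),    # taken if r5 != r0
)


def pre_execute(slice_ops, regs: Dict[str, int], mem: Dict[int, int]) -> bool:
    """Execute the slice and return the pre-computed target outcome (True = taken)."""
    regs = dict(regs)  # private copy: pre-execution must not modify the caller's state
    for op in slice_ops:
        if op.opcode == "ld":
            regs[op.dst] = mem.get(regs[op.srcs[0]] + op.imm, 0)
        elif op.opcode == "add":
            regs[op.dst] = regs[op.srcs[0]] + regs[op.srcs[1]]
        elif op.opcode == "slt":
            regs[op.dst] = 1 if regs[op.srcs[0]] < regs[op.srcs[1]] else 0
        elif op.opcode == "bne":
            return regs[op.srcs[0]] != regs[op.srcs[1]]
        else:
            raise ValueError(f"unsupported opcode {op.opcode!r}")
    raise ValueError("a slice must end with its target branch")


state = {"r0": 0, "r1": 100, "r4": 5, "r6": 10}
print(pre_execute(SLICE, state, {100: 3}))  # True: 3 + 5 < 10
print(pre_execute(SLICE, state, {100: 7}))  # False: 7 + 5 >= 10
```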

What Should a Slice Be?
– Committed instructions: a window of the last 32, 64, 128, or 256 committed instructions
– Ignore control flow, but retain the side effect of JAL on $r31
– Memory dependence: follow resolved load-store dependences (M)
– Restrict the number of instructions: R = at most 1/4 of the window; U = no restriction
Configurations are named by window size, R or U, and M when memory dependences are followed (e.g., 64RM).
[Figure: pipeline window from FETCH to COMMIT, with older instructions toward COMMIT]
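The slice definition above can be sketched as a backward walk over a committed-instruction trace. This minimal Python illustration rests on several assumptions:
– Instructions carry explicit destination and source registers.
– Each load records the trace index of the store it read from (the resolved load-store dependence).
– A slice is identified by its sequence of PCs.
– A restricted (R) slice longer than a quarter of the window is discarded rather than truncated.
All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int
    opcode: str
    dst: Optional[str] = None        # None for stores and branches
    srcs: Tuple[str, ...] = ()
    store_idx: Optional[int] = None  # loads only: trace index of the producing store
# A JAL can be modelled as Instr(pc, "jal", dst="r31"): its control transfer is
# ignored, but its write to $r31 joins a slice if a slice member reads $r31.


def backward_slice(trace: Sequence[Instr], target_idx: int, window: int,
                   follow_memory: bool = True, restrict: bool = True):
    """Return the slice of trace[target_idx] as a tuple of PCs (lead first),
    or None if a restricted slice exceeds window // 4 instructions."""
    start = max(0, target_idx - window + 1)
    live = set(trace[target_idx].srcs)  # registers whose producers are still needed
    needed_stores = set()
    members = [target_idx]
    for i in range(target_idx - 1, start - 1, -1):
        ins = trace[i]
        writes_live = ins.dst is not None and ins.dst in live
        if not (writes_live or i in needed_stores):
            continue  # not on a dependence path; other branches are dropped here
        members.append(i)
        if writes_live:
            live.discard(ins.dst)
        live.update(ins.srcs)
        if follow_memory and ins.opcode == "ld" and ins.store_idx is not None:
            needed_stores.add(ins.store_idx)
    members.reverse()
    if restrict and len(members) > window // 4:
        return None
    return tuple(trace[i].pc for i in members)


TRACE = [
    Instr(0x00, "sw", srcs=("r7", "r8")),                    # mem[r8] = r7
    Instr(0x04, "addi", dst="r9", srcs=("r9",)),            # unrelated
    Instr(0x08, "ld", dst="r2", srcs=("r1",), store_idx=0),  # lead when M is on
    Instr(0x0C, "add", dst="r3", srcs=("r2", "r4")),
    Instr(0x10, "slt", dst="r5", srcs=("r3", "r6")),
    Instr(0x14, "bne", srcs=("r5", "r0")),                   # target
]
print(backward_slice(TRACE, 5, window=32))  # (0, 8, 12, 16, 20)
```

With a 16-instruction window the restricted limit is 4, so the memory-following slice (5 instructions) is rejected, while the register-only slice (4 instructions) is kept.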

Methodology
– 12 programs from SPEC2000
– Baseline outcome-prediction hardware:
– 64K gshare + 64K bimodal with a 64K selector
– 64-entry RAS
– sim-outorder (SimpleScalar 3.0):
– 8-way, 128-entry RUU, 64-entry fetch buffer
– 64K dual L1, 256K unified L2
– 64-entry LSQ
– Perfect memory disambiguation
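This sketch is for readers unfamiliar with the baseline. A combining (tournament) predictor pairs a bimodal table indexed by PC with a gshare table indexed by PC XOR global history, and a selector table chooses between them per branch. The Python below is a generic textbook-style version, not SimpleScalar's code, and it assumes three things:
– 2-bit counters are used everywhere.
– The selector is indexed by PC.
– The history length is a value we chose.
With log_entries=16, each table has 64K entries, matching the sizes listed above.

```python
def _bump(counter: int, up: bool) -> int:
    """Saturating 2-bit counter update."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class CombiningPredictor:
    """Hypothetical sketch: bimodal + gshare with a per-PC selector."""

    def __init__(self, log_entries: int = 16, history_bits: int = 16):
        n = 1 << log_entries
        self.mask = n - 1
        self.hmask = (1 << history_bits) - 1
        self.bimodal = [1] * n   # counters start weakly not-taken
        self.gshare = [1] * n
        self.selector = [2] * n  # >= 2 means "trust gshare"
        self.ghr = 0             # global history register, newest outcome in bit 0

    def _indices(self, pc: int):
        word = pc >> 2
        return word & self.mask, (word ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        b, g = self._indices(pc)
        counter = self.gshare[g] if self.selector[b] >= 2 else self.bimodal[b]
        return counter >= 2

    def update(self, pc: int, taken: bool) -> None:
        b, g = self._indices(pc)
        bimodal_ok = (self.bimodal[b] >= 2) == taken
        gshare_ok = (self.gshare[g] >= 2) == taken
        if bimodal_ok != gshare_ok:  # train the selector only when the components disagree
            self.selector[b] = _bump(self.selector[b], gshare_ok)
        self.bimodal[b] = _bump(self.bimodal[b], taken)
        self.gshare[g] = _bump(self.gshare[g], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hmask


p = CombiningPredictor(log_entries=10, history_bits=8)
for i in range(40):
    p.update(0x400, i % 2 == 0)  # T, N, T, N, ...
print(p.predict(0x400))  # True: gshare has learned the alternation
```

An alternating branch defeats the bimodal table, because its counter keeps flipping between weakly taken and weakly not-taken. The selector therefore migrates to gshare, which sees a distinct history for each phase.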

Measuring Slice Locality
locality(1) = probability that the same slice was seen the last time. A high locality(1) indicates that last-operation-based slice prediction would work well.
locality(N) = probability that the same slice was seen among the last N unique slices.

Measuring Slice Locality (cont.)
Save the FOUR most recent unique slice traces per static branch (only on mispredictions). Each time a mispredicted branch is encountered, check whether its slice trace was the most recent, the 2nd most recent, and so on.
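The bookkeeping above amounts to a small recency (LRU) list per static branch. The following is a minimal Python sketch, assuming record() is called once per misprediction with whatever hashable value identifies the slice trace. The class and method names are hypothetical. locality(N) is computed as the fraction of events whose slice was found in one of the N most recent positions.

```python
from collections import defaultdict
from collections.abc import Hashable


class SliceLocalityMeter:
    """Hypothetical sketch: per static branch, keep the `depth` most recent
    unique slice traces in recency order and count hits by position."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.recent = defaultdict(list)                    # branch PC -> slices, most recent first
        self.hits = defaultdict(lambda: [0] * self.depth)  # branch PC -> hits per recency position
        self.events = defaultdict(int)                     # branch PC -> mispredictions observed

    def record(self, branch_pc: int, slice_trace: Hashable) -> None:
        """Call once per misprediction of branch_pc with the slice trace seen then."""
        stack = self.recent[branch_pc]
        self.events[branch_pc] += 1
        if slice_trace in stack:
            pos = stack.index(slice_trace)  # 0 = most recent, 1 = 2nd most recent, ...
            self.hits[branch_pc][pos] += 1
            stack.pop(pos)
        stack.insert(0, slice_trace)
        del stack[self.depth:]              # keep only the `depth` most recent unique slices

    def locality(self, branch_pc: int, n: int) -> float:
        """locality(n): fraction of events whose slice was among the n most recent unique ones."""
        events = self.events[branch_pc]
        return sum(self.hits[branch_pc][:n]) / events if events else 0.0


m = SliceLocalityMeter()
for s in ["A", "B", "A", "A", "C", "B"]:
    m.record(0x40, s)
# Hits at positions [1, 1, 1, 0] out of 6 events: locality(1) = 1/6, locality(4) = 3/6
print(m.locality(0x40, 1), m.locality(0x40, 4))
```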

Measuring Slice Locality (cont.)
All results are weighted averages: the result for each static branch is weighted in proportion to the number of times the operation-based predictor mispredicted it. This emphasizes the characteristics of the branches that cause most mispredictions.
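Written out, the weighted average is given below. Here $w_b$ is the misprediction count used as the weight for static branch $b$, and $\text{locality}_b(N)$ is that branch's own locality. The two-branch numbers are hypothetical and only show how the weighting works.

```latex
\overline{\text{locality}}(N) = \frac{\sum_b w_b \,\text{locality}_b(N)}{\sum_b w_b}

% Hypothetical example: branch A with w_A = 800, locality_A = 0.9;
%                       branch B with w_B = 200, locality_B = 0.3
\overline{\text{locality}} = \frac{800 \cdot 0.9 + 200 \cdot 0.3}{800 + 200} = \frac{720 + 60}{1000} = 0.78
```

An unweighted mean of the same two branches would give 0.6. The weighting lets the frequently mispredicted branch A dominate the reported figure.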

Unrestricted Slices: 32UM
[Figure: slice locality per benchmark (labels include gcc, equake, ammp, bzip); higher is better]
Saving ONE slice captures most of the locality.

Restricted vs. Unrestricted: 32RM vs. 32UM
[Figure: slice locality per benchmark (labels include gcc, equake, ammp, bzip); higher is better]
Most slices have few instructions.

Effect of Memory Dependence: 64RM vs. 64R
[Figure: slice locality per benchmark (labels include gcc, equake, ammp, bzip); higher is better]
Tracking memory dependences does not affect locality much.

Window Size: 32RM, 64RM, 128RM, 256RM
[Figure: slice locality per benchmark (labels include gcc, equake, ammp, bzip); higher is better]
Locality is good even for large windows.

Effect of Selection Context (128RM)
[Figure: slice locality when slices are collected on mispredictions only vs. always, per benchmark (labels include gcc, equake, ammp, bzip); higher is better]
Focusing on mispredictions improves locality.

Idealized Predictor
– Slice traces are indexed by lead PC.
– Spawn and execute instantaneously when the lead operation is encountered.
– Store up to 4 slice traces per lead operation.

Idealized Predictor (cont.)
Match operations and register dependences as instructions are fetched. After matching, there is usually only one prediction per target, if any (>80% of the time). Otherwise:
– Tie-breaker #1: longest lead-target distance.
– Tie-breaker #2: most recently detected slice.
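The selection rule can be sketched as follows. It assumes each matched slice yields a candidate carrying its predicted outcome, its lead-target distance, and the time at which the slice was detected. The Candidate type and function name are hypothetical. The operation and register-dependence matching that produces the candidates is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    outcome: bool              # pre-computed outcome (True = taken)
    lead_target_distance: int  # instructions between lead and target
    detected_at: int           # when the slice was recorded; larger = more recent


def choose_prediction(candidates: Sequence[Candidate]) -> Optional[bool]:
    """Pick one operation-based prediction for a target, or None if no slice matched."""
    if not candidates:
        return None  # assumed here: the outcome-based prediction then stands
    # Tie-breaker #1: longest lead-target distance; tie-breaker #2: most recently detected.
    best = max(candidates, key=lambda c: (c.lead_target_distance, c.detected_at))
    return best.outcome


cands = [Candidate(True, 10, 5), Candidate(False, 20, 1), Candidate(True, 20, 3)]
print(choose_prediction(cands))  # True: distance 20 ties, detected_at 3 is more recent
```

Ordering by the tuple (distance, detection time) applies the second tie-breaker only when the first one ties.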

Correcting Mispredictions: 32RM, 64RM, 128RM
[Figure: coverage of mispredicted branches per benchmark (labels include gcc, equake, ammp, bzip)]
High coverage of mispredicted branches.

Interaction with the Outcome-Based Predictor: 32RM, 64RM, 128RM
[Figure: interaction with the outcome-based predictor per benchmark (labels include gcc, equake, ammp, bzip)]
Very little destructive interference.

Summary
– Slice locality for mispredicted branches: an average of 70% for restricted slices on a 64-entry window following load-store dependences (12 SPEC2000 benchmarks).
– Accuracy of the idealized predictor: 74% of mispredicted branches eliminated.

Conclusion
– The first work to look at the fundamental program behaviour, slice locality, that would facilitate predicting slice traces to pre-execute outcomes.
– SPEC2000 benchmarks show very high slice locality for mispredicted branches.