Prophet/Critic Hybrid Branch Prediction B90902051 B90902124 B90902011.

Slides:

Advertisements

Similar presentations

1/1/ / faculty of Electrical Engineering eindhoven university of technology Speeding it up Part 3: Out-Of-Order and SuperScalar execution dr.ir. A.C. Verschueren.

Advertisements

Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.

Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *

Pipelining and Control Hazards Oct

Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Dynamic Branch Prediction

Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.

Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.

Wrong Path Events and Their Application to Early Misprediction Detection and Recovery David N. Armstrong Hyesoon Kim Onur Mutlu Yale N. Patt University.

Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

WCED: June 7, 2003 Matt Ramsay, Chris Feucht, & Mikko Lipasti University of Wisconsin-MadisonSlide 1 of 26 Exploring Efficient SMT Branch Predictor Design.

1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 8, 2003 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)

Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 7, 2002 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)

1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.

1 Applying Perceptrons to Speculation in Computer Architecture Michael Black Dissertation Defense April 2, 2007.

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

Computer Organization and Architecture The CPU Structure.

EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE A Survey on BRANCH PREDICTION METHODOLOGY By, Baris Mustafa Kazar Resit Sendag.

VLSI Project Neural Networks based Branch Prediction Alexander ZlotnikMarcel Apfelbaum Supervised by: Michael Behar, Spring 2005.

Wish Branches A Review of “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution” Russell Dodd - October 24, 2006.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

EECC722 - Shaaban #1 Lec # 10 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.

Trace Caches J. Nelson Amaral. Difficulties to Instruction Fetching Where to fetch the next instruction from? – Use branch prediction Sometimes there.

Better Branch Prediction Through Prophet/Critic Hybrids A. Falcón, J. Stark, A. Ramirez, K. Lai, M. Valero Paper Presentation and Discussion.

Branch Target Buffers BPB: Tag + Prediction

1 COMP 740: Computer Architecture and Implementation Montek Singh Thu, Feb 19, 2009 Topic: Instruction-Level Parallelism III (Dynamic Branch Prediction)

Prophet/Critic Hybrid Branch Prediction Falcon, Stark, Ramirez, Lai, Valero Presenter: Christian Wanamaker.

EECC722 - Shaaban #1 Lec # 9 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.

Neural Methods for Dynamic Branch Prediction Daniel A. Jiménez Department of Computer Science Rutgers University.

Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.

1/25 HIPEAC 2008 TurboROB TurboROB A Low Cost Checkpoint/Restore Accelerator Patrick Akl and Andreas Moshovos AENAO Research Group Department of Electrical.

Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.

Revisiting Load Value Speculation:

Analysis of Branch Predictors

Instruction Issue Logic for High- Performance Interruptible Pipelined Processors Gurinder S. Sohi Professor UW-Madison Computer Architecture Group University.

1 Dynamic Branch Prediction. 2 Why do we want to predict branches? MIPS based pipeline – 1 instruction issued per cycle, branch hazard of 1 cycle. –Delayed.

Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University Dept. of Computer Science Dana S. Henry Yale University.

CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.

Coherence Decoupling: Making Use of Incoherence J. Huh, J. Chang, D. Burger, G. Sohi ASPLOS 2004.

Precomputation- based Prefetching By James Schatz and Bashar Gharaibeh.

Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.

Trace Substitution Hans Vandierendonck, Hans Logie, Koen De Bosschere Ghent University EuroPar 2003, Klagenfurt.

1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.

Branch Hazards and Static Branch Prediction Techniques

Computer Structure Advanced Branch Prediction

CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)

Branch Prediction Perspectives Using Machine Learning Veerle Desmet Ghent University.

1/25 HIPEAC 2008 TurboROB TurboROB A Low Cost Checkpoint/Restore Accelerator Patrick Akl 1 and Andreas Moshovos AENAO Research Group Department of Electrical.

Fast Path-Based Neural Branch Prediction Daniel A. Jimenez Presented by: Ioana Burcea.

Value Prediction Kyaw Kyaw, Min Pan Final Project.

Computer Structure Advanced Branch Prediction

Module 3: Branch Prediction

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

Lecture: Static ILP, Branch Prediction

EE 382N Guest Lecture Wish Branches

Ka-Ming Keung Swamy D Ponpandi

Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt

Control unit extension for data hazards

Lecture 10: Branch Prediction and Instruction Delivery

Serene Banerjee, Lizy K. John, Brian L. Evans

Patrick Akl and Andreas Moshovos AENAO Research Group

Control unit extension for data hazards

rePLay: A Hardware Framework for Dynamic Optimization

Ka-Ming Keung Swamy D Ponpandi

Presentation transcript:

Prophet/Critic Hybrid Branch Prediction B B B

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Motivation This paper introduces the prophet/critic hybrid conditional branch predictor, which has two component predictors that play the role of either prophet or critic. They are actually a prophecy, or predicted branch future. The critic uses both the branch’s history and future to give a critique of the prophet’s prediction for the current branch.

Motivation (Cont.) Traffic vs. Branch predictor  Taxi as processor  Driver as branch predictor  Passenger as pipeline  Intersections as branches Goal: to navigate the taxi through the system of roads, making the correct turns at intersections, to get to the destination; i.e., the end of the program.

Motivation (cont.) Conventional predictors are analogous to a taxi with just one driver. Prophet/critic hybrids are analogous to a taxi with two drivers: the front-seat and the back- seat.  Prophet is like front-seat driver. He uses his knowledge to decide the way to turn.  Critic is like back-seat driver. she waits until he’s made a few more turns to be certain they are lost.

Motivation (cont.) The way to recover from mistake :  backtrack to the intersection where she believes the wrong-turn was made and try a different direction.

Motivation (cont.) Related works  McFarling first proposed hybrid branch predictors.  Chang et al. proposed a mechanism to classify branches.  Loh and Henry improved prediction accuracy by replacing the selection mechanism traditionally used in hybrids with a fusion mechanism.  Jim’enez et al. explain how predictor overriding provides fast and accurate predictions.  Grunwald et al. show that using the prediction for the current branch in the history of the JRS confidence estimator improves speculation control.

Motivation (cont.) The prophet/critic hybrid combines and builds upon this related work—it is a hybrid, and it does use overriding, and it does use future bits. They claimed that it greatly increases the prediction accuracy for the branch.

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Current branch predictors make predictions using history information. Once a branch has been predicted, the predictor can not use information from subsequent predictions to re-predict the branch. Prophet/critic hybrid can effectively use these subsequent predictions.

The components of the prophet/critic hybrid can be any existing predictors. Once a prediction for a branch is made, the prophet goes on along the predicted path generating new predictions. This prediction and the new predictions form the branch future. As in current history-based predictors, the prediction of the current branch is determined by the history, and that prediction determines the branch’s future.

While the prophet generates the branch’s future, the critic collects this future information, inserting the prophet predictions into its branch outcome register (BOR). BOR contains two kinds of branch outcomes:  outcomes of branches before the one being predicted, which are branch history, and allow the predictor to correlate on the past, and  outcomes of the branch being predicted and those after it, which are branch future, and allow the predictor to correlate on the future.

The prophet/critic hybrid takes advantage of the fact that the prophet and critic operate autonomously, predicting the same branch at different times. In the prophet/critic hybrid, although both prophet and critic predict the same branch, the predictions are not initiated at the same time. Since the critic initiates its prediction some cycles later, it can incorporate information about future code behavior (i. e., the branch future provided by the prophet) into its prediction.

Updating the predictors Updating branch history registers (BHRs) and BORs.  The prophet BHR is updated speculatively with the prophet’s predicted branch outcome at the prediction stage. BHRs should be speculatively updated instead of waiting for the branches to resolve.  The critic BOR is filled with the predicted outcomes from the prophet. Updating pattern tables.  The prophet and critic pattern tables are updated non- speculatively, at commit time, when the real outcomes of the branches are known.

Recovering from a mispredict When the prophet predicts a branch, a copy of the current BHR and the current BOR are assigned to the branch. If a mispredict is detected for the branch, the BHR and BOR are restored from the values assigned to the branch, the mispredicted branch’s correct outcome is inserted into the BHR and BOR, and fetch is directed to the correct path.

Recovering from a mispredict It is the instructions that follow it that are on the wrong path. Such a branch is not flushed from the pipeline when the mispredict is detected. It continues down the pipeline, and when it commits, it trains the critic using the same BOR value that was used to generate the critic prediction. This BOR value has a single (mispredicted) future bit that corresponds to the branch, plus future bits for the branches on the wrong path that were flushed. If the BOR value did not contain the future bits for the wrong path, the critic would never be trained to recognize when the prophet has mispredicted a branch and gone down the wrong path.

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Filtering the critic The critic’s accuracy can be limited by multiple branches contending for the same prediction resources; that is, by conflicts. Conflicts can be reduced by filtering easy-to- predict branches from a predictor. Only the difficult branches were predicted with the original predictor, for which conflicts were a problem.

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

Use a cycle-accurate IA32, uop based, execution driven simulator that executes Long Instruction Traces (LITs).  A LIT is not actually a trace. But a snapshot of of the processor state, including memory. contains a list of system interrupts needed to simulate system events such as DMA.

Simulated Benchmark Suites

Simulation Parameters

Predictor Simulated Any predictor can play the role of prophet or critic. Predictors they used  Gshare  2Bc-gskew  Percetron

Prophet and critic configurations

Outline Motivation Prophet/critic hybrid branch prediction Filtering the critic A prophet/critic hybrid implementation Simulation methodology Evaluation The theory Conclusion

The importance of future bits.

Adding some future bits always helps, but more is not always better.

The mispredict rate increases when more than 8 future bits are used.

Solution to this problem is to filter the critic.  With a filter, the critic only provides critiques for branches that the prophet previously mispredicted.

Distribution of critiques Ideally, the critic disagrees when the prophet mispredicts (incorrect_disagree). When the critic agrees with the prophet (correct agree if the prophet didn’t mispredict; incorrect agree if it did), the final prediction does not change, so there is neither gain nor harm. In fact?

Increasing the filter size allows more contexts to be stored in its tag table. Increasing the number of future bits is also beneficial to the filter, as it is better able to identify the contexts where the prophet mispredicted a branch.

Processor performance