CSCI 6461: Computer Architecture
Branch Prediction
Instructor: M. Lancaster
Corresponding to Hennessy and Patterson, Fifth Edition, Section 3.3 and part of Section 3.9

September

Reducing Branch Costs
- The frequency of branches and jumps demands that we also attack stalls arising from control dependencies
- As we add parallel execution units, and then multiples of them, branching becomes a constraining factor
- On an n-issue processor, branches will arrive n times faster

Review of a Branching Optimization (Instruction Level Parallelism)
- Branch destination and test known at end of third cycle of execution
- Branch destination and test known at end of second cycle of execution

Dynamic Branch Prediction
- Branch prediction buffer
  - Simplest scheme
  - A small memory indexed by the lower portion of the address of the branch instruction
    - Includes a bit that says whether the branch was taken recently or not
    - No other tags
    - Useful only to reduce the branch delay when it is longer than the time to compute the possible target PCs
    - Since only the low-order bits are used and there are no tags, some other branch instruction could have set the bit
  - The prediction is a hint that is assumed to be correct; if it turns out to be wrong, the prediction bit is inverted and stored back
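The indexing and aliasing behavior described on this slide can be sketched in a few lines. This is only an illustrative model, not the hardware design from the textbook: the class name `OneBitPredictionBuffer`, the 4-bit index width, the example addresses, and the assumption of 4-byte instructions are all hypothetical choices made for the example.

```python
class OneBitPredictionBuffer:
    """1-bit branch prediction buffer indexed by low-order PC bits, with no tags."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One bit per entry; False means "predict not taken".
        self.bits = [False] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.bits[self._index(pc)]

    def update(self, pc, taken):
        # Inverting the bit on a misprediction is the same as storing the outcome.
        self.bits[self._index(pc)] = taken


buf = OneBitPredictionBuffer(index_bits=4)
branch_a = 0x1000
branch_b = 0x1040  # same low-order index bits as branch_a, so the two alias

buf.update(branch_a, True)
aliased_prediction = buf.predict(branch_b)  # reads the bit written by branch_a
print(aliased_prediction)
```

Because there are no tags, `branch_b` picks up the bit that `branch_a` wrote. The prediction is still only a hint, so aliasing costs accuracy but never correctness.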

Dynamic Branch Prediction
- The branch prediction buffer is a cache
- The 1-bit scheme has a shortcoming
  - Even if a branch is almost always taken, we will usually predict incorrectly twice, rather than once, when it is not taken
- Consider a loop branch that is taken nine times in a row and then not taken. What is the prediction accuracy for this branch, assuming its prediction bit remains in the prediction buffer?
  - The first and last predictions are wrong. The bit was set to 0 when the loop last exited, so the first iteration is predicted not taken although the branch is taken. On the last iteration the branch is not taken, and the prediction is wrong again.
  - Accuracy drops to 80% (8 of 10 predictions correct)
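The 80% figure can be checked with a small simulation. The sketch below assumes the loop is executed repeatedly (100 times here, an arbitrary choice) and skips the first pass so that only steady-state behavior is measured; the function name `one_bit_accuracy` and the `warmup` parameter are made up for this illustration.

```python
def one_bit_accuracy(outcomes, initial=False, warmup=0):
    """Fraction of correct predictions from a single 1-bit predictor."""
    bit = initial          # False = predict not taken
    correct = 0
    counted = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup:
            counted += 1
            correct += (bit == taken)
        bit = taken        # the bit always records the most recent outcome
    return correct / counted


loop = [True] * 9 + [False]     # taken nine times, then not taken
outcomes = loop * 100           # the loop is entered 100 times
steady = one_bit_accuracy(outcomes, warmup=len(loop))
print(f"steady-state accuracy: {steady:.0%}")
```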

Dynamic Branch Prediction
- To remedy this situation, 2-bit branch prediction schemes are often used: a prediction must miss twice before it is changed
- This is a specialization of a more general scheme that has an n-bit saturating counter for each entry in the prediction buffer
  - With n bits, the counter can take on the values 0 to 2^n - 1
  - When the counter is greater than or equal to one half of its maximum value (that is, at least 2^(n-1)), the branch is predicted as taken
  - The count is incremented on a taken branch and decremented on a not-taken one
- 2 bits work almost as well as larger numbers of bits
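A minimal sketch of the n-bit saturating counter follows, applied to the same loop branch (taken nine times, then not taken). The class and function names are hypothetical. The starting value of 0 and the skipped first pass are assumptions made so that the number reflects steady-state behavior.

```python
class SaturatingCounter:
    """n-bit saturating counter used as a single prediction-buffer entry."""

    def __init__(self, n_bits=2, value=0):
        self.max_value = (1 << n_bits) - 1      # 2^n - 1
        self.threshold = 1 << (n_bits - 1)      # 2^(n-1)
        self.value = value

    def predict(self):
        # Predict taken when the counter is in the upper half of its range.
        return self.value >= self.threshold

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


def counter_accuracy(counter, outcomes, warmup=0):
    correct = 0
    counted = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup:
            counted += 1
            correct += (counter.predict() == taken)
        counter.update(taken)
    return correct / counted


loop = [True] * 9 + [False]
two_bit = counter_accuracy(SaturatingCounter(n_bits=2), loop * 100, warmup=len(loop))
print(f"2-bit steady-state accuracy: {two_bit:.0%}")
```

Under these assumptions the 2-bit counter mispredicts only the loop exit, so accuracy rises from 80% to 90%: the single not-taken outcome moves the counter from 3 to 2, which still predicts taken on the next entry to the loop.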

The States in a 2-Bit Prediction Scheme

Branch Prediction Buffer
- Implemented via a small special cache accessed with the instruction address during the IF pipe stage, or as a pair of bits attached to each block in the instruction cache and fetched with each instruction
- If the instruction is a branch and is predicted as taken, fetching begins from the target as soon as the PC is known
- Otherwise, sequential fetching and executing continue
- If the prediction is wrong, the prediction bits are changed as in the state diagram

Branch Prediction Buffer
- Useful for many pipelines
- In our five-stage pipeline, the pipeline finds out whether the branch is taken and what the target of the branch is at roughly the same time as the branch predictor information would have been used (the end of the second stage of the execution of the branch)
- Therefore, this scheme does not help for our pipeline
- The next figure shows the performance of 2-bit prediction for a set of benchmarks (misprediction rates between 1% and 18%)

Prediction accuracy of a 4096-entry 2-bit prediction buffer

Increasing the size of the buffer does not help much

Correlating Branch Predictors
- Branch predictions for integer programs are less accurate
- These 2-bit schemes use only the recent behavior of a single branch to predict the future behavior of that branch
- Look at other branches rather than just the branch we are trying to predict

    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb){

Correlating Branch Predictors
MIPS Code

        DSUBUI  R3,R1,#2
        BNEZ    R3,L1       ;branch b1 (aa!=2)
        DADD    R1,R0,R0    ;aa=0
    L1: DSUBUI  R3,R2,#2
        BNEZ    R3,L2       ;branch b2 (bb!=2)
        DADD    R2,R0,R0    ;bb=0
    L2: DSUBU   R3,R1,R2
        BEQZ    R3,L3       ;branch b3 (aa==bb)

Branch b3 is correlated with branches b1 and b2: if b1 and b2 are both not taken, then aa and bb have both been set to 0, so they are equal and b3 will be taken.
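The correlation can be confirmed by enumerating inputs. The sketch below mirrors the three branches of the MIPS code in Python. The function name `run_fragment` and the input range 0 to 3 are arbitrary choices for the illustration.

```python
def run_fragment(aa, bb):
    """Return (b1_taken, b2_taken, b3_taken) for the aa/bb code fragment."""
    b1_taken = aa != 2          # BNEZ R3,L1 skips the assignment when aa != 2
    if not b1_taken:
        aa = 0
    b2_taken = bb != 2          # BNEZ R3,L2 skips the assignment when bb != 2
    if not b2_taken:
        bb = 0
    b3_taken = aa == bb         # BEQZ R3,L3 is taken when aa == bb
    return b1_taken, b2_taken, b3_taken


outcomes = {(aa, bb): run_fragment(aa, bb) for aa in range(4) for bb in range(4)}
both_not_taken = [o for o in outcomes.values() if not o[0] and not o[1]]
print(all(b3 for _, _, b3 in both_not_taken))
```

A predictor that only sees the history of b3 itself has no way to use this relationship, which is the motivation for looking at other branches.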

Correlating Branch Predictors
- Branch predictors that use the behavior of other branches to make a prediction are called correlating predictors or two-level predictors

Correlating Branch Predictors
Look at the branches with d = 0, 1, and 2

    if (d==0) d=1;
    if (d==1)

        BNEZ    R1,L1       ;branch b1 (d!=0)
        DADDIU  R1,R0,#1    ;d==0, set d=1
    L1: DADDIU  R3,R1,#-1
        BNEZ    R3,L2       ;branch b2 (d!=1)
    L2:

Correlating Branch Predictors
Possible Execution Sequences

Initial value of d | d==0? | b1        | Value of d before b2 | d==1? | b2
0                  | Yes   | Not taken | 1                    | Yes   | Not taken
1                  | No    | Taken     | 1                    | Yes   | Not taken
2                  | No    | Taken     | 2                    | No    | Taken

- If b1 is not taken, then b2 will not be taken
- A 1-bit predictor, whatever its initial state, cannot take advantage of this because it looks only at the history of the branch being predicted

Correlating Branch Predictors
- To develop a branch predictor that uses correlation, let every branch have two prediction bits: one prediction bit that is used if the last branch executed was not taken, and another that is used if the last branch executed was taken
- The last branch executed is usually not the same instruction as the branch being predicted, although this can occur

Bit Correlation Prediction

Prediction bits | Prediction if last branch not taken | Prediction if last branch taken
NT/NT           | NT                                  | NT
NT/T            | NT                                  | T
T/NT            | T                                   | NT
T/T             | T                                   | T

- This is a (1,1) predictor, since it uses the behavior of the last branch to choose from among a pair of 1-bit branch predictors
- An (m,n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor for a single branch
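To see what the (1,1) predictor buys, the sketch below runs the earlier `d==0` / `d==1` fragment with d alternating between 2 and 0 and counts mispredictions for a plain 1-bit predictor and for a (1,1) predictor. The alternating input, the not-taken initial state for every bit, and the not-taken initial global history are assumptions chosen for this example, and all function names are hypothetical.

```python
def d_sequence_outcomes(d_values):
    """Yield (branch_name, taken) for branches b1 and b2, once per value of d."""
    for d in d_values:
        b1_taken = d != 0
        if not b1_taken:
            d = 1
        b2_taken = d != 1
        yield "b1", b1_taken
        yield "b2", b2_taken


def mispredictions_one_bit(outcomes):
    bits = {}                       # branch -> last outcome (False = not taken)
    misses = 0
    for name, taken in outcomes:
        if bits.get(name, False) != taken:
            misses += 1
        bits[name] = taken
    return misses


def mispredictions_1_1(outcomes):
    bits = {}                       # branch -> [bit if last NT, bit if last T]
    last_taken = False              # assumption: history starts as not taken
    misses = 0
    for name, taken in outcomes:
        entry = bits.setdefault(name, [False, False])
        slot = 1 if last_taken else 0
        if entry[slot] != taken:
            misses += 1
        entry[slot] = taken         # only the selected bit is updated
        last_taken = taken          # outcome of the last branch executed
    return misses


trace = list(d_sequence_outcomes([2, 0] * 4))
one_bit_misses = mispredictions_one_bit(trace)
correlating_misses = mispredictions_1_1(trace)
print(one_bit_misses, correlating_misses)
```

Under these assumptions the 1-bit predictor mispredicts every one of the 16 branches, while the (1,1) predictor mispredicts only the two branches of the first iteration.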

(m,n) Predictors
- Can yield higher prediction rates than the 2-bit scheme and require only a small amount of additional hardware
- We can record the global history of the most recent m branches in an m-bit shift register, where each bit records whether the branch was taken or not taken
- The branch prediction buffer can be indexed by using a concatenation of the low-order bits from the branch address with the m-bit global history. That is, the address selects a row in the prediction buffer and the global history chooses among the predictors in that row.
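The indexing scheme can be sketched as follows. This is an illustrative model only: the class name `CorrelatingPredictor`, the `address_bits` parameter, the 4-byte instruction assumption, and the all-zero initial state are choices made for the example and are not specified on the slide.

```python
class CorrelatingPredictor:
    """(m,n) predictor: m bits of global history select among n-bit counters."""

    def __init__(self, m, n, address_bits):
        self.m = m
        self.n = n
        self.address_bits = address_bits
        self.history = 0                          # m-bit global shift register
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.counters = [0] * (1 << (address_bits + m))

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
        row = (pc >> 2) & ((1 << self.address_bits) - 1)
        return (row << self.m) | self.history     # concatenate address and history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self):
        return len(self.counters) * self.n        # 2^m * n * rows


p22 = CorrelatingPredictor(m=2, n=2, address_bits=10)   # 1024 rows
p02 = CorrelatingPredictor(m=0, n=2, address_bits=12)   # plain 2-bit, 4096 entries
print(p22.storage_bits(), p02.storage_bits())
```

If "entries" is read as the number of rows selected by the branch address, a (2,2) predictor with 1024 entries and a plain 2-bit predictor with 4096 entries both use 8192 bits of state, so the two can be compared at equal hardware cost.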

Fig 14

Comparison of Predictors: the first is a non-correlating 2-bit predictor with 4096 entries, followed by a non-correlating 2-bit predictor with unlimited entries, and finally a 2-bit predictor with 2 bits of global history and 1024 entries

Tournament Predictor for the Alpha 21264

Fraction of Predictions Coming from the Local Predictor for a Tournament Predictor using SPEC89 Benchmarks

Branch Target Buffers (Advanced Technique for Instruction Delivery)
- Reduce penalty in our 5-stage pipeline
  - Determine the next instruction address to fetch by the end of IF
    - We must know whether an instruction (not yet decoded) is a branch and, if so, what the next PC should be
    - If at the end of IF we know the instruction is a branch and we know what the next PC should be, we have zero penalty
  - A branch prediction cache that stores the predicted address for the next instruction after a branch is called a branch target buffer or branch target cache
  - For the classic 5-stage pipeline, a branch prediction buffer is accessed during the ID cycle. At the end of ID we know the branch target address (computed in ID), the fall-through address (computed during IF), and the prediction

Branch Target Buffers
- Reduce penalty in our 5-stage pipeline (continued)
  - Thus, by the end of ID we know enough to fetch the next predicted instruction
  - For a branch target buffer, we access the buffer during the IF stage, using the instruction address of the fetched instruction (a possible branch) to index the buffer
  - If we get a hit, then we know the predicted instruction address at the end of the IF cycle, which is one cycle earlier than for the branch prediction buffer
  - This address is predicted and will be sent out before decoding the instruction. It must be known whether the fetched instruction is predicted as a taken branch

Fig 3.21 A Branch Target Buffer: The PC of the instruction being fetched is matched against a set of instruction addresses stored in the first column, which represent the addresses of known branches. If the PC matches one of these entries, then the instruction being fetched is predicted to be a taken branch, and the second field, predicted PC, contains the prediction for the next PC after the branch. Fetching begins immediately at that address.
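The lookup described in the caption can be modeled with a dictionary keyed by branch PC. This is a rough sketch under stated assumptions: an unbounded table instead of a finite cache, 4-byte instructions, and a policy of entering taken branches and deleting entries for branches that turn out not taken. The class and method names are invented for the illustration.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: maps the PC of a known branch to a predicted PC."""

    def __init__(self):
        self.entries = {}   # branch PC -> predicted next PC

    def next_fetch_pc(self, pc):
        # IF stage: a hit means "predicted taken", so fetch from the stored target.
        # A miss means no prediction, so fetch sequentially (assumes 4-byte instructions).
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        # After the branch resolves: remember taken branches, forget the others.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
first = btb.next_fetch_pc(0x2000)     # miss: sequential fetch
btb.resolve(0x2000, True, 0x3000)     # branch was taken, so enter it
second = btb.next_fetch_pc(0x2000)    # hit: fetch from the predicted target
btb.resolve(0x2000, False, 0x3000)    # misprediction: branch was not taken
third = btb.next_fetch_pc(0x2000)     # entry removed, sequential fetch again
print(hex(first), hex(second), hex(third))
```

Because only taken branches are stored, a hit doubles as the taken prediction, which is why the predicted PC can be sent out before the instruction is decoded.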

Fig 3.22 Steps Involved in Handling an Instruction with a Branch Target Buffer