EECE476: Computer Architecture, Lecture 20: Branch Prediction (Chapter 6.6 + extra). The University of British Columbia, EECE 476. © 2005 Guy Lemieux.



Slide 2: Control Hazards Summary
We reduced the branch/jump penalty to 1 cycle.
Two problems remain:
Utilization problem
– We may fetch the wrong instruction(s) after a branch/jump
  Option 1: stall after every branch/jump
  Option 2: nullify-if-branch-taken (small performance improvement)
  Option 3: declare as a “delay slot”, always-execute (avoid)
  Option 4: new strategies?
Forwarding problem
– We may depend on the result of instruction(s) just before the branch
  Option 1: stall when dependence detected (HDU)
  Option 2: forward when dependence detected (FDU)

Slide 3: New Strategy: Nullify-if-Not-Taken
Previously: Nullify-if-taken
– Instruction after branch (PC+4) “sneaks” into pipeline
– Nullify if branch is taken (T)
Observation
– Branch has 2 outcomes: taken, not-taken (T or NT)
What about nullify-if-NT?
– This is another valid strategy
– We “sneak” instruction from PC+4+OFFSET
Differences?
– Can we predict which outcome is more likely? T or NT?
– If so, we can “sneak” the right instruction into the pipeline
– Reduces frequency of nullify operations

Slide 4: Nullify-if-T vs. Nullify-if-NT
These are 2 forms of static branch prediction:
– Nullify-if-T: always predicts that NT is likely
– Nullify-if-NT: always predicts that T is likely
Main Idea
– Predict where the branch is going (T or NT)
– Put useful (target) instructions in pipeline after branch
– Nullify only if we predict wrong
Performance impact?
– More accurate predictions → better performance

Slide 5: Implementing Static Branch Prediction
Simplest static branch prediction
– Predict backward branches T, forward branches NT
  Requires 0 instruction bits
More sophisticated static branch prediction
– Define two instruction types: BEQ-likely, BEQ-unlikely
– For each individual branch, compiler decides if branch is likely or unlikely
  Requires 1 instruction bit in ISA to encode “likely-vs-unlikely” into each branch
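The two static schemes on this slide can be sketched in a few lines. This is only an illustration: the function names, the signed `branch_offset` argument, and the `likely_bit` encoding (1 = likely) are assumptions, not part of any real ISA definition.

```python
def predict_backward_taken(branch_offset: int) -> bool:
    """Simplest static scheme: backward branches T, forward branches NT.

    branch_offset is assumed to be the signed displacement from the branch
    to its target, so a negative value means a backward branch (a loop).
    """
    return branch_offset < 0


def predict_from_hint(likely_bit: int) -> bool:
    """Compiler-hint scheme: one instruction bit says likely (1) or unlikely (0)."""
    return likely_bit == 1


# A loop-closing branch jumps backward, so it is predicted taken.
print(predict_backward_taken(-8))   # backward branch
print(predict_backward_taken(12))   # forward branch
```

The first function needs no extra instruction bits because the direction is already visible in the sign of the offset; the second needs the compiler to set one bit per branch.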

Slide 6: Reducing Branch Pipeline Penalty: Static Methods Summary
1. Always stall
– Works well, wastes CPU cycles
2. Always execute (delayed branch)
– Requires useful instruction to be scheduled by compiler
3. Nullify-if-taken (always predicts branch is NT)
– Fetch from PC+4, PC+8, etc.
– Half of branch-forward instructions are NT
– Some performance benefit
4. Nullify-if-not-taken (always predicts branch is T)
– Fetch from PC+4+OFFSET, PC+8+OFFSET, etc.
– Almost all branch-backward instructions are T
– Big performance benefit

Slide 7: Reducing Branch Pipeline Penalty: Dynamic Method
5. Nullify-if-mispredicted
Dynamically predict T or NT
To do this…
– Need branch prediction
– Predict direction based upon recent history
– Must fetch from predicted direction (target address)
  Note: no correctness problems arise if we mispredict (only performance)
Performance impact?
– Depends on “prediction accuracy”
– Want >= 80% to be useful
Somehow, must implement in ISA
– ISA may adopt one or more of the above policies for branch instructions
– ISA may also adopt multiple policies (e.g., multiple versions of the same branch instruction)

Slide 8: Dynamic Branch Prediction
Dynamic: predicted branch direction depends upon recent history
– No history? Must guess
– Execute same branch many times → History → Need state information to retain history

Slide 9: Overview of Dynamic Branch Prediction Schemes
Many Types of Dynamic Branch Predictors
– Basic
  1-bit predictor
  2-bit predictor (very good)
– Generalization
  N-bit saturating counter (not very good)
– Hybrid/advanced (excellent)
  Correlating predictors
  Multilevel predictors
– Perfect (prescient) predictor
  Non-causal, only works in simulation
  Used to measure effectiveness of other prediction schemes

Slide 10: Dynamic 1-bit Branch Prediction
Basic scheme: 1-bit predictor
– Remembers most recent execution of branch
  Was it taken or not taken?
– Assume same outcome next time
– Where to store 1 bit?
  In the instruction encoding?
  1 global bit (DFF) in the CPU?
  Visit this again later…

Slide 11: Dynamic 1-bit Branch Prediction
1-bit Predictor Example

Pseudocode                      Assembly
A = 0                           * initialize registers
Loop: A = A + 1                 Loop: ADD $1,$1,$2
If A != 10 goto Loop                  BNE $1,$3,Loop

Prediction accuracy?
Last iteration is NT, so next time the first iteration assumes NT
Result: 80% accuracy (20% mispredictions)

                    Prediction   Outcome   Prediction correct?
Middle iterations   T            T         8 correct
Last iteration      T            NT        1 wrong
First iteration     NT           T         1 wrong
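The 80% figure can be checked with a small simulation. This is a minimal sketch, not the lecture's own code: the function names are made up, and it assumes the loop has already run once so that the stored bit is NT when the loop is re-entered.

```python
def loop_branch_outcomes(iterations=10):
    """Outcomes of the loop-closing BNE for one pass through the loop.

    The branch is taken (True) on every iteration except the last.
    """
    return [True] * (iterations - 1) + [False]


def run_one_bit(outcomes, last_outcome):
    """Predict each branch as a repeat of the previous outcome.

    Returns (number of correct predictions, final stored bit).
    """
    correct = 0
    for taken in outcomes:
        if last_outcome == taken:
            correct += 1
        last_outcome = taken  # the single history bit
    return correct, last_outcome


# Steady state: the stored bit is NT, left over from the previous pass.
correct, final_bit = run_one_bit(loop_branch_outcomes(), last_outcome=False)
accuracy = correct / 10
print(f"{correct} of 10 correct, accuracy {accuracy:.0%}")
```

The two mispredictions are the first iteration (predicted NT, actually T) and the last one (predicted T, actually NT), matching the table.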

Slide 12: Dynamic 2-bit Branch Prediction
Two basic schemes
Simple: 2-bit “saturating counter” predictor
– Remember two most recent outcomes?
  History (prev,curr)
– (T,T) → Predicts T
– (NT,NT) → Predicts NT
– (T,NT) → Predicts ?
– (NT,T) → Predicts ?
– Although a possibility, this scheme is not usually used
Better: 2-bit “sequence” predictor
– Mispredict twice before changing prediction

Slide 13: Dynamic 2-bit Sequence Prediction
Saturating
– Repeating T stays in ‘11’ state
– Repeating NT stays in ‘00’ state
Two-in-a-row to change prediction
– (T,NT) won’t change prediction
– (NT,T) won’t change prediction
[State diagram: four states, 11 (predict T), 10 (predict T), 01 (predict NT), 00 (predict NT), joined by edges labelled T and NT]

Slide 14: Dynamic 2-bit Prediction Example
2-bit Predictor Example

Pseudocode                      Assembly
A = 0                           * initialize registers
Loop: A = A + 1                 Loop: ADD $1,$1,$2
If A != 10 goto Loop                  BNE $1,$3,Loop

Prediction accuracy?
Last iteration is the 1st mispredict, so next time the 1st iteration still predicts T
Result: 90% accuracy (10% mispredictions)

                    Prediction   Outcome   Prediction correct?
Middle iterations   T            T         8 correct
Last iteration      T            NT        1 wrong, but next prediction still T
First iteration     T            T         1 correct
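The same loop can be replayed against a 2-bit predictor. The sketch below uses a plain 2-bit saturating counter (0 to 3, predict T when the counter is 2 or 3), which is an assumption: its transitions may differ in detail from the slide's “sequence” state machine, but it shares the key property that one misprediction out of the saturated state does not flip the prediction. The function name and the starting state are also assumptions.

```python
def run_two_bit(outcomes, counter=3):
    """Replay branch outcomes against a 2-bit saturating counter.

    counter 3 and 2 predict taken; 1 and 0 predict not taken.
    Returns (number of correct predictions, final counter value).
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Count up on taken, down on not taken, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct, counter


one_pass = [True] * 9 + [False]          # 9 taken branches, then loop exit

first_correct, state = run_two_bit(one_pass, counter=3)
second_correct, state = run_two_bit(one_pass, counter=state)
print(first_correct, second_correct, state)
```

After the loop exit the counter drops from 3 to 2, which still predicts T, so the first iteration of the next pass is predicted correctly and only the exit itself is missed.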

Slide 15: Dynamic 2-bit Prediction Results
Effectiveness?
Mispredictions in SPEC89 with 4096-entry branch prediction table:
– Nasa7: 1%
– Matrix300: 0%
– Tomcatv: 1%
– Doduc: 5%
– Spice: 9%
– FPPPP: 9%
– Gcc: 12%
– Espresso: 5%
– Eqntott: 18%
– Li: 10%
About 90% effective!

Slide 16: Dynamic 2-bit Prediction Results
Mispredictions in SPEC89 with N-entry branch prediction table:

Benchmark    N=4096   N=Infinity
Nasa7        1%       0%
Matrix300    0%       0%
Tomcatv      1%       0%
Doduc        5%       5%
Spice        9%       9%
FPPPP        9%       9%
Gcc          12%      11%
Espresso     5%       5%
Eqntott      18%      18%
Li           10%      10%

Still about 90% effective!

Slide 17: Dynamic N-bit Prediction Scheme
We can try to generalize the 2-bit approach
N-bit “saturating counter” predictor
– Increment on taken branch
– Decrement on untaken branch
– Predictions
  Counter value >= (2^N)/2, predict T
  Counter value < (2^N)/2, predict NT
N-bit “sequence” predictor
– X-mispredicts-in-a-row to change
– How big is X (relative to N)? Possible?
Effectiveness?
– Not very… 2-bit predictors good enough!
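The N-bit saturating counter rules above translate directly into code. The class below is a hedged sketch; the class name and its methods are hypothetical and chosen only for illustration.

```python
class SaturatingCounter:
    """N-bit saturating counter predictor for a single branch."""

    def __init__(self, bits, value=0):
        self.max_value = (1 << bits) - 1   # 2^N - 1
        self.threshold = 1 << (bits - 1)   # (2^N)/2
        self.value = value

    def predict(self):
        """Predict taken when the counter is in the upper half of its range."""
        return self.value >= self.threshold

    def update(self, taken):
        """Increment on a taken branch, decrement on an untaken one."""
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


counter = SaturatingCounter(bits=3)
for _ in range(10):          # many taken branches saturate the counter at 7
    counter.update(True)
flips_after = 0
while counter.predict():     # count untaken branches needed to flip it
    counter.update(False)
    flips_after += 1
print(flips_after)
```

In this sketch a fully saturated N-bit counter needs (2^N)/2 consecutive outcomes in the other direction before its prediction changes, so larger N makes the predictor slower to adapt.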

Slide 18: N-bit “Saturating Counter” Predictor
[State diagram for N = 3: states 000, 001, 010, 011 predict NT; states 100, 101, 110, 111 predict T; edges labelled T and NT]

Slide 19: Storing Branch History
Where?
In instruction memory?
– Must write 1 or 2 bits into instruction, not good!
Use special branch prediction table memory
– E.g., 4096 entries of 2 bits each
  Not enough for one entry per branch instruction in your program
– Or is it?
– Which entry goes with which branch?
  Use lower bits of program counter (hash function)
  Some branches will use the *same* table entry
  Is this incorrect? No!
– Some branches will be predicted with less accuracy, i.e., slower program execution
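A branch prediction table indexed by the lower bits of the PC might look like the sketch below. The class name is hypothetical, and dropping the two lowest PC bits before indexing is an assumption (it presumes 4-byte, word-aligned instructions); the slide only says to use the lower bits of the program counter.

```python
class BranchPredictionTable:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, entries=4096, initial=1):
        self.entries = entries
        self.counters = [initial] * entries

    def index(self, pc):
        # Assumption: instructions are 4 bytes, so the two lowest bits carry
        # no information; the next log2(entries) bits select the entry.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


table = BranchPredictionTable()
# Two branches exactly 4096 instructions apart share one entry (aliasing).
print(table.index(0x1000), table.index(0x5000))
```

Aliasing is harmless for correctness: the two branches only train each other's counter, so the worst outcome is an extra misprediction.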

Slide 20: Advanced Branch Prediction 1
Correlating Predictors
– Create 8 branch prediction tables
  Each table may contain ~1024 entries, 2 bits of history per entry
  Each table is “local history”
– 3 global bits in CPU form “global history”
  Simple, small shift register
  Stores outcome of 3 most recently executed branches (of all branches)
– Key idea
  “global history” determines which branch prediction table to use
  “local history” works like “2-bit predictor”
– Called a (3,2) branch-prediction buffer
  Regular 2-bit predictor is a (0,2) predictor
– Works better than (0,2) predictor
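A (3,2) predictor as described here can be sketched as follows. The class and helper names are hypothetical, the 2-bit entries are implemented as saturating counters, and indexing by `pc >> 2` assumes 4-byte instructions; none of these details come from the slide.

```python
class CorrelatingPredictor:
    """(m,2) predictor: m global history bits select one of 2^m tables."""

    def __init__(self, history_bits=3, entries=1024):
        self.mask = (1 << history_bits) - 1
        self.entries = entries
        self.history = 0  # shift register of the most recent branch outcomes
        self.tables = [[1] * entries for _ in range(1 << history_bits)]

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.tables[self.history][self._index(pc)] >= 2

    def update(self, pc, taken):
        table, i = self.tables[self.history], self._index(pc)
        table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number predicted right."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


alternating = [True, False] * 50       # T, NT, T, NT, ...
warmup = [True, False] * 50

p32 = CorrelatingPredictor(history_bits=3)
count_correct(p32, 0x400, warmup)
p02 = CorrelatingPredictor(history_bits=0)   # degenerates to a plain 2-bit table
count_correct(p02, 0x400, warmup)
print(count_correct(p32, 0x400, alternating), count_correct(p02, 0x400, alternating))
```

On a strictly alternating branch the global history tells the predictor which half of the pattern it is in, so each selected counter sees only one outcome, while the single counter of the (0,2) configuration keeps bouncing between its two middle states.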

Slide 21: Advanced Branch Prediction 2
Multilevel Branch Prediction
– E.g., Tournament Predictors:
  Use 2 different branch predictors per entry
  Choose the best between them
– How to decide which is best?
  Use a third 2-bit predictor
– Like any 2-bit predictor, e.g. “sequence”
– This one says “use predictor 1” or “use predictor 2”
  Change if current predictor is wrong (but other one was right) twice in a row
– Works better than Correlating Predictor
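The chooser idea can be sketched as below. Everything specific here is an assumption made for illustration: the class names, the choice of a PC-indexed table as predictor 1 and a global-history-indexed table as predictor 2, the use of saturating counters for all three tables, and the `pc >> 2` indexing.

```python
class CounterTable:
    """Table of 2-bit saturating counters indexed by an integer key."""

    def __init__(self, entries, initial=1):
        self.entries = entries
        self.counters = [initial] * entries

    def predict(self, key):
        return self.counters[key % self.entries] >= 2

    def update(self, key, up):
        i = key % self.entries
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if up else max(c - 1, 0)


class TournamentPredictor:
    def __init__(self, entries=1024, history_bits=3):
        self.p1 = CounterTable(entries)              # predictor 1: indexed by PC
        self.p2 = CounterTable(1 << history_bits)    # predictor 2: indexed by global history
        self.chooser = CounterTable(entries)         # low half: use p1, high half: use p2
        self.history = 0
        self.mask = (1 << history_bits) - 1

    def predict(self, pc):
        key = pc >> 2
        if self.chooser.predict(key):
            return self.p2.predict(self.history)
        return self.p1.predict(key)

    def update(self, pc, taken):
        key = pc >> 2
        p1_right = self.p1.predict(key) == taken
        p2_right = self.p2.predict(self.history) == taken
        # Move the chooser only when exactly one component was right.
        if p1_right != p2_right:
            self.chooser.update(key, p2_right)
        self.p1.update(key, taken)
        self.p2.update(self.history, taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = TournamentPredictor()
results = []
for taken in [True, False] * 100:      # alternating branch at one PC
    results.append(predictor.predict(0x400) == taken)
    predictor.update(0x400, taken)
print(sum(results[-100:]))
```

Because the chooser is itself a 2-bit counter that starts in a weak state, it takes a couple of decisive outcomes before it settles on the component that is actually working for this branch.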

Slide 22: Predictors Summary
Static
– Stall
– Always execute (delay slots)
– Nullify-if-T (Execute-if-NT)
– Nullify-if-NT (Execute-if-T)
Dynamic
– Nullify-if-mispredicted
  2-bit, N-bit “saturating counter” predictor
  2-bit “sequence” predictor (N-bit possible?)
  Correlating predictor
  – Concept of global / local history
  Multilevel predictor
  – E.g., Tournament predictor