EECE476: Computer Architecture. Lecture 21: Faster Branches. Branch Prediction with Branch-Target Buffers (not in textbook). The University of British Columbia, EECE.

Presentation transcript:

1 EECE476: Computer Architecture
Lecture 21: Faster Branches
Branch Prediction with Branch-Target Buffers (not in textbook)
The University of British Columbia, EECE 476, © 2005 Guy Lemieux

2 Branch Prediction: Fetching
Early branch decision in “D” stage
  – Still a 1-cycle branch penalty
Can we do better?
Branch prediction, three steps:
  – 1. Predict direction
  – 2. Fetch from predicted direction
  – 3. Nullify if mispredicted
We’ve discussed steps 1 and 3, but not 2

3 Branch Prediction Problems
How does branch prediction “fetch from predicted direction”?
Pipeline cannot tell if instruction is a branch until “D” stage
  – At end of stage “D”, we know branch target and branch outcome
  – At this point, already too late to fill “I” stage
New instruction already in “I” stage
  – How did we fill this “slot”?
  – Predict not-taken: fetch from PC+4
      Works ok
  – Predict taken: fetch from PC+4+Imm16
      BR instruction (Imm16) only known in “D”
      We can’t compute the branch target in “I”
Must determine predicted-PC in “I” stage
  – AT THE SAME TIME as we are fetching the BR instruction itself
  – No time left to compute branch-target (PC+4+Imm16)

4 BPB + BTB Overview
[Figure: the PC feeds the Instruction Memory, the Branch-Target Buffer, the Branch-Prediction Buffer, and the +4 adder. The prediction is used to select the new PC value.]

5 Beyond Branch Prediction: Branch-Target Buffer (BTB)
Problem: determine predicted-PC for branches in “I” stage
Solution:
  – Branch-Target Buffer, similar to a cache
  – Stores precomputed “target address” for branch instructions
      Do not confuse this with a Branch-Prediction Buffer (BPB)
      Lots of bits (e.g., 32 bits for target address) compared to BPB
Approach
  – As normal: use PC to access InstrMem and fetch next instruction
  – In parallel:
      Use PC to access Branch-Target Buffer (BTB), gives us TARGET
      Use PC to access Branch-Prediction Buffer (BPB), gives us PREDICTION
      Compute PC+4
  – Next: PREDICTION selects (PC+4) or (TARGET) for PC
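The parallel lookup and final selection on this slide can be sketched in Python. This is an illustrative software model only, not the lecture's hardware: the names `next_pc`, `btb`, and `bpb`, the dictionaries keyed by the full PC, and the boolean prediction are all assumptions made for the example.

```python
def next_pc(pc, btb, bpb):
    """Model of the I-stage next-PC choice (hypothetical helper).

    btb: dict mapping PC -> stored target address (assumed structure)
    bpb: dict mapping PC -> True (predict taken) / False (predict not-taken)
    """
    # These three steps happen in parallel with the instruction fetch.
    target = btb.get(pc)                 # BTB lookup gives TARGET
    predict_taken = bpb.get(pc, False)   # BPB lookup gives PREDICTION
    fall_through = pc + 4                # adder gives PC+4

    # PREDICTION selects (PC+4) or (TARGET) for the next PC.
    if predict_taken and target is not None:
        return target
    return fall_through
```

Keying the dictionaries by the full PC hides the aliasing problem that later slides deal with; a real buffer is indexed by only a few low PC bits.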

6 BTB Problem: Non-branches
What if instruction fetched isn’t a branch?
  – Must always choose PC+4
  – Must ignore branch predictor output

7 BTB with Non-Branches
Not all instructions are branches…
[Figure: the BPB + BTB overview datapath (PC, InstrMem, Branch-Target Buffer, Branch-Prediction Buffer, +4 adder) with a “Branch?” signal added to the 0/1 selection of the next PC.]
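A minimal sketch of the gating rule for non-branches, assuming the “Branch?” signal is available as a boolean input. The function name `select_next_pc` and its parameters are hypothetical and chosen only for illustration.

```python
def select_next_pc(pc, target, predict_taken, is_branch):
    """Choose the next PC, ignoring the predictor for non-branches.

    is_branch models the "Branch?" signal from the slide; how that
    signal is produced is outside the scope of this sketch.
    """
    # The predictor output only matters when the instruction is a branch.
    use_target = is_branch and predict_taken
    return target if use_target else pc + 4
```

With `is_branch` false, the predicted target is never used, whatever the buffers happen to contain for that PC.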

8 BPB: Correctness on Misprediction
Plain Vanilla: Branch-Prediction Buffer Only
  – Small size (say, 128 entries)
  – Uses only lower bits of PC for address
  – Multiple branches may map to same entry
  – Execution ALWAYS CORRECT even if prediction is wrong!
      Misprediction always corrected via nullify
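The aliasing mentioned above (“multiple branches may map to same entry”) can be illustrated with a small Python sketch. It assumes 4-byte, word-aligned instructions, so the two lowest PC bits are dropped before indexing; the name `bpb_index` and the example addresses are made up for the illustration.

```python
BPB_ENTRIES = 128  # "say, 128 entries", as on the slide


def bpb_index(pc):
    """Index the BPB with low PC bits only (hypothetical helper).

    Assumption: instructions are 4 bytes and word-aligned, so the two
    lowest PC bits are always zero and are shifted out first.
    """
    return (pc >> 2) % BPB_ENTRIES


# Two different branch addresses exactly 128 instructions apart
# share one BPB entry and therefore share its prediction bits.
branch_a = 0x00400010
branch_b = branch_a + BPB_ENTRIES * 4
shares_entry = bpb_index(branch_a) == bpb_index(branch_b)
```

Sharing an entry can only make the prediction worse, never the execution wrong, because a misprediction is still nullified.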

9 BTB: Correctness on Misprediction
BPB + Branch-Target Buffer
  – Necessarily: small size, uses only lower bits of PC for address
  – Multiple branches may map to same entry
  – If prediction wrong, the wrong PC is used in next cycle
      Must correct misprediction via nullify
      Must also correct the PC !!!!
  – Even if prediction correct, may still use wrong PC in next cycle
      Stored TARGET address may be wrong
        – E.g., stale data, or stored target is for different BR instruction
      Must ensure that correct PC is used (DETECT)
      If incorrect, must nullify instruction and restore PC (CORRECT)

10 Fixing BTB Correctness Problems
Correctness Problem Summary
2 Cases:
  – Case 1: BR mispredicted
      Must nullify the mispredicted instruction
      Must correct PC
  – Case 2: BR predicted OK
      Must DETECT if correct PC was used (was target OK?)
      Case 2A: Correct TARGET
        – No problems, continue on…
      Case 2B: Incorrect TARGET
        – Must CORRECT PC to proper value and
        – Must nullify the fetched/predicted instruction
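The case analysis above can be written as a small decision function. This is a sketch under stated assumptions: the function name `classify_branch`, its parameters, and the tuple it returns are hypothetical, and the comparison is done once the real outcome and the correct PC are known.

```python
def classify_branch(predicted_taken, actual_taken, fetched_pc, correct_pc):
    """Return (case, nullify, fix_pc) for a resolved branch.

    fetched_pc is the PC that was actually used in the "I" stage;
    correct_pc is the PC from the regular branch calculation.
    """
    if predicted_taken != actual_taken:
        # Case 1: direction mispredicted.
        return ("case 1", True, True)
    if fetched_pc == correct_pc:
        # Case 2A: direction and target both correct.
        return ("case 2A", False, False)
    # Case 2B: direction correct, but the stored TARGET was wrong.
    return ("case 2B", True, True)
```

Cases 1 and 2B end in the same recovery action (nullify and correct the PC), which is why the later slides can reuse the misprediction hardware.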

11 Fixing BTB: Case 1
Case 1: BR mispredicted
  – Misprediction known in “D” stage
  – Must nullify the mispredicted instruction in “I”
      Easy… we already know how to do this
  – Must correct PC
      Wrong PC in “I” stage, cannot use
      Take correct PC from regular branch calculation in “D” stage

12 Fixing BTB: Case 1
Correct new PC on misprediction (BEQ in D)
Not shown: nullifying instruction in “I” stage
[Figure: pipeline datapath with PC, +4 adder, InstrMem, Branch-Prediction Buffer, and Branch-Target Buffer in “I”; the I/D and D/X pipeline registers; and RegFile, Imm16 adder, and == comparator in “D”. Signal labels: Branch?, Prediction?, T/NT, EQ?, Mispredicted?, Correct PC.]

13 Fixing BTB: Case 2
Case 2: BR predicted OK
  – Must DETECT if proper PC was used (was target OK?)
  – If wrong, must CORRECT PC to proper value and nullify the wrongly fetched/predicted instruction in “I”
How to fix?
  – DETECT VALID TARGET
      Ensure only correct target is stored in BTB entry
        – Tag each entry with the entire PC of the branch instruction
            » REQUIRES MORE STORAGE BITS IN BTB !!!
        – Compare stored-tag to actual PC
        – IF TAG MATCH, permit branch prediction
        – IF TAG MISMATCH, do something safe
            » Safe ideas: insert bubble into “I” stage
            » OR fetch PC+4 and nullify-if-taken
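A minimal sketch of the tag check, assuming a direct-mapped BTB held in a Python list whose entries are either `None` or a `(tag, target)` pair. The 64-entry size, the word-aligned indexing, and the names `btb_lookup` and `fetch_next` are assumptions for illustration. On a mismatch the sketch takes the second “safe idea” from the slide and fetches PC+4.

```python
BTB_ENTRIES = 64  # assumed size, not given in the slides


def btb_lookup(btb, pc):
    """Return the stored target only if the entry's tag is this PC."""
    entry = btb[(pc >> 2) % BTB_ENTRIES]   # index with low PC bits
    if entry is not None and entry[0] == pc:
        return entry[1]                    # TAG MATCH: target is usable
    return None                            # TAG MISMATCH or empty entry


def fetch_next(btb, pc, predict_taken):
    """Pick the next PC, permitting prediction only on a tag match."""
    target = btb_lookup(btb, pc)
    if predict_taken and target is not None:
        return target
    return pc + 4   # safe action: fetch PC+4 (nullified later if taken)


# A branch at 0x100 and an unrelated instruction at 0x200 alias to entry 0.
btb = [None] * BTB_ENTRIES
btb[(0x100 >> 2) % BTB_ENTRIES] = (0x100, 0x400)
```

Storing the entire PC as the tag matches the slide's suggestion and costs extra storage bits per entry.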

14 Fixing BTB: Case 2
Case 2B: TAG MISMATCH, when is it detected?
Option 2B-1: DETECT TAG MISMATCH early, in “I” stage
  – Allows immediate correction by safe actions
      Benefit: more likely to fetch a useful instruction after a branch
  – Adds compare logic after BTB
      Penalty: may increase cycle time
Option 2B-2: DETECT TAG MISMATCH late, in “D” stage
  – Adds compare logic after BTB in “D” stage
      Benefit: keeps cycle time fast
  – Must CORRECT action already taken in “I” stage
      Must nullify instruction in “I” stage
      Restore correct PC by taking it from the branch calculation in “D” stage
      Easy: we already do this when we mispredict!

15 Fixing BTB: Cases
BTB adds TAG bits to store PC for each entry
[Figure: the Case 1 datapath extended with TAG bits in the Branch-Target Buffer and a second == comparator that checks the stored tag against the PC (“PC/TAG Mismatch?”). Signal labels: Branch?, Prediction?, T/NT, EQ?, Mispredicted?, Mismatch?, Correct PC.]

16 Updating BTB
After every branch
  – Update prediction bits in BPB
  – Update target in BTB
BPB is a true dual-port memory
  – Every cycle, read Branch-Predict bits
      Read address: low bits of PC being used in “I” stage
  – Every cycle, (maybe) write new Branch-Predict bits
      Data: depends on previous Predict-state + BR outcome in “D” stage
      Write address: low bits of PC of BR instruction in “D” stage
      When to write: if BR executed in “D”
BTB is a true dual-port memory
  – Every cycle, read TAG,Target bits
      Read address: low bits of PC being used in “I” stage
  – Every cycle, (maybe) write new TAG,Target bits
      Write address: low bits of PC of BR instruction in “D” stage
      When to write: if BR in “D” and BR is TAKEN (no need to store not-taken entries)

17 Updating BTB: Required Data
[Figure: the tagged-BTB datapath with the update paths added. Labels for the Branch-Target Buffer (incl. Tag) write: TARGET to store, Tag (data), PC of Branch (addr). Labels for the Branch-Prediction Buffer write: Prediction?, T / NT ?. Other signal labels: Branch?, EQ?, Mispredicted?, Mismatch?, Correct PC.]

18 Other Stuff
Zero-cycle jumps/unconditional branches
Zero-cycle conditional branches (on CC)
Why is branch prediction so important?
  – R4000 pipeline: 8 stages
  – Superscalar Execution
  – Dynamic Pipelining
  – Register Renaming
  – Speculative Execution
Learn all of this soon! Come to next class!
  – Not covered in textbook!