Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.

Slides:



Advertisements
Similar presentations
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Advertisements

Pipelining and Control Hazards Oct
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture VLIW Steve Ko Computer Sciences and Engineering University at Buffalo.
CPE 631: Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
Dynamic Branch Prediction
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
Lecture 8: More ILP stuff Professor Alvin R. Lebeck Computer Science 220 Fall 2001.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP II Steve Ko Computer Sciences and Engineering University at Buffalo.
Copyright 2001 UCB & Morgan Kaufmann ECE668.1 Adapted from Patterson, Katz and Culler © UCB Csaba Andras Moritz UNIVERSITY OF MASSACHUSETTS Dept. of Electrical.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
CPE 631: Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
W04S1 COMP s1 Seminar 4: Branch Prediction Slides due to David A. Patterson, 2001.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP III Steve Ko Computer Sciences and Engineering University at Buffalo.
March 11, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 14 - Advanced Superscalars Krste Asanovic Electrical Engineering.
CS 152 Computer Architecture and Engineering Lecture 14 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 8, 2003 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 7, 2002 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)
EECC551 - Shaaban #1 lec # 5 Fall Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Goal: Reduce the Penalty of Control Hazards
Trace Caches J. Nelson Amaral. Difficulties to Instruction Fetching Where to fetch the next instruction from? – Use branch prediction Sometimes there.
Branch Target Buffers BPB: Tag + Prediction
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
1 COMP 740: Computer Architecture and Implementation Montek Singh Thu, Feb 19, 2009 Topic: Instruction-Level Parallelism III (Dynamic Branch Prediction)
Dynamic Branch Prediction
EENG449b/Savvides Lec /25/05 March 24, 2005 Prof. Andreas Savvides Spring g449b EENG 449bG/CPSC 439bG.
CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue.
Spring 2003CSE P5481 Control Hazard Review The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction.
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
ENGS 116 Lecture 91 Dynamic Branch Prediction and Speculation Vincent H. Berk October 10, 2005 Reading for today: Chapter 3.2 – 3.6 Reading for Wednesday:
CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.
CS252 Graduate Computer Architecture Lecture 8 Explicit Renaming (con’t) Prediction (Branches, Return Addrs) February 17 th, 2010 John Kubiatowicz Electrical.
Arvind and Joel Emer Computer Science and Artificial Intelligence Laboratory M.I.T. Branch Prediction.
Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 7 CS252 Graduate Computer Architecture Spring 2014 Lecture 7: Branch Prediction and Load-Store Queues.
CS252 Graduate Computer Architecture Lecture 8 Explicit Renaming (con’t) Prediction (Branches, Return Addrs) John Kubiatowicz Electrical Engineering and.
1 Dynamic Branch Prediction. 2 Why do we want to predict branches? MIPS based pipeline – 1 instruction issued per cycle, branch hazard of 1 cycle. –Delayed.
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
© Krste Asanovic, 2015CS252, Fall 2015, Lecture 7 CS252 Graduate Computer Architecture Spring 2014 Lecture 7: Advanced Out-of-Order Superscalar Designs.
Out-of-Order Execution & Register Renaming Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Asanovic/Devadas Spring.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Yiorgos Makris Professor Department of Electrical Engineering University of Texas at Dallas EE (CE) 6304 Computer Architecture Lecture #12 (10/27/15) Course.
CPE 631 Session 17 Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
Yiorgos Makris Professor Department of Electrical Engineering University of Texas at Dallas EE (CE) 6304 Computer Architecture Lecture #13 (10/28/15) Course.
Dynamic Branch Prediction
Computer Organization CS224
CS203 – Advanced Computer Architecture
Computer Architecture Advanced Branch Prediction
UNIVERSITY OF MASSACHUSETTS Dept
PowerPC 604 Superscalar Microprocessor
CS252 Graduate Computer Architecture Spring 2014 Lecture 8: Advanced Out-of-Order Superscalar Designs Part-II Krste Asanovic
Module 3: Branch Prediction
CS252 Graduate Computer Architecture Lecture 8 Prediction/Speculation (Branches, Return Addrs) February 14th, 2011 John Kubiatowicz Electrical Engineering.
CPE 631: Branch Prediction
Branch statistics Branches occur every 4-6 instructions (16-25%) in integer programs; somewhat less frequently in scientific ones Unconditional branches.
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste Asanovic Electrical Engineering.
Krste Asanovic Electrical Engineering and Computer Sciences
Dynamic Branch Prediction
Branch Prediction: Direction Predictors
/ Computer Architecture and Design
Control unit extension for data hazards
Lecture 10: Branch Prediction and Instruction Delivery
Branch Prediction: Direction Predictors
Adapted from the slides of Prof
Dynamic Hardware Prediction
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 12 – Branch Prediction and Advanced Out-of-Order Superscalars.
CPE 631 Lecture 12: Branch Prediction
Presentation transcript:

branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.2 10/14 I-cache Fetch Buffer Issue Buffer Func. Units Arch. State Execute Decode Result Buffer Commit PC Fetch Branch executed Next fetch started Modern processors have pipeline stages between next PC calculation and branch resolution ! Control Flow Penalty Why Branch Prediction work lost if pipeline makes wrong prediction ~ Loop length x pipeline width

branch.3 10/14 Branch Penalties in a Superscalar are extensive

branch.4 10/14 Reducing Control Flow Penalty Software solutions Minimize branches - loop unrolling Increases the run length Hardware solutions Find something else to do - delay slots Speculate –Dynamic branch prediction Speculative execution of instructions beyond branch

branch.5 10/14 Motivation: Branch penalties limit performance of deeply pipelined processors Much worse for superscalar processors Modern branch predictors have high accuracy (>95%) and can reduce branch penalties significantly Required hardware support: Dynamic Prediction HW: Branch history tables, branch target buffers, etc. Mispredict recovery mechanisms: Keep computation result separate from commit Kill instructions following branch Restore state to state following branch Branch Prediction

branch.6 10/14 Static Branch Prediction- review Overall probability a branch is taken is ~60-70% but: ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110 bne0 (preferred taken) beq0 (not taken) ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64 typically reported as ~80% accurate JZ backward 90% forward 50%

branch.7 10/14 Branch Prediction Needs Target address generation –Get register: PC, Link reg, GP reg. –Calculate: +/- offset, auto inc/dec –Target speculation Condition resolution –Get register: condition code reg, count reg., other reg. –Compare registers –Condition speculation

branch.8 10/14 Target address generation takes time

branch.9 10/14 Condition resolution takes time

branch.10 10/14 Solution: Branch speculation

branch.11 10/14 Branch Prediction Schemes 1.2-bit Branch-Prediction Buffer 2.Branch Target Buffer 3.Correlating Branch Prediction Buffer 4.Tournament Branch Predictor 5.Integrated Instruction Fetch Units 6.Return Address Predictors (for subroutines, Pentium, Core Duo) 7.Predicated Execution (Itanium)

branch.12 10/14 Dynamic Branch Prediction learning based on past behavior Incoming stream of addresses Fast outgoing stream of predictions Correction information returned from pipeline Branch Predictor Incoming Branches { Address } Prediction { Address, Value } Corrections { Address, Value } History Information

branch.13 10/14 Branch History Table (BHT) Table of predictors Each branch given its own predictor BHT is table of “Predictors” –Could be 1-bit or more –Indexed by PC address of Branch Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit): –End of loop case: when it exits loop –First time through loop, it predicts exit instead of looping most schemes use at least 2 bit predictors Performance = ƒ(accuracy, cost of misprediction) –Misprediction  Flush Reorder Buffer In Fetch state of branch: –Use Predictor to make prediction When branch completes –Update corresponding Predictor Predictor 0 Predictor 7 Predictor 1 Branch PC

branch.14 10/14 Branch History Table Organization Target PC calculation takes time 4K-entry BHT, 2 bits/entry, ~80-90% correct predictions 00 Fetch PC Branch? Target PC + I-Cache Opcodeoffset Instruction k BHT Index 2 k -entry BHT, 2 bits/entry Taken/¬Taken?

branch.15 10/14 Better Solution: 2-bit scheme where change prediction only if get misprediction twice: Red: stop, not taken Green: go, taken Adds hysteresis to decision making process 2-bit Dynamic Branch Prediction more accurate than 1-bit T T NT Predict Taken Predict Not Taken Predict Taken Predict Not Taken T NT T

branch.16 10/14 BTB: Branch Address at Same Time as Prediction Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken) Branch PCPredicted PC =? PC of instruction FETCH prediction state bits Yes: instruction is branch and use predicted PC as next PC No: branch not predicted, proceed normally (Next PC = PC+4) Only predicted taken branches and jumps held in BTB Next PC determined before branch fetched and decoded later: check prediction, if wrong kill instruction, update BPb

branch.17 10/14 BTB contains only Branch & Jump Instructions BTB contains information for branch and jump instructions only  not updated for other instructions For all other instructions the next PC is PC+4 ! Achieved without decoding instruction

branch.18 10/14 Combining BTB and BHT BTB entries considerably more expensive than BHT, fetch redirected earlier in pipeline - can accelerate indirect branches (JR) BHT can hold many more entries - more accurate A PC Generation/Mux P Instruction Fetch Stage 1 F Instruction Fetch Stage 2 B Branch Address Calc/Begin Decode I Complete Decode J Steer Instructions to Functional units R Register File Read E Integer Execute BTB BHT BHT in later pipeline stage corrects when BTB misses a predicted taken branch BTB/BHT only updated after branch resolves in E stage

branch.19 10/14 Subroutine Return Stack Small stack – accelerate subroutine returns more accurate than BTBs. Push return address when function call executed Pop return address when subroutine return decoded &nexta &nextb &nextc k entries (typically k=8-16)

branch.20 10/14 Mispredict Recovery In-order execution machines: –Instructions issued after branch cannot write-back before branch resolves –all instructions in pipeline behind mispredicted branch Killed

branch.21 10/14 Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP –If false, then neither store result nor cause exception –Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr. –IA-64: 64 1-bit condition fields selected so conditional execution of any instruction –This transformation is called “if-conversion” Drawbacks to conditional instructions –Still takes a clock even if “annulled” –Stall if condition evaluated late –Complex conditions reduce effectiveness; condition becomes known late in pipeline x A = B op C Predicated Execution

branch.22 10/14 Accuracy v. Size (SPEC89)

branch.23 10/14 Dynamic Branch Prediction Summary Prediction becoming important part of scalar execution Branch History Table: 2 bits for loop accuracy Correlation: Recently executed branches correlated with next branch. Tournament Predictor: more resources to competitive solutions and pick between them Branch Target Buffer: include branch address & prediction Predicated Execution can reduce number of branches, number of mispredicted branches Return address stack for prediction of indirect jump