CS5100 Advanced Computer Architecture Advanced Branch Prediction

Prof. Chung-Ta King, Department of Computer Science, National Tsing Hua University, Taiwan
(Slides are from the textbook, Prof. Hsien-Hsin Lee, Prof. Yasun Hsu, Prof. Onur Mutlu)

About This Lecture
Goal: to understand the techniques for reducing the cost of branches
Outline:
- Reducing branch cost with advanced branch prediction (Sec. 3.3)
  - Prediction of branch direction: static, dynamic, branch correlation
  - Prediction of branch target

Control Speculation with Branch Prediction
- Modern processors have deep pipelines
- The branch penalty limits the performance of deep pipelines
- Want to execute instructions beyond a branch even before that branch is resolved → use speculative execution
- Branch prediction: dynamic vs. static
- What to predict?

What to Predict?
- Direction (1 bit)
  - Single direction for unconditional jumps and calls/returns
  - Binary for conditional branches
- Target (32-bit or 64-bit addresses)
  - Some are easy
  - One address: unidirectional jumps
  - Two addresses: fall-through (not taken) vs. taken
  - Many addresses: function pointer or indirect jump (e.g., jr r31)
- Ideally, one predictor for direction and one predictor for target for each branch in the code

Static Branch Prediction for Direction
- Unidirectional: always predict taken (or not taken)
  - Always-not-taken: easy (does not need the branch target address), but not effective for loops
  - Always-taken: the branch target address must be computed before the instruction flow can continue (may take extra cycles)
- Backward taken, forward not taken
  - Check the sign of the branch displacement: taken if negative, not taken if positive
  - Good for, e.g., loops
  - Requires no extra hardware, since the sign of the target displacement is already encoded in the branch instruction
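
The backward-taken, forward-not-taken rule can be sketched in a few lines of Python. This is only an illustration of the sign check, not a description of any particular processor; the function name `predict_btfn` and the example addresses are hypothetical.

```python
def predict_btfn(branch_pc: int, target_pc: int) -> bool:
    """Backward taken, forward not taken.

    Returns True (predict taken) when the branch displacement is negative.
    """
    displacement = target_pc - branch_pc
    return displacement < 0


# Hypothetical addresses: a loop-closing branch jumps backward.
loop_branch = predict_btfn(branch_pc=0x1040, target_pc=0x1000)
# Hypothetical addresses: a forward skip over a rarely executed block.
forward_branch = predict_btfn(branch_pc=0x1040, target_pc=0x1080)
```

Here `loop_branch` is predicted taken and `forward_branch` is predicted not taken, which matches the loop-friendly behavior described on the slide.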

Static Branch Prediction for Direction
- Compiler hints with branch annotation
  - Run the instrumented program with sample input data
  - Collect information on branch direction (profiling)
  - Use this profile information for prediction
- Use a bit in the branch instruction
  - Set to 1 if taken
  - Set to 0 if not taken
- Bits are set by the compiler or the user
  - Once set, the branch is predicted the same way every time

Dynamic Branch Prediction for Direction
- Predict a branch based on the past history of that branch
- One-bit Branch History Table (BHT)
  - Indexed by the PC (or a fraction or hash of it): N index bits select one of 2^N entries
  - Each entry stores the last direction that the indexed branch went (1 bit to encode taken/not taken)
  - The table is a cache of recent branches; a buffer of 4096 entries is common (tracks 4K different branches)
  - No need to decode the instruction to know whether it is a branch: just look at the instruction address
  - When the branch direction is resolved, go back into the table and update the entry: 0 if not taken, 1 if taken
[Figure: the PC is hashed into N bits that index a 2^N-entry BHT; the selected entry supplies the prediction, and FSM update logic writes the actual outcome back into the table.]
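
A one-bit BHT as described on this slide could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class name `OneBitBHT`, the 4-byte instruction alignment used for indexing, and the initial "not taken" state are all choices made for the example, not details from the slides.

```python
class OneBitBHT:
    """One-bit branch history table indexed by low-order PC bits."""

    def __init__(self, index_bits: int = 12):
        # 2**12 = 4096 entries, the size mentioned on the slide.
        self.mask = (1 << index_bits) - 1
        # Assumption: every entry starts as "not taken" (False).
        self.table = [False] * (1 << index_bits)

    def index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        # The prediction is simply the last recorded direction.
        return self.table[self.index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # Called once the branch direction is resolved.
        self.table[self.index(pc)] = taken


bht = OneBitBHT()
first = bht.predict(0x400100)   # nothing recorded yet
bht.update(0x400100, True)      # branch resolved as taken
second = bht.predict(0x400100)  # now follows the last outcome
```

Because only part of the PC is used as the index, two different branches can land in the same entry; in this sketch `0x400100` and `0x404100` share an index.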

Problems with the Simple Predictor
- Aliasing: two branches may be hashed to the same entry → the branch prediction history is polluted
  - Solution: make the table bigger, or apply other cache optimization strategies
- Always mispredicts twice for a loop, e.g., for (i=0; i<4; i++) { … }
  - Actual outcomes over repeated executions of the loop: T T T T NT T T T T NT T
  - A 1-bit entry predicts the previous outcome, so it mispredicts at each NT (loop exit) and again at the first T that follows (loop re-entry)
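
The double misprediction per loop can be checked with a small simulation. The sketch below replays the T T T T NT pattern from the slide through a last-outcome (1-bit) predictor; the helper name `last_outcome_mispredictions` and the initial prediction of "taken" are assumptions made for the example.

```python
def last_outcome_mispredictions(outcomes, initial=True):
    """Return the positions where a 1-bit (last-outcome) predictor is wrong."""
    prediction = initial  # assumption: the entry starts as "taken"
    missed = []
    for position, actual in enumerate(outcomes):
        if prediction != actual:
            missed.append(position)
        prediction = actual  # the 1-bit entry just remembers the last outcome
    return missed


# One execution of the loop: four taken outcomes, then the not-taken exit.
one_pass = [True, True, True, True, False]
outcomes = one_pass * 3
missed = last_outcome_mispredictions(outcomes)
```

In this run the mispredicted positions come in pairs: the loop exit and the first outcome of the next execution of the loop.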

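2-bit Counter
- 2-bit saturating up/down counter predictor
- States: 00 = SN (Strongly Not Taken), 01 = WN (Weakly Not Taken), 10 = WT (Weakly Taken), 11 = ST (Strongly Taken)
- A taken branch increments the counter; a not-taken branch decrements it (saturating at 11 and 00)
- Predict not taken in states 00 and 01; predict taken in states 10 and 11
- Gives inertia in responding to external changes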
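
To see the inertia at work, the sketch below runs the same loop pattern (T T T T NT, repeated) through a single 2-bit saturating counter. The function name `two_bit_mispredictions` and the starting state ST are assumptions for the example.

```python
def two_bit_mispredictions(outcomes, state=3):
    """Simulate one 2-bit saturating counter.

    States: 0 = SN, 1 = WN, 2 = WT, 3 = ST. Predict taken when state >= 2.
    Returns the positions of the mispredicted outcomes.
    """
    missed = []
    for position, taken in enumerate(outcomes):
        predicted_taken = state >= 2
        if predicted_taken != taken:
            missed.append(position)
        # Saturating update: count up on taken, down on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return missed


outcomes = [True, True, True, True, False] * 3
missed = two_bit_mispredictions(outcomes)
```

A single not-taken outcome only moves the counter from ST to WT, so the branch is still predicted taken on loop re-entry and only the loop exit is mispredicted.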

For More Advanced Branch Prediction …
- Hypothesis: recent branches are correlated; that is, the behavior of recently executed branches affects the prediction of the current branch
- Two possibilities: the current branch depends on
  - Local behavior: the last m outcomes of the same branch (local branch predictor), e.g., a loop of 3 iterations is executed repeatedly → a history record of the last 6 outcomes of the loop branch should be able to predict the direction of that branch correctly
  - Global behavior: the outcomes of the last m most recently executed branches → because branches are often correlated!
- (Slide annotation: BHT predicts this)

Branches Are Correlated!
- The directions of multiple branches are not independent but correlated to the path taken
- Example: on path 1-1, the outcome of b3 can be known beforehand

    if (aa==2)       // b1
        aa = 0;
    if (bb==2)       // b2
        bb = 0;
    if (aa!=bb) {    // b3
        ……
    }

- Paths through b1 and b2 (1 = T, 0 = NT) and what is known on reaching b3:

    Path  b1  b2  Known at b3
    A     1   1   aa=0, bb=0
    B     1   0   aa=0, bb≠2
    C     0   1   aa≠2, bb=0
    D     0   0   aa≠2, bb≠2

- How to capture global behavior?
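
The claim about path 1-1 can be checked by brute force. The sketch below mirrors the three branches from the slide in Python and records which b3 outcomes occur on each (b1, b2) path; the function name `run_branches` and the small range of input values are assumptions for the example.

```python
def run_branches(aa, bb):
    """Mirror the slide's code and return the outcomes of b1, b2 and b3."""
    b1 = aa == 2
    if b1:
        aa = 0
    b2 = bb == 2
    if b2:
        bb = 0
    b3 = aa != bb
    return b1, b2, b3


# Collect every b3 outcome seen on each (b1, b2) path for small inputs.
b3_by_path = {}
for aa in range(4):
    for bb in range(4):
        b1, b2, b3 = run_branches(aa, bb)
        b3_by_path.setdefault((b1, b2), set()).add(b3)
```

On the path where both b1 and b2 are taken, `aa` and `bb` are both 0, so b3 has a single possible outcome; on the other paths both outcomes still occur.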

Capturing Global Branch Correlation
- Idea: associate branch outcomes with the global T/NT history of “all” branches
- Make a prediction based on the outcome of the branch the last time the same global branch history was encountered
- Implementation:
  - Keep track of the “global T/NT history” of all branches in a register → Global History Register (GHR)
  - Use the GHR to index into a table that records the outcome that was seen for each GHR value in the recent past → Pattern History Table (a table of 2-bit counters)
- Global history/branch predictor
- Uses two levels of history (GHR + history at that GHR)

Two Level Global Branch Prediction
- 1st level: Global Branch History Register (BHR), N bits
  - Holds the direction of the last N branches; shifted left on each update
- 2nd level: Pattern History Table (PHT), a table of saturating counters with one entry per history pattern
  - 2^N entries, indexed by the history pattern: 00…00, 00…01, 00…10, …, 11…10, 11…11
[Figure: the N-bit BHR (bits Rc-k … Rc-1) selects a PHT entry; the current state of that entry gives the prediction, and the actual branch outcome is shifted into the BHR and drives the FSM update logic that rewrites the PHT entry.]
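
A two-level global predictor of this shape might be sketched as follows. The class name `GlobalTwoLevelPredictor`, the 4-bit history length, and the initial "weakly taken" counters are assumptions made for the example rather than parameters from the slides.

```python
class GlobalTwoLevelPredictor:
    """Global history register indexing a table of 2-bit counters."""

    def __init__(self, history_bits: int = 4):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                            # 1st level: last N outcomes
        # 2nd level: one 2-bit counter per history pattern.
        # Assumption: counters start at 2 (weakly taken).
        self.pht = [2] * (1 << history_bits)

    def predict(self) -> bool:
        return self.pht[self.ghr] >= 2

    def update(self, taken: bool) -> None:
        counter = self.pht[self.ghr]
        self.pht[self.ghr] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


predictor = GlobalTwoLevelPredictor()
pattern = [True, True, False]          # taken, taken, not taken, repeated
missed = []
for step in range(60):
    actual = pattern[step % 3]
    if predictor.predict() != actual:
        missed.append(step)
    predictor.update(actual)
```

After a short warm-up the history pattern seen just before each not-taken outcome has its own counter, so the repeating pattern is predicted without further misses in this run.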

How Does the Global Predictor Work?

    for(i=0; i<100; i++) {
        for(j=1; j<3; j++) {
            ...
        } // b2
    } // b1

- BHR contents, oldest first, starting with the j=3 test (the last inner test, which is not taken):
  - Outcome of b2 at i=6, j=3
  - Outcome of b1 at i=7
  - Outcome of b2 at i=7, j=1
  - Outcome of b2 at i=7, j=2
  - Outcome of b2 at i=7, j=3
- Next branch to predict: b1 at i=8
- Branch b1 tests i, and the last 3 branches test j → history: TTN
- Predict taken for i → next history: TNT (shift in the last outcome)

Differentiating Per Branch Behavior
- Two different branches may have the same global branch history but behave differently
- GAg: the global BHR indexes a single global PHT
- GAp: the global BHR indexes per-address PHTs (PPHTs); the branch address Addr(B) selects which PHT is used

Capturing Local Correlation
- But we still want to capture the behavior of the same branch

    for(i=0; i<100; i++)
        for(j=0; j<3; j++) {
            if (aa==2) aa = 0;
            if (bb==2) bb = 0;
            if (aa!=bb) {...}
        }

- Idea: have a per-branch history register
- PAp: the branch address Addr(B) selects a per-address history register in the BHT (PBHT), whose contents index per-address PHTs (PPHTs)
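
A per-branch history scheme in the spirit of PAp could look roughly like this. It is a simplified sketch: dictionaries stand in for the hardware tables, and the class name `LocalTwoLevelPredictor`, the 3-bit history length, the initial counter value, and the two branch addresses are all assumptions for the example.

```python
from collections import defaultdict


class LocalTwoLevelPredictor:
    """Per-branch history registers indexing per-branch 2-bit counters."""

    def __init__(self, history_bits: int = 3):
        self.mask = (1 << history_bits) - 1
        self.history = defaultdict(int)          # pc -> local history register
        # (pc, history) -> 2-bit counter; assumption: start weakly taken.
        self.counters = defaultdict(lambda: 2)

    def predict(self, pc: int) -> bool:
        return self.counters[(pc, self.history[pc])] >= 2

    def update(self, pc: int, taken: bool) -> None:
        key = (pc, self.history[pc])
        counter = self.counters[key]
        self.counters[key] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.history[pc] = ((self.history[pc] << 1) | int(taken)) & self.mask


predictor = LocalTwoLevelPredictor()
LOOP_BRANCH, OTHER_BRANCH = 0x100, 0x200   # hypothetical branch addresses
misses = {LOOP_BRANCH: 0, OTHER_BRANCH: 0}
for step in range(30):
    # Interleave a T, T, NT loop branch with a strictly alternating branch.
    for pc, actual in ((LOOP_BRANCH, step % 3 != 2), (OTHER_BRANCH, step % 2 == 0)):
        if predictor.predict(pc) != actual:
            misses[pc] += 1
        predictor.update(pc, actual)
```

Each branch only sees its own history, so the interleaving of the two branches does not disturb either pattern; both are learned after a few warm-up misses in this run.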

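Hybrid Branch Predictor
- Some branches are correlated to global history, some are correlated to local history
- Use more than one type of predictor and select the “best”
[Figure: the branch PC indexes component predictors P0 and P1 and a choice (or meta) predictor; the choice predictor selects which component supplies the final prediction.]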
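
The selection mechanism can be sketched with a per-branch 2-bit chooser. In the sketch below the two component predictors are deliberately trivial stand-ins (always taken, always not taken) so that the chooser's behavior is easy to follow; the class names, the chooser's initial value, and the update rule (train only when exactly one component is right) are assumptions for the example, not details from the slides.

```python
class AlwaysTaken:
    """Trivial stand-in for component predictor P0."""
    def predict(self, pc): return True
    def update(self, pc, taken): pass


class AlwaysNotTaken:
    """Trivial stand-in for component predictor P1."""
    def predict(self, pc): return False
    def update(self, pc, taken): pass


class HybridPredictor:
    def __init__(self, p0, p1):
        self.p0, self.p1 = p0, p1
        self.choice = {}  # pc -> 2-bit counter; values >= 2 select P1

    def predict(self, pc):
        use_p1 = self.choice.get(pc, 1) >= 2  # assumption: start leaning to P0
        return self.p1.predict(pc) if use_p1 else self.p0.predict(pc)

    def update(self, pc, taken):
        p0_correct = self.p0.predict(pc) == taken
        p1_correct = self.p1.predict(pc) == taken
        if p0_correct != p1_correct:
            # Move the chooser toward whichever component was right.
            counter = self.choice.get(pc, 1)
            self.choice[pc] = min(counter + 1, 3) if p1_correct else max(counter - 1, 0)
        self.p0.update(pc, taken)
        self.p1.update(pc, taken)


hybrid = HybridPredictor(AlwaysTaken(), AlwaysNotTaken())
BRANCH = 0x40  # hypothetical branch address
misses = 0
for _ in range(10):
    if hybrid.predict(BRANCH) is not False:  # this branch is never taken
        misses += 1
    hybrid.update(BRANCH, False)
```

In a realistic design P0 and P1 would be, for example, a local and a global predictor; the chooser logic stays the same.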

Tradeoff between Cost and Precision
- Idea: add more context information to the global predictor to take into account which branch is being predicted (local predictor)
- Gshare: GHR hashed with the branch PC
  - Pro: better utilization of the PHT
  - Con: increases access latency
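
Gshare is commonly described as XOR-ing the global history with branch PC bits to form the PHT index. The sketch below follows that description; the class name `Gshare`, the 10-bit index, the dropped low PC bits, and the initial counter value are assumptions for the example.

```python
class Gshare:
    """Global history XOR-ed with PC bits to index a table of 2-bit counters."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0
        self.pht = [2] * (1 << index_bits)   # assumption: start weakly taken

    def index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.pht[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


gshare = Gshare()
gshare.ghr = 0b1011                  # some global history shared by both branches
index_a = gshare.index(0x1000)       # hypothetical branch addresses
index_b = gshare.index(0x1004)
```

With the same global history, the two branches select different PHT entries, which is the extra per-branch context the slide refers to.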

Outline
- Prediction of branch direction: static, dynamic, branch correlation
- Prediction of branch target

Prediction of Branch Targets
- Need the target address at the same time as the prediction
- Branch Target Buffer (BTB): use the PC to access the I$ and simultaneously look up the BTB to get the prediction AND the branch target address (if taken)
  - Each BTB entry holds a branch PC, a predicted PC, and whether the branch is predicted taken or untaken
  - The PC of the instruction being fetched is compared (=?) against the stored branch PCs
  - Yes (match): the instruction is a branch; use the predicted PC as the next PC
  - No (no match): branch not predicted; proceed normally
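
The BTB lookup during fetch might be modeled as below. This sketch makes simplifying assumptions that go beyond the slide: the table is direct-mapped, only taken branches keep an entry (so a hit implies "predicted taken"), instructions are 4 bytes, and the names `BranchTargetBuffer` and `next_fetch_pc` are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each slot holds (branch_pc, predicted_target)."""

    def __init__(self, entries: int = 512):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc: int) -> int:
        return (pc >> 2) % self.entries

    def lookup(self, pc: int):
        slot = self.table[self._slot(pc)]
        # The stored branch PC acts as the tag for the "=?" comparison.
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc: int, target: int, taken: bool) -> None:
        i = self._slot(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None  # assumption: drop entries for untaken branches


def next_fetch_pc(btb: BranchTargetBuffer, pc: int) -> int:
    """Choose the next PC while the current instruction is still being fetched."""
    target = btb.lookup(pc)
    return target if target is not None else pc + 4


btb = BranchTargetBuffer()
before = next_fetch_pc(btb, 0x1000)            # no entry: proceed normally
btb.update(0x1000, target=0x2000, taken=True)
after_taken = next_fetch_pc(btb, 0x1000)       # hit: use the predicted PC
btb.update(0x1000, target=0x2000, taken=False)
after_untaken = next_fetch_pc(btb, 0x1000)     # entry removed again
```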

How about Subroutine Returns?
- Different call sites make the return address hard to predict
  - printf() may be called by many callers
  - The target of the “return” instruction in printf() is a moving target
- But the return address is actually easy to predict
  - It is the address after the last call instruction that has not yet been returned from
- Can use a Return Address Stack (RAS)
  - A call pushes its return address onto the stack
  - A return uses the top of the stack as its prediction

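Return Address Stack
- On a call: push the return address (call PC + 4) onto the stack
- On a return: the return PC is taken from the stack when the BTB identifies the instruction as a return (“Return?”)
- May not know whether an instruction is a return prior to decoding
  - Rely on the BTB for speculation
  - Fix up once the return is recognized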
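
The push-on-call, pop-on-return behavior can be sketched as a small bounded stack. The class name `ReturnAddressStack`, the depth of 16, the 4-byte call instruction, and the policy of discarding the oldest entry on overflow are assumptions for the example.

```python
class ReturnAddressStack:
    """Small hardware-style stack of predicted return addresses."""

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc: int, instr_size: int = 4) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumption: overflow discards the oldest entry
        self.stack.append(call_pc + instr_size)

    def predict_return(self):
        # The most recent call that has not returned yet supplies the target.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x100)                  # outer call (hypothetical addresses)
ras.on_call(0x200)                  # nested call
inner_return = ras.predict_return()
outer_return = ras.predict_return()
empty = ras.predict_return()
```

Nested calls unwind in last-in, first-out order, which is why a stack predicts returns well even when the same function is called from many sites.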

Outline
- Prediction of branch direction: static, dynamic, branch correlation
- Prediction of branch target
- Predicated execution

Predicated Execution
- Idea: the compiler converts control dependence into data dependence → the branch is eliminated
  - Each instruction has a predicate bit, set based on the predicate computation
  - Only instructions with TRUE predicates are committed (others become NOPs)
- Source code:

    if (cond) {
        b = 0;
    } else {
        b = 1;
    }

- Normal branch code (basic blocks A, B, C, D; A goes to C if taken (T), to B if not taken (N)):

    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov b, 1
        jmp JOIN
    C:  TARGET: mov b, 0
    D:  JOIN: add x, b, 1

- Predicated code (A, B, C, D in one straight-line block):

    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0
    D:  add x, b, 1

Conditional Move Operations
- Very limited form of predicated execution
- CMOV R1 ← R2
  - R1 = (ConditionCode == true) ? R2 : R1
- Employed in most modern ISAs (x86, Alpha)
- Example: if (a == 5) {b = 4;} else {b = 3;}

    CMPEQ condition, a, 5;
    CMOV condition, b ← 4;
    CMOV !condition, b ← 3;
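
The equivalence between the branching code and the CMOV sequence can be checked with a small model. In the sketch below `cmov` is a hypothetical helper that models the select performed by the instruction; Python's own conditional expression is used only to express that select, not to model a hardware branch.

```python
def cmov(condition: bool, dest: int, src: int) -> int:
    """Model of CMOV: the result is src if the condition holds, else the old dest."""
    return src if condition else dest


def with_branch(a: int) -> int:
    # The original control-dependent code.
    if a == 5:
        b = 4
    else:
        b = 3
    return b


def with_cmov(a: int, b: int = 0) -> int:
    # CMPEQ condition, a, 5
    condition = (a == 5)
    # CMOV condition, b <- 4
    b = cmov(condition, b, 4)
    # CMOV !condition, b <- 3
    b = cmov(not condition, b, 3)
    return b


results_match = all(with_branch(a) == with_cmov(a) for a in range(10))
```

Both conditional moves always execute; exactly one of them changes `b`, so no branch needs to be predicted.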

Recap
- Branch History Table: 2 bits for loop accuracy
- Correlation: recently executed branches are correlated with the next branch
  - Either different branches
  - Or different executions of the same branch
- 2-level predictor: branch history and pattern history
- Branch Target Buffer: includes the branch address and prediction
- Return address stack for the return addresses of calls