CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue.

Slides:



Advertisements
Similar presentations
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Advertisements

Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
CPE 631: Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
Dynamic Branch Prediction
Copyright 2001 UCB & Morgan Kaufmann ECE668.1 Adapted from Patterson, Katz and Culler © UCB Csaba Andras Moritz UNIVERSITY OF MASSACHUSETTS Dept. of Electrical.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
CPE 631: Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
W04S1 COMP s1 Seminar 4: Branch Prediction Slides due to David A. Patterson, 2001.
Lecture 3: Branch Prediction Young Cho Graduate Computer Architecture I.
EECC551 - Shaaban #1 lec # 5 Spring Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
EECC551 - Shaaban #1 lec # 5 Fall Static Conditional Branch Prediction Branch prediction schemes can be classified into static (at compilation.
EECS 470 Branch Prediction Lecture 6 Coverage: Chapter 3.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 8, 2003 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 7, 2002 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
EECE476: Computer Architecture Lecture 20: Branch Prediction Chapter extra The University of British ColumbiaEECE 476© 2005 Guy Lemieux.
EECC551 - Shaaban #1 lec # 5 Fall Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Lecture 5 Branch Prediction (2.3) and Scoreboarding (A.7)
EECC551 - Shaaban #1 lec # 7 Fall Hardware Dynamic Branch Prediction Simplest method: –A branch prediction buffer or Branch History Table.
Goal: Reduce the Penalty of Control Hazards
Branch Target Buffers BPB: Tag + Prediction
EECC551 - Shaaban #1 lec # 5 Winter Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
Branch Prediction Dimitris Karteris Rafael Pasvantidιs.
1 COMP 740: Computer Architecture and Implementation Montek Singh Thu, Feb 19, 2009 Topic: Instruction-Level Parallelism III (Dynamic Branch Prediction)
So far we have dealt with control hazards in instruction pipelines by:
Dynamic Branch Prediction
EENG449b/Savvides Lec /25/05 March 24, 2005 Prof. Andreas Savvides Spring g449b EENG 449bG/CPSC 439bG.
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
ENGS 116 Lecture 91 Dynamic Branch Prediction and Speculation Vincent H. Berk October 10, 2005 Reading for today: Chapter 3.2 – 3.6 Reading for Wednesday:
Branch Prediction High-Performance Computer Architecture Joe Crop Oregon State University School of Electrical Engineering and Computer Science.
1 Dynamic Branch Prediction. 2 Why do we want to predict branches? MIPS based pipeline – 1 instruction issued per cycle, branch hazard of 1 cycle. –Delayed.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.
CSC 4250 Computer Architectures October 31, 2006 Chapter 3.Instruction-Level Parallelism & Its Dynamic Exploitation.
CPE 631 Session 17 Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
Copyright 2016 Csaba Andras MoritzECE668 Power Aware Branching.1 Few slides adapted from Patterson, et al © UCB and Morgan Kaufmann Csaba Andras Moritz.
Yiorgos Makris Professor Department of Electrical Engineering University of Texas at Dallas EE (CE) 6304 Computer Architecture Lecture #13 (10/28/15) Course.
Dynamic Branch Prediction
CS203 – Advanced Computer Architecture
Computer Structure Advanced Branch Prediction
Dynamic Branch Prediction
COMP 740: Computer Architecture and Implementation
Computer Architecture Advanced Branch Prediction
UNIVERSITY OF MASSACHUSETTS Dept
COSC3330 Computer Architecture Lecture 15. Branch Prediction
CS 704 Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture
So far we have dealt with control hazards in instruction pipelines by:
Dynamic Hardware Branch Prediction
CPE 631: Branch Prediction
Branch statistics Branches occur every 4-6 instructions (16-25%) in integer programs; somewhat less frequently in scientific ones Unconditional branches.
Dynamic Branch Prediction
/ Computer Architecture and Design
So far we have dealt with control hazards in instruction pipelines by:
Lecture 10: Branch Prediction and Instruction Delivery
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
Adapted from the slides of Prof
Dynamic Hardware Prediction
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
CPE 631 Lecture 12: Branch Prediction
Presentation transcript:

CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue

CIS 429/529 Winter 2007 Branch Prediction.2 Four Branch Prediction Schemes 1-bit Branch-Prediction Buffer 2-bit Branch-Prediction Buffer Correlating Branch Prediction Buffer Tournament Branch Predictor

CIS 429/529 Winter 2007 Branch Prediction.3 1-bit Branch Prediction Branch History Table/Buffer: –Indexed by low order bits of branch instruction address –Table of 1-bit values –Says whether branch taken (1) or not taken (0) last time –Use current value to predict; modify value if prediction turns out to be wrong. Problems: in a loop, 1-bit BHT will cause 2 mispredictions: –End of loop case, when it exits instead of looping as before –First time through loop on next time through code, when it predicts exit instead of looping –Assuming 10 times thru loop (typical value) only 80% accuracy even if loop 90% of the time –No address check (may not be right branch stmt)

CIS 429/529 Winter 2007 Branch Prediction.4 Two bits of history -- must mispredict twice before changing the prediction Uses a 2-bit saturating counter 2-Bit Branch Prediction (Jim Smith, 1981) increment by 1 decrement by 1

CIS 429/529 Winter 2007 Branch Prediction.5 RED: Predict not taken if counter < 2 (half of max = 4) GREEN: Predict taken if counter >= 2 2-Bit Branch Prediction (cont.) taken not taken Counter value is a history of last two branch outcomes

CIS 429/529 Winter 2007 Branch Prediction.6 2-bit Branch Prediction Branch History Table/Buffer: –Indexed by low order bits of branch instruction address –Table of 2-bit values –Records history of last two branch outcomes –Use counter value to predict (NT if = 2) Performance –Midprediction rate of 1 to 18% with 4K BHT –Requires ore overhead than 1-bit (update history bit every time through the loop) –Note: error in Figure 3.7

CIS 429/529 Winter 2007 Branch Prediction.7 N-bit Branch Prediction Generalization of 2-bit predictor using an n- bit saturating counter Predict not taken if counter < 2^(n-1) Predict taken if counter >= 2^(n-1)

CIS 429/529 Winter 2007 Branch Prediction.8 Accuracy of 2-bit Prediction

CIS 429/529 Winter 2007 Branch Prediction.9 Accuracy of 2-bit Prediction

CIS 429/529 Winter 2007 Branch Prediction.10 (2,2) Correlating Branches Idea: use two pieces of information: (1) global history of all recently executed branches is related to behavior of next branch and (2) local history of that specific branch (2,2) predictor: 2-bit global, 2-bit local Branch address (4 bits) 2-bits per branch local predictors Prediction 2-bit global branch history (01 = not taken then taken)

CIS 429/529 Winter 2007 Branch Prediction.11 Generalization: (m,n) Correlating Branches m-bit global history n-bit local predictor which uses an n-bit saturating counter Size of BHT: 2^k * 2^m * n Branch address (k bits) n-bits per branch local predictors Prediction m-bit global branch history 2^k rows 2^m columns

CIS 429/529 Winter 2007 Branch Prediction.12 Accuracy of Different Schemes 4096 Entries 2-bit BHT Unlimited Entries 2-bit BHT 1024 Entries (2,2) BHT 0% 18% Frequency of Mispredictions

CIS 429/529 Winter 2007 Branch Prediction.13 Summary: BHT Accuracy Mispredict because either: –Wrong guess for that branch –Got branch history of wrong branch when index the table 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% For SPEC92, 4096 about as good as infinite table 2-bit or (2,2) correlating predictors good

CIS 429/529 Winter 2007 Branch Prediction.14 Tournament Predictors Use 2 predictors, 1 based on global information and 1 based on local information, and choose using a selector Hopes to select right predictor for right branch

CIS 429/529 Winter 2007 Branch Prediction.15 Tournament Predictor in Alpha Global predictor: 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor –12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken; Local predictor consists of a 2-level predictor: –Top level a local history table consisting of bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. 10-bit history allows patterns 10 branches to be discovered and predicted. –Next level Selected entry from the local history table is used to index a table of 1K entries consisting a 3-bit saturating counters, which provide the local prediction Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180,000 transistors)

CIS 429/529 Winter 2007 Branch Prediction.16 Tournament Predictor in Alpha Selector: 4K 2-bit counters to choose between a global predictor and a local predictor Transition labels indicate outcome of last predictions: P1 / P2 where 0 = wrong 1 = right E.g. 1/0 means P1 predicted right, P2 predicted wrong

CIS 429/529 Winter 2007 Branch Prediction.17 % of predictions from local predictor in Tournament Prediction Scheme

CIS 429/529 Winter 2007 Branch Prediction.18 Accuracy of Branch Prediction Profile: branch profile from last execution

CIS 429/529 Winter 2007 Branch Prediction.19 Accuracy v. Size (SPEC89)

CIS 429/529 Winter 2007 Branch Prediction.20 Recap: Branch Prediction Methods 1-bit 2-bit Correlating Tournament Each is a subcase of the following, with tournament the most general.

CIS 429/529 Winter 2007 Branch Prediction.21 New Techniques to Reduce Branch Penalty Branch Target Buffer Integrated Instruction Fetch Units Return Address Predictors

CIS 429/529 Winter 2007 Branch Prediction.22 Branch Target Buffer: get target address during IF Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken) –Note: must check for branch match now, since can’t use wrong branch address ( Figure 3.19, p. 262 ) Branch PCPredicted PC =? PC of instruction FETCH Extra prediction state bits Yes: instruction is branch and use predicted PC as next PC No: branch not predicted, proceed normally (Next PC = PC+4)

CIS 429/529 Winter 2007 Branch Prediction.23 Special Case Return Addresses Register Indirect branch hard to predict address SPEC89 85% such branches for procedure return Since stack discipline for procedures, save return address in small buffer that acts like a stack: 8 to 16 entries has small miss rate

CIS 429/529 Winter 2007 Branch Prediction.24 Integrated IF Units Integrated branch prediction: branch prediction built into IF Instruction prefetch Instruction memory access and buffering We shall see that pipelining and caching interact closely.

CIS 429/529 Winter 2007 Branch Prediction.25 Pitfall: Sometimes bigger and dumber is better uses tournament predictor (29 Kbits) Earlier uses a simple 2-bit predictor with 2K entries (or a total of 4 Kbits) SPEC95 benchmarks, outperforms –21264 avg mispredictions per 1000 instructions –21164 avg mispredictions per 1000 instructions Reversed for transaction processing (TP) ! –21264 avg. 17 mispredictions per 1000 instructions –21164 avg. 15 mispredictions per 1000 instructions TP code much larger & hold 2X branch predictions based on local behavior (2K vs. 1K local predictor in the 21264)