EENG449b/Savvides Lec 10.1 2/17/04 February 17, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG.


1 EENG449b/Savvides Lec 10.1 2/17/04 February 17, 2004 Prof. Andreas Savvides Spring 2004 http://www.eng.yale.edu/courses/eeng449bG EENG 449bG/CPSC 439bG Computer Systems Lecture 11 Instruction Level Parallelism II

2 EENG449b/Savvides Lec 10.2 2/17/04 Announcements Midterm Next Thursday 02/19/04 TA extra office hour –Sobeeh will have an extra office hour tomorrow –Office hours 5:00 – 7:00pm, AKW 201 Reading for this lecture: Chapter 3 pages 196 - 216

3 EENG449b/Savvides Lec 10.3 2/17/04 Dynamic Hardware Prediction Last time: Tomasulo’s Algorithm for ILP –Dynamic scheduling –Register renaming –Dynamic memory disambiguation »Avoid conflicts in load and store instructions –Tomasulo’s algorithm deals with data dependences Today: Dynamic branch prediction –Deal with control dependences –Control dependences become the limiting factor in ILP optimizations »Remember from last lecture – basic block sizes between 4 – 7 instructions….

4 EENG449b/Savvides Lec 10.4 2/17/04 Predicting Branches In Appendix A: static techniques –Delay slot execution –Action taken does not depend on the dynamic behavior of a branch Dynamic branch prediction –Try to predict the outcome of a branch early on in order to avoid stalls –Branch prediction is critical for multiple issue processors »In an n-issue processor, branches arrive up to n times faster than in a single-issue processor

5 EENG449b/Savvides Lec 10.5 2/17/04 Branch Prediction Metrics To evaluate the effectiveness of branch prediction you need to consider –Prediction accuracy –Penalties associated with branch taken and branch not taken –The associated penalties are artifacts of »Pipeline design »Type of predictor »Branch frequency »Strategy to deal with the misprediction
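
As a rough illustration of how these quantities combine, the following Python sketch folds branch frequency, misprediction rate and penalty into one effective CPI. It assumes a single misprediction penalty and no cost for correctly predicted branches; the function name and the numbers are hypothetical and are not taken from the lecture.

```python
def effective_cpi(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Base CPI plus the average stall cycles per instruction from mispredicted branches."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted, 2-cycle penalty.
print(effective_cpi(1.0, 0.20, 0.10, 2))
```

A pipeline with separate taken and not-taken penalties would need one term per case; the single-penalty form is only the simplest version.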

6 EENG449b/Savvides Lec 10.6 2/17/04 Basic Branch Predictor Use a 1-bit branch prediction buffer or branch history table 1 bit of memory stating whether the branch was recently taken or not –Indexed by the lower portion of the branch instruction address Bit entry updated each time the branch instruction is executed Problem with 1-bit prediction –A branch that is almost always taken is mispredicted twice, rather than once, each time it is not taken –Imagine executing a loop »Predictor will be wrong on the last iteration and again on the first iteration of the next run of the loop
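
The loop behaviour described on this slide can be checked with a small Python sketch. It models a single 1-bit entry only (no table indexing or aliasing); the function name, the loop length and the initial state are assumptions made for illustration.

```python
def simulate_one_bit(outcomes, initial=False):
    """Count mispredictions of one 1-bit predictor entry.

    outcomes: iterable of booleans, True meaning the branch was taken.
    initial:  assumed starting value of the prediction bit.
    """
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # the bit simply records the most recent outcome
    return misses


# Hypothetical loop branch: taken 9 times, then not taken on loop exit.
loop_run = [True] * 9 + [False]
print(simulate_one_bit(loop_run * 3, initial=True))
```

Every run of the loop after the first costs two mispredictions: one on loop exit and one on re-entry.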

7 EENG449b/Savvides Lec 10.7 2/17/04 A 2-bit Prediction Scheme 2-bit prediction scheme –Generalization for n-bit prediction A prediction must miss twice before it is changed
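
One common way to realize a 2-bit scheme is a saturating counter, sketched below in Python. This is an assumed variant, and the state diagram on the slide may differ in its transitions. Starting from a strongly biased state, two consecutive misses are needed before the prediction flips. The function name and the loop pattern are hypothetical.

```python
def simulate_two_bit(outcomes, counter=3):
    """Count mispredictions of one 2-bit saturating counter.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    counter: assumed starting state (3 = strongly taken).
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Same hypothetical loop branch: taken 9 times, then not taken on exit.
loop_run = [True] * 9 + [False]
print(simulate_two_bit(loop_run * 3))
```

For this pattern the counter mispredicts only the loop exit, one miss per run of the loop.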

8 EENG449b/Savvides Lec 10.8 2/17/04 Branch Prediction Implementation Implications Branch predictors held in branch predictor buffers –Implemented as small caches accessed with instruction address at the IF phase of a pipeline –OR it could be implemented as a pair of bits attached to each block in the instruction cache This branch prediction scheme does not help in the basic 5-stage pipeline –The decision whether a branch is taken and the target address are computed at the same stage…

9 EENG449b/Savvides Lec 10.9 2/17/04 Branch Prediction Accuracy on SPEC 89 Benchmark Using 2-bit prediction, 4KB cache FP programs Integer programs

10 EENG449b/Savvides Lec 10.10 2/17/04 Performance of SPEC 89 Benchmark Remember –To evaluate performance you need to know the branch frequencies and misprediction penalties FP programs typically come from scientific applications and are more loop based Branches harder to predict in integer programs –Typically have higher branch frequency How can this be improved? –Perhaps increase the size of the prediction buffer –Increase the effectiveness of the predictor

11 EENG449b/Savvides Lec 10.11 2/17/04 Effects of Cache Buffer Size

12 EENG449b/Savvides Lec 10.12 2/17/04 Correlating Bit Predictors What about considering the behavior of branches other than the one we are trying to predict? Goal: Use correlating or 2-level predictors to exploit the correlation between consecutive branches…

13 EENG449b/Savvides Lec 10.13 2/17/04 Branch Correlation Example

    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb){

        DSUBUI R3, R1, #2
        BNEZ   R3, L1        ; branch b1
        DADD   R1, R0, R0
    L1: DSUBUI R3, R2, #2
        BNEZ   R3, L2        ; branch b2
        DADD   R2, R0, R0
    L2: DSUBU  R3, R1, R2
        BEQZ   R3, L3        ; branch b3

Branch b3 is correlated with b1 and b2

14 EENG449b/Savvides Lec 10.14 2/17/04 Correlated Branch Example Consider the following code:

    if (d==0) d=1;
    if (d==1)

        BNEZ   R1, L1        ; branch b1
        DADDUI R1, R0, #1
    L1: DADDUI R3, R1, #-1
        BNEZ   R3, L2        ; branch b2
        …
    L2:

What are the possible execution sequences when d=0,1,2?

15 EENG449b/Savvides Lec 10.15 2/17/04 Using a 1-bit Predictor Consider a sequence of d=2,0,2,0 and a 1-bit predictor (P. = prediction, A. = action, NP. = new prediction)

    d     P. b1   A. b1   NP. b1   P. b2   A. b2   NP. b2
    d=2   NT      T       T        NT      T       T
    d=0   T       NT      NT       T       NT      NT
    d=2   NT      T       T        NT      T       T
    d=0   T       NT      NT       T       NT      NT

        BNEZ   R1, L1        ; branch b1
        DADDUI R1, R0, #1
    L1: DADDUI R3, R1, #-1
        BNEZ   R3, L2        ; branch b2
        …
    L2:

16 EENG449b/Savvides Lec 10.16 2/17/04 Using a 1-bit Predictor Consider a sequence of d=2,0,2,0 and a 1-bit predictor (P. = prediction, A. = action, NP. = new prediction)

    d     P. b1   A. b1   NP. b1   P. b2   A. b2   NP. b2
    d=2   NT      T       T        NT      T       T
    d=0   T       NT      NT       T       NT      NT
    d=2   NT      T       T        NT      T       T
    d=0   T       NT      NT       T       NT      NT

All branches are mispredicted !!!

        BNEZ   R1, L1        ; branch b1
        DADDUI R1, R0, #1
    L1: DADDUI R3, R1, #-1
        BNEZ   R3, L2        ; branch b2
        …
    L2:
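
The table can be reproduced with a short Python sketch. It models the two branches of the fragment directly from the value of d and gives each branch its own 1-bit predictor, initialized to not taken as in the first table row. The function names are hypothetical.

```python
def branch_outcomes(d):
    """Return (b1_taken, b2_taken) for one pass through the fragment."""
    b1_taken = d != 0      # BNEZ R1, L1
    if not b1_taken:
        d = 1              # DADDUI R1, R0, #1
    b2_taken = d != 1      # R3 = d - 1; BNEZ R3, L2
    return b1_taken, b2_taken


def count_misses_one_bit(values):
    """Count mispredictions with one 1-bit predictor per branch."""
    prediction = {"b1": False, "b2": False}  # both start at not taken
    misses = 0
    for d in values:
        for name, taken in zip(("b1", "b2"), branch_outcomes(d)):
            if prediction[name] != taken:
                misses += 1
            prediction[name] = taken
    return misses


print(count_misses_one_bit([2, 0, 2, 0]))
```

With d alternating between 2 and 0, each branch flips direction on every pass, which is the worst case for a predictor that only remembers the last outcome.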

17 EENG449b/Savvides Lec 10.17 2/17/04 Using a 1-bit Predictor with 1-bit Correlation Each entry holds a pair of prediction bits, written X/X: the first is the prediction if the last branch was NOT taken, the second is the prediction if the last branch was taken NOTE: last branch refers to the preceding branch instruction, not the previous execution of the current branch instruction

18 EENG449b/Savvides Lec 10.18 2/17/04 Using a 1-bit Predictor with 1-bit Correlation Consider a sequence of d=2,0,2,0 and a 1-bit predictor with 1 bit of correlation (P. = prediction, A. = action, NP. = new prediction)

    d     P. b1   A. b1   NP. b1   P. b2   A. b2   NP. b2
    d=2   NT/NT   T       T/NT     NT/NT   T       NT/T
    d=0   T/NT    NT      T/NT     NT/T    NT      NT/T
    d=2   T/NT    T       T/NT     NT/T    T       NT/T
    d=0   T/NT    NT      T/NT     NT/T    NT      NT/T

        BNEZ   R1, L1        ; branch b1
        DADDUI R1, R0, #1
    L1: DADDUI R3, R1, #-1
        BNEZ   R3, L2        ; branch b2
        …
    L2:

19 EENG449b/Savvides Lec 10.19 2/17/04 Using a 1-bit Predictor with 1-bit Correlation Consider a sequence of d=2,0,2,0 and a 1-bit predictor with 1 bit of correlation (P. = prediction, A. = action, NP. = new prediction)

    d     P. b1   A. b1   NP. b1   P. b2   A. b2   NP. b2
    d=2   NT/NT   T       T/NT     NT/NT   T       NT/T
    d=0   T/NT    NT      T/NT     NT/T    NT      NT/T
    d=2   T/NT    T       T/NT     NT/T    T       NT/T
    d=0   T/NT    NT      T/NT     NT/T    NT      NT/T

Misprediction only on the first iteration of d=2!

        BNEZ   R1, L1        ; branch b1
        DADDUI R1, R0, #1
    L1: DADDUI R3, R1, #-1
        BNEZ   R3, L2        ; branch b2
        …
    L2:
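
The same experiment with 1 bit of correlation can be sketched in Python. Each branch keeps two prediction bits, selected by the outcome of the most recently executed branch. The initial global history is assumed to be not taken, which matches the first table row. The function names are hypothetical.

```python
def branch_outcomes(d):
    """Return (b1_taken, b2_taken) for one pass through the fragment."""
    b1_taken = d != 0      # BNEZ R1, L1
    if not b1_taken:
        d = 1              # DADDUI R1, R0, #1
    b2_taken = d != 1      # R3 = d - 1; BNEZ R3, L2
    return b1_taken, b2_taken


def count_misses_correlating(values):
    """Count mispredictions of a 1-bit predictor with 1 bit of correlation."""
    # prediction[name][h] is used when the last executed branch had outcome h
    # (index 0 = last branch not taken, index 1 = last branch taken).
    prediction = {"b1": [False, False], "b2": [False, False]}
    last_taken = False  # assumed initial history
    misses = 0
    for d in values:
        for name, taken in zip(("b1", "b2"), branch_outcomes(d)):
            slot = int(last_taken)
            if prediction[name][slot] != taken:
                misses += 1
            prediction[name][slot] = taken  # update only the selected bit
            last_taken = taken
    return misses


print(count_misses_correlating([2, 0, 2, 0]))
```

Only the two branches of the first pass are mispredicted; after that, the selected bit always agrees with the outcome.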

20 EENG449b/Savvides Lec 10.20 2/17/04 (m,n) Predictors Use the behavior of the last m branches to choose from 2^m branch predictors. Each is an n-bit predictor for a single branch Ex. A (2,2) branch predictor
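
A minimal Python sketch of a general (m,n) predictor follows. It assumes an m-bit global shift register, n-bit saturating counters, and a table indexed by the word address of the branch. The class name, the indexing and the initial state are illustrative choices, not details given on the slide.

```python
class CorrelatingPredictor:
    """(m,n) predictor: 2**m n-bit counters per table entry."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                      # outcomes of the last m branches
        self.max_count = (1 << n) - 1
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def storage_bits(self):
        # 2^m counters of n bits for every entry selected by the branch address
        return (1 << self.m) * self.n * self.entries

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]  # assumes 4-byte instructions

    def predict(self, pc):
        # Predict taken when the counter is in its upper half.
        return self._row(pc)[self.history] >= (1 << (self.n - 1))

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        # Shift the outcome into the global history, keeping m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


print(CorrelatingPredictor(2, 2, 1024).storage_bits())
print(CorrelatingPredictor(0, 2, 4096).storage_bits())
```

Both configurations use the same number of storage bits, so a (2,2) predictor with 1K entries can be compared fairly against a plain 2-bit predictor with 4K entries.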

21 EENG449b/Savvides Lec 10.21 2/17/04 Tournament Predictors N-bit predictors – use local information (m,n) predictors – use global information Tournament predictors –Local + global – enhanced performance Example of tournament predictors –Multilevel branch predictors »Uses several levels of branch prediction table »Has an algorithm to select from multiple predictors
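
The selection idea can be sketched in Python as follows. This is one possible arrangement under stated assumptions: a per-branch table of 2-bit counters as the local predictor, a table of 2-bit counters indexed by global history as the global predictor, and a 2-bit chooser per branch. The class name, the table sizes and the initial counter values are hypothetical.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter one step up or down."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class TournamentPredictor:
    def __init__(self, entries=1024, history_bits=4):
        self.entries = entries
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.local = [1] * entries                  # indexed by branch address
        self.global_ = [1] * (1 << history_bits)    # indexed by global history
        self.chooser = [1] * entries                # >= 2 means trust the global side

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        i = self._index(pc)
        local_pred = self.local[i] >= 2
        global_pred = self.global_[self.history] >= 2
        return global_pred if self.chooser[i] >= 2 else local_pred

    def update(self, pc, taken):
        i = self._index(pc)
        local_ok = (self.local[i] >= 2) == taken
        global_ok = (self.global_[self.history] >= 2) == taken
        if local_ok != global_ok:
            # Train the chooser only when exactly one predictor was right.
            self.chooser[i] = bump(self.chooser[i], global_ok)
        self.local[i] = bump(self.local[i], taken)
        self.global_[self.history] = bump(self.global_[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

For a branch that strictly alternates between taken and not taken, the local counter in this sketch keeps mispredicting while the history-indexed side learns the pattern, so the chooser drifts toward the global predictor.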

22 EENG449b/Savvides Lec 10.22 2/17/04 Comparing Predictors

23 EENG449b/Savvides Lec 10.23 2/17/04 High Performance Instruction Delivery What else can be done besides branch prediction? Need to have high bandwidth instruction delivery –Modern multiple issue processors require 4-8 instructions per clock cycle

24 EENG449b/Savvides Lec 10.24 2/17/04 Branch-Target Buffers (BTB) How can we further reduce branch penalty? We need to know the address of the next instruction to fetch If we know that the instruction is a branch and what the next PC should be, the branch penalty can be zero Branch-target buffer – stores the predicted address of the next instruction after a branch Advantage for a 5-stage pipeline –Know the predicted instruction address 1 cycle earlier: in the IF stage instead of the ID stage

25 EENG449b/Savvides Lec 10.25 2/17/04 BTB has a cache structure Note that only predicted taken branches need to be stored Represent addresses of known branches

26 EENG449b/Savvides Lec 10.26 2/17/04 Branch Target Buffer Operation
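
The lookup and update steps can be sketched in Python. The sketch assumes a direct-mapped buffer, 4-byte instructions, and the policy from the previous slide that only taken branches are stored. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self, entries=64):
        self.entries = entries
        self.table = {}  # index -> (branch_pc, predicted_target)

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def next_fetch_pc(self, pc):
        """IF stage: return the predicted address of the next instruction."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]          # hit: treat as a taken branch
        return pc + 4                # miss: fetch sequentially

    def resolve(self, pc, taken, target):
        """Called once the branch outcome is known."""
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)     # insert or refresh the entry
        elif i in self.table and self.table[i][0] == pc:
            del self.table[i]                # predicted taken but was not


btb = BranchTargetBuffer()
btb.resolve(0x100, True, 0x80)
print(hex(btb.next_fetch_pc(0x100)))
```

Comparing the stored branch address against the fetch PC is what separates a real hit from another instruction that merely maps to the same index.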

27 EENG449b/Savvides Lec 10.27 2/17/04 Integrated Instruction Fetch Units Instead of using instruction fetch as one of the pipeline phases, use a more advanced instruction fetch unit –To support the demands of multiple issue processors Integrated IF has 3 main units –Integrated Branch Prediction –Instruction Prefetch »autonomously fetching ahead the given instructions –Instruction memory access and buffering »Tries to hide the overhead associated with fetching instructions from multiple cache lines by buffering instructions

28 EENG449b/Savvides Lec 10.28 2/17/04 Return Address Predictors Predict the targets of indirect jumps, whose destination address is not known at compile time –Returns from procedure calls. »Procedures get called from different points in the code, so the return target varies Use a small stack of return addresses –When a procedure is called, push the return address on the stack; pop the stack on return –If the stack is deep enough for the call nesting, returns are predicted perfectly
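
A minimal Python sketch of such a stack is shown below. The fixed depth and the overflow policy of dropping the oldest entry are assumptions; the class and method names are hypothetical.

```python
class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        """Push the address of the instruction following the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)        # overflow: the oldest entry is lost
        self.stack.append(return_address)

    def predict_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
for address in (0x104, 0x204, 0x304):    # three nested calls, depth of only two
    ras.on_call(address)
print([ras.predict_return() for _ in range(3)])
```

With three nested calls and a depth of two, the innermost two returns are predicted and the outermost one is not, which is the effect of insufficient stack depth.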

29 EENG449b/Savvides Lec 10.29 2/17/04 Prediction Stack Performance Results based on a number of SPEC benchmarks

30 EENG449b/Savvides Lec 10.30 2/17/04 Recap So far we have seen Dynamic Scheduling – reduce stalls from data dependences –Tomasulo’s algorithm Dynamic Branch Prediction – trying to reduce stalls from control dependences –N-bit predictors, (m,n) predictors, Tournament Predictors Achieve an ideal CPI of 1 –Branch target buffer, integrated IF, return address prediction

31 EENG449b/Savvides Lec 10.31 2/17/04 Multiple Issue Processors Try to issue multiple instructions per clock cycle Two basic flavors –Superscalar Processors »Issue a variable number of instructions per clock cycle »Can be statically or dynamically scheduled –VLIW (Very Long Instruction Word) Processors »Issue a constant number of instructions formatted as a packet of smaller instructions »Parallelism across instructions is explicitly indicated »Statically scheduled by the compiler

32 EENG449b/Savvides Lec 10.32 2/17/04 Next Time Midterm Next Tuesday: –Multiple Issue Processors

