
1 MAMAS – Computer Architecture: Out Of Order Execution cont. (Lecture #8-9)
Dr. Avi Mendelson, Alex Gontmakher
Some of the slides were taken from: Lihu Rapoport
OOOE & Exception © Avi Mendelson 05/2005

2 Agenda
• Problems in Control Flow
  – Speculative execution in OOOE machines
  – Interrupts
  – Exceptions
• Advanced Branch Prediction Topics

3 Static Option 1: Stall
• Stall the pipe when a branch is encountered, until it is resolved
• Stall impact, assuming:
  – Ideal CPI = 1
  – 20% of instructions are branches
  – Stall 3 cycles on every branch
• CPI_new = CPI_ideal + avg. stall cycles per instruction = 1 + 0.2 × 3 = 1.6
• CPI grows by 60%, i.e. throughput drops to 1/1.6 ≈ 62.5% of the ideal

4 Static Option 2: Delayed Branch
• Define the branch to take place AFTER the n following instructions
  – HW executes the n instructions following the branch regardless of whether the branch is taken or not
• SW puts in the n slots following the branch instructions that need to be executed regardless of branch resolution
  – Instructions that are before the branch instruction, or
  – Instructions from the converged path after the branch
• If independent instructions cannot be found, put NOPs

Original Code:
    r3 = 23
    r4 = r3 + r5
    if (r1 == r2) goto X
    r1 = r4 + r5
X:  r7 = r1

New Code (3 delay slots: two filled, one NOP):
    if (r1 == r2) goto X
    r3 = 23
    r4 = r3 + r5
    NOP
    r1 = r4 + r5
X:  r7 = r1

5 Delayed Branch Performance
• Filling 1 delay slot is easy, 2 is hard, 3 is harder
• Assuming we can usefully fill a fraction d of the delay slots:
  CPI_new = 1 + 0.2 × (3 × (1 − d))
• For example, for d = 0.5 we get CPI_new = 1.3
• Mixes architecture with micro-architecture
  – New generations require more delay slots
  – Causes compatibility issues between generations

6 Static Option 3: Predict Not Taken
• Execute instructions from the fall-through (not-taken) path
  – As if there is no branch
  – If the branch is not taken (~50%), no penalty is paid
• If the branch is actually taken
  – Flush the fall-through path instructions before they change the machine state (memory / registers)
  – Fetch the instructions from the correct (taken) path
• Assuming ~50% of branches are not taken on average:
  CPI_new = 1 + (0.2 × 0.5) × 3 = 1.3

7 Dynamic Branch Prediction
• Add a Branch Target Buffer (BTB) that predicts (at fetch)
  – Whether the instruction is a branch
  – Branch taken / not-taken
  – Taken branch target
[Figure: the PC of the instruction in fetch is looked up in the BTB, whose entries hold Branch PC, Target PC and History. No match: the instruction is not predicted to be a branch. Match: the instruction is predicted to be a branch, and the BTB supplies the predicted direction (taken or not taken) and the predicted target.]

8 BTB
• Allocation
  – Allocate instructions identified as branches (after decode)
    · Both conditional and unconditional branches are allocated
  – Not-taken branches need not be allocated
    · A BTB miss implicitly predicts not-taken
• Prediction
  – BTB lookup is done in parallel to the IC (instruction cache) lookup
  – BTB provides
    · Indication that the instruction is a branch (BTB hit)
    · Branch predicted target
    · Branch predicted direction
    · Branch predicted type (e.g., conditional, unconditional)
• Update (when the branch outcome is known)
  – Branch target
  – Branch history (taken / not-taken)

9 BTB (cont.)
• Wrong prediction
  – Predict not-taken, actual taken
  – Predict taken, actual not-taken, or actual taken but wrong target
• In case of a wrong prediction, flush the pipeline
  – Reset latches (same as turning all the instructions in flight into NOPs)
  – Select the PC source to be from the correct path
    · Need to carry the fall-through address along with the branch
  – Start fetching instructions from the correct path
• Assuming a fraction P of the predictions is correct:
  CPI_new = 1 + (0.2 × (1 − P)) × 3
  – For example, if P = 0.7: CPI_new = 1 + (0.2 × 0.3) × 3 = 1.18
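The CPI estimates for the stall, delayed-branch, predict-not-taken and BTB options all follow the same pattern: ideal CPI plus the fraction of instructions that pay the branch penalty, times the penalty. A minimal Python sketch of that arithmetic follows; the function name `cpi_with_branches` and its parameter names are illustrative choices, not part of the slides, and the numbers are the slide assumptions (20% branches, 3-cycle penalty).

```python
def cpi_with_branches(branch_fraction, penalty_cycles, penalized_fraction, ideal_cpi=1.0):
    """CPI = ideal CPI + average branch stall cycles per instruction."""
    return ideal_cpi + branch_fraction * penalized_fraction * penalty_cycles


BRANCH_FRACTION = 0.2  # 20% of instructions are branches (slide assumption)
PENALTY = 3            # cycles lost per penalized branch (slide assumption)

results = {
    "stall": cpi_with_branches(BRANCH_FRACTION, PENALTY, 1.0),                    # every branch pays
    "delayed (d=0.5)": cpi_with_branches(BRANCH_FRACTION, PENALTY, 1.0 - 0.5),    # unfilled slots pay
    "predict not-taken": cpi_with_branches(BRANCH_FRACTION, PENALTY, 0.5),        # taken branches pay
    "BTB (P=0.7)": cpi_with_branches(BRANCH_FRACTION, PENALTY, 1.0 - 0.7),        # mispredictions pay
}

for name, cpi in results.items():
    print(f"{name:18s} CPI = {cpi:.2f}")
```

The only thing that changes between the options is which fraction of the branches pays the penalty, which is why better prediction accuracy translates directly into lower CPI.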

10 Adding BTB to In-Order Pipeline: Easy!
[Figure: the 5-stage in-order datapath (PC, instruction memory, IF/ID, ID/EX, EX/MEM and MEM/WB latches, register file, ALU, data memory) extended with a BTB. The BTB is looked up with the PC at fetch and supplies the predicted target and predicted direction; the PC mux selects between PC+4 (the not-taken target) and the predicted target. A Mispredict Detection Unit compares the predicted target and direction with the actual ones, raises Flush on a mismatch, and sends address, target and direction to the BTB for allocation / update.]

11 Using The BTB
[Flowchart]
IF:
  – PC moves to the next instruction
  – Inst Mem gets the PC and fetches the new inst; the BTB gets the PC and looks it up
  – IF/ID latch is loaded with the new inst
  – BTB hit?
    · No: PC ← PC + 4
    · Yes: branch predicted taken?
        No: PC ← PC + 4
        Yes: PC ← predicted address
ID:
  – IF/ID latch is loaded with the predicted inst (PC ← predicted address) or with the sequential inst (PC ← PC + 4)
  – Branch? If yes, continue to EXE

12 Using The BTB (cont.)
[Flowchart, continued over the ID, EXE, MEM and WB stages]
  – Branch?
    · No: continue
    · Yes: calculate the branch condition & target, then update the BTB
  – Correct prediction?
    · Yes: continue
    · No: flush the pipe & update the PC; the IF/ID latch is then loaded with the correct inst

13 Executing Beyond Branches
• If branch prediction is not used:
  – Limited to the parallelism within a basic block
  – A basic block is ~5 instructions long
      (1) r1 ← r4 / r7
      (2) r2 ← r2 + r1
      (3) r3 ← r2 - 5
      (4) beq r3,0,300
      (5) r8 ← r8 + 1
    If the beq is predicted NT, inst 5 can be speculatively executed
• Using branch prediction allows us to execute beyond branches
  – But what if we execute an instruction beyond a branch and then it turns out that we predicted the wrong path?
• Solution: speculative execution
• Problem: how to recover if the branch prediction turns out to be incorrect

14 Misspeculation Recovery
• Two alternatives:
  – Wait for the branch to arrive at retirement
  – Do partial update

15 Waiting for the branch to arrive at retirement
• When an instruction arrives at retirement, we are sure that the architectural state is consistent, i.e., the viewable state is equivalent to the in-order view of execution
• Therefore, if the branch is discovered to be mispredicted, we can purge the system and start executing from the correct place
• The Good: easy to implement
• The Bad: when the pipe is long, there is considerable performance loss

16 [Figure: animation of an OOO pipeline (Instruction Q, RAT for R0-R3, RS, MOB, ROB, Execute, Retire) when recovery waits for retirement. Instruction stream: LD R1,X; R2 <- R3; R1 <- R1+R0; BR L; R2 <- R1. BR takes 2 cycles to be resolved.]

17 [Animation frame: LD R1,X is renamed to LD RB0,X; the RAT maps R1 to RB0 and the load occupies MOB entry M0.]

18 [Animation frame: R2 <- R3 is renamed to RB1 <- R3 and placed in RS0; the RAT maps R2 to RB1.]

19 [Animation frame: BR L enters the RS (RS1) while RB1 <- R3 executes.]

20 [Animation frame: R1 <- R1+R0 is renamed to RB3 <- RB0+R0 and placed in RS2; R2 <- R3 is marked OK in the ROB.]

21 [Animation frame: R2 <- R1 is renamed to RB4 <- RB3 and placed in RS3; BR L resolves as TAKEN.]

22 Partial Update – first try
• Immediately when the branch is found to be mispredicted
  – Purge the instructions following the branch
  – Stop fetching new instructions
  – Wait for the preceding instructions to finish executing
  – Reset the RAT
  – Resume bringing new instructions
• What's the difference?
  – From the moment the misprediction is discovered, we execute only the instructions on the correct path → fewer unnecessary instructions executed
  – Recovery is faster than waiting for retirement
• How much do we save?
• Problem: the recovery process can itself be interrupted
  – If another branch preceding this one is found mispredicted in the meantime

23 Partial Update – full solution
• When the pipeline is long, waiting for the branch to arrive at retirement is prohibitively expensive. Waiting for the preceding instructions to complete is not much better.
• We want to purge the pipeline as soon as possible!
• Problem: the RAT can hold intermediate values, so the state of the pipeline is inconsistent
  – Several approaches to this problem are known; we present one of them here

24 [Figure: animation of the same OOO pipeline (Instruction Q, RAT for R0-R3, RS, MOB, ROB, Execute, Retire) with partial update. Each renamed instruction also records, in parentheses, the previous mapping of its destination register. Instruction stream: LD R1,X; R2 <- R3; R1 <- R1+R0; BR L; R2 <- R1. BR takes 5 cycles to be resolved.]

25 [Animation frame: LD R1,X is renamed to LD RB0,X and occupies MOB entry M0; previous mapping recorded: (R1).]

26 [Animation frame: R2 <- R3 is renamed to RB1 <- R3 and placed in RS0; previous mapping recorded: (R2).]

27 [Animation frame: BR L enters the RS (RS1) while RB1 <- R3 executes.]

28 [Animation frame: R1 <- R1+R0 is renamed to RB3 <- RB0+R0 and placed in RS2; previous mapping recorded: (RB0). R2 <- R3 is marked OK in the ROB.]

29 [Animation frame: R2 <- R1 is renamed to RB4 <- RB3 and placed in RS3; previous mapping recorded: (RB1). BR L resolves as TAKEN.]
The entry of the instruction in RS3 indicates that it writes to R2 and that the previous mapping of R2 in the RAT was RB1, so on recovery R2 gets the value RB1 back.
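The recovery idea in this animation (each renamed instruction remembers the previous RAT mapping of its destination, and a misprediction undoes the younger mappings) can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the mechanism of any particular processor: the class `RenameMachine`, its methods, and the way `RBn` identifiers are handed out (one per ROB entry, never reused) are all hypothetical.

```python
class RenameMachine:
    """Toy RAT + ROB model that saves the previous mapping of every renamed destination."""

    def __init__(self, arch_regs):
        self.rat = {reg: reg for reg in arch_regs}  # arch register -> current location
        self.rob = []                               # program order: (text, dest, previous mapping)
        self.next_rb = 0

    def rename(self, text, dest=None):
        """Allocate a ROB entry; if the instruction writes dest, remap it and save the old mapping."""
        entry_id = f"RB{self.next_rb}"
        self.next_rb += 1                           # assumption: identifiers are not reused
        previous = None
        if dest is not None:
            previous = self.rat[dest]
            self.rat[dest] = entry_id
        self.rob.append((text, dest, previous))
        return len(self.rob) - 1                    # index of the new ROB entry

    def recover(self, branch_index):
        """Purge every entry younger than the branch, restoring the RAT youngest-first."""
        while len(self.rob) > branch_index + 1:
            _, dest, previous = self.rob.pop()
            if dest is not None:
                self.rat[dest] = previous


machine = RenameMachine(["R0", "R1", "R2", "R3"])
machine.rename("LD R1,X", dest="R1")
machine.rename("R2 <- R3", dest="R2")
machine.rename("R1 <- R1+R0", dest="R1")
branch = machine.rename("BR L")                     # branches have no destination register
machine.rename("R2 <- R1", dest="R2")               # speculative, beyond the branch

machine.recover(branch)                             # BR L turned out to be mispredicted
print(machine.rat)
```

Undoing the mappings youngest-first matters when several purged instructions write the same register: the last value restored is then the mapping that was valid at the branch.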

30 Interrupts and Exceptions
• Interrupts: external events that require the processor to switch to other code
  – Timer, device notification, etc.
  – Asynchronous
• Exceptions: internal events caused by the executed instructions
  – Traps, error conditions
  – Synchronous

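31 Interrupt Driven Data Transfer
[Figure: CPU, IOC and memory. While the user program (add, sub, and, or, nop, ...) runs: (1) I/O interrupt, (2) save PC, (3) interrupt service addr, (4) interrupt service routine in memory (read, store, ..., rti). Device 1, Device 2, ..., Device n share the interrupt line.]
• Several devices are connected to the same interrupt line
  – Need to identify which device caused the interrupt
  – Can be done by polling
  – The ack line can be daisy-chained, so that the closest device answers
• The processor usually has a single interrupt line
  – The interrupt controller should preferably know which device caused the interrupt
• On an interrupt the processor must know: who caused it, its priority, its ID, and the address of the interrupt service routine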

32 Interrupts general
• Complications for pipelines:
  – Interrupts occur in the middle of an instruction
  – Must restart the interrupting & subsequent instructions
• Precise interrupts preserve the model that instructions execute in program-generated order
• How to handle a precise interrupt:
  – Identify the instruction that caused the interrupt
  – Let the instructions before the faulting instruction finish
  – Disable writes for the faulting & subsequent instructions
  – Force a trap instruction into the pipeline
  – Trap routine:
    · Save the state of the executing program
    · Correct the cause of the interrupt
    · Restore the program state
  – Restart the faulting & subsequent instructions

33 Handling Exceptions and Interrupts on OOO
• When an exception is discovered:
  – Set an exception bit and record the type of exception
  – Wait for the instruction to reach retirement (Why?)
  – Clean the pipeline and start the handler
• Interrupts are asynchronous, so they don’t need to be precise
  – Inject into the pipeline a special instruction that jumps to the handling microcode
  – Same as with branch misprediction: either wait for the instruction to reach retirement and purge the pipeline, or stop fetching instructions in advance

34 Advanced Branch Prediction

35 Introduction
• Need to predict:
  – Conditional branch direction (taken or not taken)
    · Actual direction is known only after execution
    · Wrong direction prediction causes a full flush
  – Targets of all taken branches (conditional taken or unconditional)
    · Target of direct branches is known at decode
    · Target of indirect branches is known at execution
  – Branch type
    · Conditional, uncond. direct, uncond. indirect, call, return
• Goal: minimize the branch misprediction rate for a given predictor size

36 Branches and Performance
• MPI: mispredictions per instruction
  MPI = (# of incorrectly predicted branches) / (total # of instructions)
• MPI correlates well with performance. For example:
  – MPI = 1% (1 out of 100 instructions → 1 out of 20 branches)
  – IPC = 2 (IPC is the average number of instructions per cycle)
  – Flush penalty of 10 cycles
• We get:
  – MPI = 1% → a flush every 100 instructions
  – A flush every 50 cycles (since IPC = 2)
  – 10 cycles of flush penalty every 50 cycles → about 20% loss in performance
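The MPI example is a three-step calculation, sketched below in Python. The function name `flush_overhead` and its parameter names are illustrative; the model is the slide's own simplification (flush cycles are counted relative to the cycles of useful work between two flushes).

```python
def flush_overhead(mpi, ipc, flush_penalty_cycles):
    """Flush cycles paid per cycle of useful work, following the slide's estimate."""
    instructions_between_flushes = 1.0 / mpi                        # MPI = 1% -> 100 instructions
    cycles_between_flushes = instructions_between_flushes / ipc     # IPC = 2  -> 50 cycles
    return flush_penalty_cycles / cycles_between_flushes            # 10 / 50


overhead = flush_overhead(mpi=0.01, ipc=2, flush_penalty_cycles=10)
print(f"flush overhead: {overhead:.0%}")
```

Halving the MPI halves the overhead in this model, which is why MPI tracks performance more directly than the per-branch misprediction rate.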

37 Target Array
• The TA is accessed using the branch address (branch IP)
• Implemented as an n-way set associative cache
• Tags are usually partial
  – Saves space
  – Can get false hits
  – A few branches may alias to the same entry
  – Affects only performance, not correctness
• The TA predicts the following
  – Indication that the instruction is a branch
  – Predicted target
  – Branch type
    · Unconditional: take the target
    · Conditional: predict the direction
• The TA is allocated / updated at execution
[Figure: the branch IP indexes the array; each entry holds tag, target and type. A tag hit indicates a branch, and the entry supplies the predicted target and the predicted type.]

38 Conditional Branch Direction Prediction

39 One-Bit Predictor
• Prediction (at fetch): the previous outcome of the branch, read from an array / cache indexed by the branch IP
• Update (at execution): update the bit with the branch outcome
• Problem: a 1-bit predictor makes a double mistake in loops
    Branch outcome: 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
    Prediction:     ? 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
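The double mistake can be counted with a short Python sketch that replays the outcome sequence from this slide through a predictor that always repeats the last outcome. The function name `one_bit_mispredictions` and the assumed initial state of 0 are illustrative choices.

```python
def one_bit_mispredictions(outcomes, initial_state=0):
    """Count mispredictions of a predictor that predicts the previous outcome."""
    state = initial_state          # assumption: the bit starts at 0 (not taken)
    misses = 0
    for outcome in outcomes:
        if state != outcome:       # prediction is simply the stored bit
            misses += 1
        state = outcome            # update the bit with the actual outcome
    return misses


outcomes = [0, 0, 0, 0, 0, 1] * 3  # the 18 outcomes shown on the slide
print(one_bit_mispredictions(outcomes))
```

Every `1` is mispredicted, and so is the `0` that follows it, because the single bit has just been flipped by the glitch.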

40 Bimodal (2-bit) Predictor
• A 2-bit counter avoids the double mistake on glitches
  – Need “more evidence” to change the prediction
• 2 bits encode one of 4 states
  – 00 – strongly NT, 01 – weakly NT, 10 – weakly taken, 11 – strongly taken
  – Initial state: weakly taken (most branches are taken)
• Update
  – Branch was actually taken: increment the counter (saturate at 11)
  – Branch was actually not taken: decrement the counter (saturate at 00)
• Predict according to the m.s. bit of the counter (0 – NT, 1 – taken)
• Predicts monotonic branches well: one misprediction per loop execution (at the loop exit)
• Does not predict well branches with patterns like 0101…01
[State diagram: 00 SNT, 01 WNT, 10 WT, 11 ST; "taken" moves one state toward ST, "not-taken" moves one state toward SNT.]

41 Bimodal Predictor (cont.)
[Figure: the l.s. bits of the branch IP index an array of 2-bit saturating counters. Prediction = msb of the counter; the counter is updated with the branch outcome.]

42 Bimodal Predictor - example
Code:
    Loop: ….
    br1:  if (n/2) { ……. }
    br2:  if ((n+1)/2) { ……. }
          n--
    br3:  JNZ n, Loop
• Br1 prediction
  – Pattern:    1 0 1 0 1 0
  – Counter:    2 3 2 3 2 3
  – Prediction: T T T T T T
• Br2 prediction
  – Pattern:    0  1  0  1  0  1
  – Counter:    2  1  2  1  2  1
  – Prediction: T  nT T  nT T  nT
• Br3 prediction
  – Pattern:    1 1 1 1 1 0
  – Counter:    2 3 3 3 3 3
  – Prediction: T T T T T T
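The three counter traces in this example can be reproduced with a small Python sketch of a single 2-bit saturating counter. The helper name `bimodal_trace` is illustrative; the initial value 2 (weakly taken) matches the first counter value in each trace.

```python
def bimodal_trace(outcomes, counter=2):
    """Return the counter value and the prediction seen before each branch outcome."""
    counters, predictions = [], []
    for taken in outcomes:
        counters.append(counter)
        predictions.append("T" if counter >= 2 else "nT")   # msb of the 2-bit counter
        if taken:
            counter = min(counter + 1, 3)                    # saturate at 11
        else:
            counter = max(counter - 1, 0)                    # saturate at 00
    return counters, predictions


for name, pattern in [("br1", [1, 0, 1, 0, 1, 0]),
                      ("br2", [0, 1, 0, 1, 0, 1]),
                      ("br3", [1, 1, 1, 1, 1, 0])]:
    counters, predictions = bimodal_trace(pattern)
    print(name, pattern, counters, predictions)
```

Note that br2 is mispredicted on every occurrence: the counter oscillates between 2 and 1, always one step behind the alternating pattern, which is the weakness that history-based predictors address.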

43 2-Level Prediction: Local Predictor
• Save the history of each branch in a Branch History Register (BHR):
  – A shift register updated by the branch outcome
  – Saves the last n outcomes of the branch
  – Used as a pointer into an array of bits specifying the direction per history
• Example: assume n = 6
  – Assume the pattern 000100010001...
  – At steady state, the following patterns repeat in the BHR: 000100, 001000, 010001, 100010
    · Following 000100, 010001, 100010 the jump is not taken
    · Following 001000 the jump is taken
[Figure: the n-bit BHR selects one of the entries 0 .. 2^n − 1 of the array.]

44 Local Predictor (cont.)
• There could be glitches in the pattern
  – Use 2-bit saturating counters instead of 1 bit to record the outcome
• Too long BHRs are not good:
  – Past history may no longer be relevant
  – Warm-up is longer
  – The counter array becomes too big
[Figure: the BHR (history) indexes an array of 2-bit saturating counters. Prediction = msb of the counter. The counter is updated with the branch outcome, and the history is updated with the branch outcome.]
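A minimal Python sketch of a two-level local predictor for a single branch shows how the 000100010001... pattern becomes fully predictable once the counters have warmed up. The class `LocalPredictor`, the helper `misprediction_flags`, the all-zero initial history and the weakly-taken initial counters are illustrative assumptions; a real design keeps one BHR per branch in a history cache.

```python
class LocalPredictor:
    """One BHR of n bits indexing an array of 2^n two-bit saturating counters."""

    def __init__(self, history_bits=6):
        self.history_mask = (1 << history_bits) - 1
        self.bhr = 0                                  # assumption: history starts as all zeros
        self.counters = [2] * (1 << history_bits)     # assumption: start weakly taken

    def predict(self):
        return self.counters[self.bhr] >= 2           # msb of the selected counter

    def update(self, taken):
        counter = self.counters[self.bhr]
        self.counters[self.bhr] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.history_mask


def misprediction_flags(predictor, outcomes):
    """Return one boolean per outcome: True where the prediction was wrong."""
    flags = []
    for taken in outcomes:
        flags.append(predictor.predict() != bool(taken))
        predictor.update(taken)
    return flags


pattern = [0, 0, 0, 1] * 50                           # 000100010001...
flags = misprediction_flags(LocalPredictor(history_bits=6), pattern)
print("mispredictions in the first 40 outcomes:", sum(flags[:40]))
print("mispredictions in the last 40 outcomes:", sum(flags[-40:]))
```

All the mispredictions fall in the warm-up phase; after that, each of the four recurring histories owns a counter that has saturated in the right direction.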

45 Local Predictor: private counter arrays
• Holding BHRs and counter arrays for many branches:
[Figure: the branch IP selects a tag + history entry in the history cache; each entry has its own array of 2-bit saturating counters, indexed by the history. Prediction = msb of the counter. The counter is updated with the branch outcome, and the history is updated with the branch outcome.]
• Predictor size: #BHRs × (tag_size + history_size + 2 × 2^history_size)
• Example: #BHRs = 1024; tag_size = 8; history_size = 6 → size = 1024 × (8 + 6 + 2 × 2^6) = 142 Kbit

46 Local Predictor: shared counter arrays
• Using a single counter array shared by all BHRs
  – All BHRs index the same array
  – Branches with similar patterns interfere with each other
[Figure: the branch IP selects a tag + history entry in the history cache; the history indexes the single shared array of 2-bit saturating counters. Prediction = msb of the counter.]
• Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^history_size
• Example: #BHRs = 1024; tag_size = 8; history_size = 6 → size = 1024 × (8 + 6) + 2 × 2^6 = 14.1 Kbit
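The two size formulas (private versus shared counter arrays) are easy to check numerically. The Python sketch below uses illustrative function names and the example parameters from the slides (1024 BHRs, 8-bit tags, 6-bit histories).

```python
def private_arrays_bits(num_bhrs, tag_size, history_size):
    """Every BHR entry carries its own array of 2^history_size two-bit counters."""
    return num_bhrs * (tag_size + history_size + 2 * 2 ** history_size)


def shared_array_bits(num_bhrs, tag_size, history_size):
    """All BHR entries share one array of 2^history_size two-bit counters."""
    return num_bhrs * (tag_size + history_size) + 2 * 2 ** history_size


private = private_arrays_bits(num_bhrs=1024, tag_size=8, history_size=6)
shared = shared_array_bits(num_bhrs=1024, tag_size=8, history_size=6)
print(f"private: {private} bits = {private / 1024:.1f} Kbit")
print(f"shared:  {shared} bits = {shared / 1024:.1f} Kbit")
```

Sharing the counter array cuts the storage by roughly a factor of ten in this configuration, at the cost of interference between branches whose histories collide.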

47 Local Predictor: lselect
• lselect reduces inter-branch interference in the counter array
[Figure: the h history bits from the history cache are concatenated with the m l.s. bits of the branch IP; the resulting h+m bits index the array of 2-bit saturating counters. Prediction = msb of the counter.]
• Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^(history_size + m)

48 Local Predictor: lshare
• lshare reduces inter-branch interference in the counter array by mapping common patterns in different branches to different counters
[Figure: the h history bits from the history cache are XORed with h l.s. bits of the branch IP; the h-bit result indexes the array of 2-bit saturating counters. Prediction = msb of the counter.]
• Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^history_size (the XOR keeps the index h bits wide, so the counter array does not grow)

49 Global Predictor
• The behavior of some branches is highly correlated with the behavior of other branches:
    if (x < 1) ...
    if (x > 1) ...
• Using a Global History Register (GHR), the prediction of the second if may be based on the direction of the first if
• For other branches the history interference might be destructive

50 Global Predictor (cont.)
[Figure: the GHR (history) indexes an array of 2-bit saturating counters. Prediction = msb of the counter. The counter is updated with the branch outcome, and the history is updated with the branch outcome.]
• Predictor size: history_size + 2 × 2^history_size
• Example: history_size = 12 → size ≈ 8 Kbit

51 Global Predictor: Gshare
• gshare combines the global history information with the branch IP
[Figure: the GHR (history) is XORed with bits of the branch IP, and the result indexes an array of 2-bit saturating counters. Prediction = msb of the counter. The counter is updated with the branch outcome, and the history is updated with the branch outcome.]
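A gshare predictor can be sketched in Python as a global history register XORed with the low bits of the branch IP to index the counter array. The class name `Gshare`, the 4-bit history, the weakly-taken initial counters and the example IP `0x4A3` are all illustrative assumptions.

```python
class Gshare:
    """Global history XOR branch IP -> index into 2-bit saturating counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                                  # assumption: history starts as all zeros
        self.counters = [2] * (1 << history_bits)     # assumption: start weakly taken

    def _index(self, ip):
        return (self.ghr ^ ip) & self.mask

    def predict(self, ip):
        return self.counters[self._index(ip)] >= 2    # msb of the selected counter

    def update(self, ip, taken):
        index = self._index(ip)
        counter = self.counters[index]
        self.counters[index] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


predictor = Gshare(history_bits=4)
branch_ip = 0x4A3                                     # hypothetical branch address
wrong = []
for taken in [1, 0] * 50:                             # alternating branch: 1 0 1 0 ...
    wrong.append(predictor.predict(branch_ip) != bool(taken))
    predictor.update(branch_ip, taken)

print("mispredictions in the last 20 outcomes:", sum(wrong[-20:]))
```

The alternating pattern that defeats a plain bimodal counter is learned here because the two recurring histories select two different counters.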

52 Chooser
• A chooser selects between 2 predictors that predict the same branch
  – Use the predictor that was more correct in the past
• Chooser array (an array of 2-bit saturating counters), indexed by the branch IP
  – +1 if Bimodal / Local is correct and Global is wrong
  – −1 if Bimodal / Local is wrong and Global is correct
• The chooser may also be indexed by the GHR
[Figure: the Bimodal / Local and Global predictions feed a selector controlled by the chooser array, which produces the final prediction.]
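The chooser update rule can be sketched in Python as follows. The class name `Chooser`, the index width, the initial counter value and the convention that a counter of 2 or 3 selects the Bimodal / Local prediction are assumptions made for illustration; the slide only specifies the +1 / −1 rule.

```python
class Chooser:
    """Array of 2-bit saturating counters that picks between two predictors."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [2] * (1 << index_bits)       # assumption: start by weakly preferring local

    def select(self, ip, local_prediction, global_prediction):
        # Assumption: counter >= 2 means "trust Bimodal / Local", otherwise "trust Global".
        use_local = self.counters[ip & self.mask] >= 2
        return local_prediction if use_local else global_prediction

    def update(self, ip, local_correct, global_correct):
        index = ip & self.mask
        if local_correct and not global_correct:
            self.counters[index] = min(self.counters[index] + 1, 3)   # +1
        elif global_correct and not local_correct:
            self.counters[index] = max(self.counters[index] - 1, 0)   # -1
        # Both correct or both wrong: no evidence either way, leave the counter alone.


chooser = Chooser(index_bits=4)
ip = 0x123                                            # hypothetical branch address
print(chooser.select(ip, local_prediction=True, global_prediction=False))
chooser.update(ip, local_correct=False, global_correct=True)
chooser.update(ip, local_correct=False, global_correct=True)
print(chooser.select(ip, local_prediction=True, global_prediction=False))
```

Updating the counter only when the two predictors disagree in correctness keeps the chooser from drifting when both are right or both are wrong.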

53 Speculative History Updates
• Deep pipeline → many cycles between fetch and branch resolution
  – If history is updated only at resolution
    · Local: future occurrences of the same branch may see stale history
    · Global: future occurrences of all branches may see stale history
  – Therefore history is speculatively updated according to the prediction
    · History must be corrected if the branch is mispredicted
    · Speculative updates are done in a special field to enable recovery
• Speculative history update
  – Speculative history is updated assuming previous predictions are correct
  – A speculation bit is set to indicate that speculative history is used
  – The counter array is not updated speculatively: the prediction can change (state change from 01 to 10 or from 10 to 01) only on a misprediction
• On branch resolution
  – Update the real history, and reset the speculative histories if mispredicted

54 Return Stack Buffer
• A return instruction is a special case of an indirect branch:
  – Each time it may jump to a different target
  – The target is determined by the location of the corresponding call instruction
• The idea:
  – Hold a small stack of targets
  – When the target array predicts a call
    · Push the address of the instruction that follows the call onto the stack
  – When the target array predicts a return
    · Pop a target from the stack and use it as the return address
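The push-on-call, pop-on-return idea can be sketched in Python as a small bounded stack. The class `ReturnStackBuffer`, its depth, the policy of dropping the oldest entry on overflow and the example addresses are illustrative assumptions; the slide does not specify what happens when the stack overflows or underflows.

```python
class ReturnStackBuffer:
    """Small fixed-depth stack of predicted return addresses."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, call_ip, call_length):
        """A call was predicted: push the address of the instruction that follows it."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)                 # assumption: on overflow, drop the oldest entry
        self.stack.append(call_ip + call_length)

    def on_return(self):
        """A return was predicted: pop the predicted target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer(depth=4)
rsb.on_call(call_ip=0x1000, call_length=5)    # outer call, returns to 0x1005
rsb.on_call(call_ip=0x2000, call_length=5)    # nested call, returns to 0x2005
print(hex(rsb.on_return()))                   # inner return is predicted first
print(hex(rsb.on_return()))
```

Because calls and returns nest, the most recent call is the one whose return comes next, which is why a stack predicts return targets better than a single target-array entry per return instruction.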

55 Pentium® M
• Combines 3 predictors
  – Bimodal, Global and Loop predictor
• The loop predictor analyzes branches to see if they have loop behavior
  – Moving in one direction (taken or NT) a fixed number of times
  – Ending with a single movement in the opposite direction

56 Pentium® M – Indirect Branch Predictor
• The target of indirect branches is data dependent
  – Some indirect branches still have a single target at run time
  – Some have many targets
    · E.g., a case statement in a Java byte-code interpreter
• Indirect branches are heavily used in object-oriented code (C++, Java) → they became a growing source of branch mispredictions
• An indirect branch is resolved at execution → high misprediction penalty
• A dedicated indirect branch target predictor (iTA)
  – Chooses targets based on the global history (similar to the global predictor)
• Initially an indirect branch is allocated only in the target array (TA)
• If the target is mispredicted, allocate an entry in the iTA corresponding to the global history leading to this instance of the indirect branch
  – Monotonic indirect branches are still predicted by the TA
  – Data-dependent indirect branches allocate as many targets as needed

57 Indirect branch target prediction (cont.)
• The prediction from the iTA is used if
  – The TA indicates an indirect branch
  – The iTA hits for the current global history (XORed with the branch address)
[Figure: the branch IP indexes the Target Array; the branch IP and the GHR history index the Indirect Target Predictor. A MUX selects the iTA predicted target when the TA hits on an indirect branch and the iTA hits; otherwise the TA predicted target is used.]

58 Backup

59 Branch Prediction in commercial Processors

60 Older Processors
• 386 / 486
  – All branches are statically predicted Not Taken
• Pentium
  – IP based, 2-bit saturating counters (Lee-Smith)
  – BTB miss: statically predicted Not Taken

61 Intel Pentium III
• 2-level, local histories, per-set counters
• 4-way set associative: 512 entries in 128 sets
• Branch type encoding: 00 – cond, 01 – ret, 10 – call, 11 – uncond
[Figure: 128 sets × 4 ways (Way 0 .. Way 3). Each entry holds an IP tag, a 4-bit local history (e.g., 1001), a target and a branch type; each set also holds LRR bits and per-set counters 0 .. 15 selected by the history. Pred = msb of counter. A Return Stack Buffer sits beside the array.]

62 Alpha 21264 - LG Chooser
[Figure: Local predictor: 4 ways × 256 histories indexed by the IP, each entry holding a 6-bit tag + 10-bit history; the history indexes 1024 3-bit counters. Global predictor: a 12-bit GHR indexes 4096 2-bit counters. Chooser: 4096 2-bit counters; a MUX selects between the Global and Local predictions.]
• A new entry in the local stage is allocated on a global-stage misprediction
• Chooser state machines, 2 bits each:
  – One bit records whether the global predictor was correct or wrong last time
  – The other bit records the same for the local predictor
• Chooses Local only if local was correct and global was wrong
