EECE476: Computer Architecture. Lecture 21: Faster Branches. Branch Prediction with Branch-Target Buffers (not in textbook). The University of British Columbia.


1 EECE476: Computer Architecture. Lecture 21: Faster Branches. Branch Prediction with Branch-Target Buffers (not in textbook). The University of British Columbia, EECE 476. © 2005 Guy Lemieux

2 Branch Prediction: Fetching
  Early branch decision in “D” stage
    – Still 1 cycle branch penalty
  Can we do better?
  Branch prediction, three steps:
    – 1. Predict direction
    – 2. Fetch from predicted direction
    – 3. Nullify if mispredicted
  We’ve discussed steps 1 and 3, but not 2

3 Branch Prediction Problems
  How does branch prediction “fetch from predicted direction”?
  Pipeline cannot tell if instruction is a branch until “D” stage
    – At end of stage “D”, we know branch target and branch outcome
    – At this point, already too late to fill “I” stage
  New instruction already in “I” stage
    – How did we fill this “slot”?
    – Predict not-taken: fetch from PC+4
        Works OK
    – Predict taken: fetch from PC+4+Imm16
        BR instruction (Imm16) only known in “D”
        We can’t compute the branch target in “I”
  Must determine predicted-PC in “I” stage
    – AT THE SAME TIME as we are fetching the BR instruction itself
    – No time left to compute branch-target (PC + 4 + Imm16)
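To make the timing problem concrete, here is a minimal Python sketch of the target calculation that only becomes possible in “D”. It assumes a MIPS-style encoding in which Imm16 is a signed word offset (hence the shift by 2); the slide writes this more simply as PC + 4 + Imm16. The names `sign_extend16` and `branch_target` are hypothetical.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Taken-branch target; needs Imm16, so it cannot be computed before decode.

    Assumption: MIPS-style word offset, so the immediate is shifted left by 2.
    """
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


pc = 0x1000
# Not-taken needs only the PC, so it is available while fetching.
fall_through = pc + 4
# Taken needs Imm16, which is part of the instruction still being fetched.
taken = branch_target(pc, 0xFFFE)  # offset of -2 words
```

The fall-through address depends only on the PC, while the taken address depends on bits of the instruction that is still being fetched, which is why “predict taken” cannot simply compute its target in “I”.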

4 BPB + BTB Overview
  [Figure: block diagram. Labels: PC, Instr Mem, Instruction, Adder (+4), Branch-Target Buffer, Branch-Prediction Buffer. Caption: Prediction used to select new PC value.]

5 Beyond Branch Prediction: Branch-Target Buffer (BTB)
  Problem: determine predicted-PC for branches in “I” stage
  Solution:
    – Branch-Target Buffer, similar to a cache
    – Stores precomputed “target address” for branch instructions
        Do not confuse this with a Branch-Prediction Buffer (BPB)
        Lots of bits (e.g., 32 bits for target address) compared to BPB
  Approach
    – As normal: use PC to access InstrMem and fetch next instruction
    – In parallel:
        Use PC to access Branch-Target Buffer (BTB), gives us TARGET
        Use PC to access Branch-Prediction Buffer (BPB), gives us PREDICTION
        Compute PC+4
    – Next: PREDICTION selects (PC+4) or (TARGET) for PC
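The approach above can be sketched in Python as a single function evaluated during “I”. This is an illustrative model, not the lecture's hardware: the function name `next_pc`, the 7-bit index, and dropping the two lowest PC bits (word-aligned instructions) are assumptions.

```python
def next_pc(pc, btb, bpb, index_bits=7):
    """Select the next PC during "I" using only the current PC.

    btb: list of precomputed target addresses, indexed by low PC bits.
    bpb: list of booleans (True = predict taken), indexed the same way.
    """
    # Assumption: instructions are word-aligned, so bits 1..0 are skipped.
    index = (pc >> 2) & ((1 << index_bits) - 1)
    target = btb[index]          # BTB read, in parallel with instruction fetch
    predict_taken = bpb[index]   # BPB read, also in parallel
    pc_plus_4 = pc + 4           # adder, also in parallel
    # PREDICTION selects (PC+4) or (TARGET).
    return target if predict_taken else pc_plus_4


btb = [0] * 128
bpb = [False] * 128
btb[16] = 0x200   # entry used by a branch at PC 0x40
bpb[16] = True
predicted = next_pc(0x40, btb, bpb)
```

Nothing in `next_pc` looks at the fetched instruction, which is the point: the selection can finish in the same cycle as the fetch.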

6 BTB Problem: Non-branches
  What if instruction fetched isn’t a branch?
    – Must always choose PC+4
    – Must ignore branch predictor output

7 BTB with Non-Branches
  Not all instructions are branches…
  [Figure: block diagram. Labels: PC, InstrMem, Instruction, Adder (+4), Branch-Target Buffer, Branch-Prediction Buffer, a “Branch?” signal, multiplexer inputs 0 and 1.]
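A minimal Python sketch of the “Branch?” gating follows. The name `next_pc_gated` is hypothetical, and `is_branch` is simply passed in as an input; how that signal is produced is not modeled here.

```python
def next_pc_gated(pc, target, predict_taken, is_branch):
    """Return TARGET only for a branch that is predicted taken; else PC+4."""
    # The predictor output is ignored whenever the instruction is not a branch.
    use_target = is_branch and predict_taken
    return target if use_target else pc + 4
```

For example, `next_pc_gated(0x40, 0x200, True, False)` yields `0x44`: the predictor says “taken”, but the instruction is not a branch, so PC+4 wins.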

8 BPB: Correctness on Misprediction
  Plain Vanilla: Branch-Prediction Buffer Only
    – Small size (say, 128 entries)
    – Uses only lower bits of PC for address
    – Multiple branches may map to same entry
    – Execution ALWAYS CORRECT even if prediction is wrong!
        Misprediction always corrected via nullify
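As a small illustration of how several branches can share one entry, the sketch below indexes a 128-entry buffer with the low bits of the PC. The helper `bpb_index` and the choice to drop the two lowest PC bits (4-byte instructions) are assumptions for illustration.

```python
BPB_ENTRIES = 128  # size suggested on the slide


def bpb_index(pc: int) -> int:
    """Map a branch PC to a buffer entry using only its low bits."""
    # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
    return (pc >> 2) % BPB_ENTRIES


branch_a = 0x1004
branch_b = branch_a + BPB_ENTRIES * 4  # 512 bytes further on
same_entry = bpb_index(branch_a) == bpb_index(branch_b)
```

Here `branch_a` and `branch_b` land in the same entry, so each one reads (and later overwrites) the other's prediction bits. That can only cost performance, since a wrong prediction is still corrected by nullifying.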

9 BTB: Correctness on Misprediction
  BPB + Branch-Target Buffer
    – Necessarily: small size, uses only lower bits of PC for address
    – Multiple branches may map to same entry
    – If prediction wrong, the wrong PC is used in next cycle
        Must correct misprediction via nullify
        Must also correct the PC!
    – Even if prediction correct, may still use wrong PC in next cycle
        Stored TARGET address may be wrong
          – E.g., stale data, or stored target is for different BR instruction
        Must ensure that correct PC is used (DETECT)
        If incorrect, must nullify instruction and restore PC (CORRECT)

10 Fixing BTB Correctness Problems
  Correctness Problem Summary
  2 Cases:
    – Case 1: BR mispredicted
        Must nullify the mispredicted instruction
        Must correct PC
    – Case 2: BR predicted OK
        Must DETECT if correct PC was used (was target OK?)
        Case 2A: Correct TARGET
          – No problems, continue on…
        Case 2B: Incorrect TARGET
          – Must CORRECT PC to proper value and
          – Must nullify the fetched/predicted instruction
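The case split can be written out as a small decision function. This is only a sketch of the summary above; the function name `classify` and its string labels are hypothetical.

```python
def classify(predicted_taken, actual_taken, fetched_pc, correct_pc):
    """Name the correctness case for one resolved branch.

    fetched_pc: the PC that was actually used for the following fetch.
    correct_pc: the PC that should have been used, known once the branch resolves.
    """
    if predicted_taken != actual_taken:
        return "case 1: nullify, correct PC"
    if fetched_pc == correct_pc:
        return "case 2A: continue"
    return "case 2B: nullify, correct PC"
```

Note that a correctly predicted not-taken branch always falls into case 2A, because PC+4 does not come from the BTB and so cannot be stale.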

11 Fixing BTB: Case 1
  Case 1: BR mispredicted
    – Misprediction known in “D” stage
    – Must nullify the mispredicted instruction in “I”
        Easy… we already know how to do this
    – Must correct PC
        Wrong PC in “I” stage, cannot use
        Take correct PC from regular branch calculation in “D” stage

12 Fixing BTB: Case 1
  Correct new PC on misprediction (BEQ in D)
  Not shown: nullifying instruction in “I” stage
  [Figure: datapath diagram. Labels: PC, Adder (+4), InstrMem, Branch?, I/D and D/X pipeline registers, RegFile, Imm16, Adder, == comparator (EQ?), Branch-Prediction Buffer (Prediction? T/NT), Branch-Target Buffer, Mispredict (Mispredicted?), Correct PC, multiplexer inputs 0 and 1.]

13 Fixing BTB: Case 2
  Case 2: BR predicted OK
    – Must DETECT if proper PC was used (was target OK?)
    – If wrong, must CORRECT PC to proper value and nullify the wrongly fetched/predicted instruction in “I”
  How to fix?
    – DETECT VALID TARGET
        Ensure only correct target is stored in BTB entry
          – Tag each entry with the entire PC of the branch instruction
              » REQUIRES MORE STORAGE BITS IN BTB!
          – Compare stored-tag to actual PC
          – IF TAG MATCH, permit branch prediction
          – IF TAG MISMATCH, do something safe
              » Safe ideas: insert bubble into “I” stage
              » OR fetch PC+4 and nullify-if-taken
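A tagged BTB lookup might look like the following Python sketch. The class names, the 128-entry size, and the index function are assumptions for illustration; only the idea of storing the entire branch PC as a tag and comparing it on lookup comes from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    tag: int     # entire PC of the branch that wrote this entry
    target: int  # precomputed branch target


class TaggedBTB:
    def __init__(self, entries: int = 128):  # size is an assumption
        self.entries = entries
        self.table: List[Optional[BTBEntry]] = [None] * entries

    def _index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
        return (pc >> 2) % self.entries

    def store(self, branch_pc: int, target: int) -> None:
        self.table[self._index(branch_pc)] = BTBEntry(branch_pc, target)

    def lookup(self, pc: int) -> Optional[int]:
        """Return the stored target on a tag match, else None."""
        entry = self.table[self._index(pc)]
        if entry is not None and entry.tag == pc:
            return entry.target  # TAG MATCH: prediction may use this target
        return None              # TAG MISMATCH: caller does something safe


btb = TaggedBTB()
btb.store(0x1004, 0x2000)
hit = btb.lookup(0x1004)    # same branch: tag matches
alias = btb.lookup(0x1204)  # same index, different PC: tag mismatch
```

On a `None` result, the caller would fall back to one of the safe ideas listed above, for instance fetching PC+4.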

14 Fixing BTB: Case 2
  Case 2B: TAG MISMATCH, when is it detected?
  Option 2B-1: DETECT TAG MISMATCH early, in “I” stage
    – Allows immediate correction by safe actions
        Benefit: more likely to fetch a useful instruction after a branch
    – Adds compare logic after BTB
        Penalty: may increase cycle time
  Option 2B-2: DETECT TAG MISMATCH late, in “D” stage
    – Adds compare logic after BTB in “D” stage
        Benefit: keeps cycle time fast
    – Must CORRECT action already taken in “I” stage
        Must nullify instruction in “I” stage
        Restore correct PC by taking it from the branch calculation in “D” stage
        Easy: we already do this when we mispredict!
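One way to picture late correction is a check in “D” that compares the PC actually used for the next fetch against the PC the branch calculation produces. The sketch below is a simplification under that assumption (the slide describes a tag compare in “D”; comparing PCs directly is a stand-in with the same corrective action), and the name `d_stage_check` is hypothetical.

```python
def d_stage_check(fetched_pc, branch_pc, taken, computed_target):
    """Return (nullify, corrected_pc) once the branch resolves in "D".

    fetched_pc: PC that was used to fetch the instruction now in "I".
    computed_target: target from the regular branch calculation in "D".
    """
    correct_pc = computed_target if taken else branch_pc + 4
    if fetched_pc == correct_pc:
        return (False, None)    # nothing to fix
    return (True, correct_pc)   # nullify "I" and restore the PC
```

The same corrective action covers both a wrong direction and a wrong stored target, which matches the remark that this is already done on a misprediction.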

15 Fixing BTB: Cases 1 + 2
  BTB adds TAG bits to store PC for each entry
  [Figure: datapath diagram. Labels: PC, Adder (+4), InstrMem, Branch?, I/D and D/X pipeline registers, RegFile, Imm16, Adder, == comparator (EQ?), Branch-Prediction Buffer (Prediction? T/NT), Branch-Target Buffer with TAG Bits, second == comparator (PC/TAG Mismatch?), Mispredict (Mispredicted?), Correct PC, multiplexer inputs 0 and 1.]

16 Updating BTB
  After every branch
    – Update prediction bits in BPB
    – Update target in BTB
  BPB is a true dual-port memory
    – Every cycle, read Branch-Predict bits
        Read address: low bits of PC being used in “I” stage
    – Every cycle, (maybe) write new Branch-Predict bits
        Data: depends on previous Predict-state + BR outcome in “D” stage
        Write address: low bits of PC of BR instruction in “D” stage
        When to write: if BR executed in “D”
  BTB is a true dual-port memory
    – Every cycle, read TAG, Target bits
        Read address: low bits of PC being used in “I” stage
    – Every cycle, (maybe) write new TAG, Target bits
        Write address: low bits of PC of BR instruction in “D” stage
        When to write: if BR in “D” and BR is TAKEN (no need to store not-taken entries)
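The write side of both buffers can be sketched as one update routine that runs when a branch resolves in “D”. The slide only says the new BPB data depends on the previous predict-state and the outcome; the 2-bit saturating counter used here is an assumed scheme, and the function name, table layout, and index function are hypothetical.

```python
def update_on_branch(bpb, btb, branch_pc, taken, target, entries=128):
    """Apply the "D"-stage writes for one executed branch.

    bpb: list of 2-bit counters (0..3); values >= 2 mean "predict taken".
    btb: list of (tag, target) tuples or None.
    """
    # Write address: low bits of the PC of the branch in "D".
    # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
    index = (branch_pc >> 2) % entries
    if taken:
        bpb[index] = min(bpb[index] + 1, 3)
        # BTB is written only for taken branches.
        btb[index] = (branch_pc, target)
    else:
        bpb[index] = max(bpb[index] - 1, 0)


bpb = [1] * 128
btb = [None] * 128
update_on_branch(bpb, btb, 0x1004, True, 0x2000)   # taken: both buffers written
update_on_branch(bpb, btb, 0x1008, False, 0x4000)  # not taken: BPB only
```

After these two calls, entry 1 holds counter value 2 and the pair `(0x1004, 0x2000)`, while entry 2 has counter value 0 and no BTB entry.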

17 Updating BTB: Required Data
  [Figure: datapath diagram. Labels: PC, Adder (+4), InstrMem, Branch?, I/D and D/X pipeline registers, RegFile, Imm16, Adder, == comparator (EQ?), Branch-Prediction Buffer (Prediction? T / NT ?), Branch-Target Buffer (Incl. Tag), second == comparator (PC/TAG Mismatch?), Mispredict (Mispredicted?), Correct PC, multiplexer inputs 0 and 1. Update-path callouts: “TARGET to store in Tag (data)”, “PC of Branch (addr)”.]

18 Other Stuff
  Zero-cycle jumps/unconditional branches
  Zero-cycle conditional branches (on CC)
  Why is branch prediction so important?
    – R4000 pipeline: 8 stages
    – Superscalar Execution
    – Dynamic Pipelining
    – Register Renaming
    – Speculative Execution
  Learn all of this soon! Come to next class!
    – Not covered in textbook!

