Morgan Kaufmann Publishers The Processor


Morgan Kaufmann Publishers The Processor April 11, 2017 CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Chapter 4 The Processor Zhao Zhang Iowa State University Revised from original slides provided by MKP Chapter 1 — Computer Abstractions and Technology

Week 10 Overview
Expected project progress: Complete Mini-Project B, part 1
ALU data hazard and forwarding
MEM data hazard, forwarding, and pipeline stall
Control hazard and branch execution

Data Hazards from ALU Instructions
An instruction depends on completion of data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Data Hazards from ALU Instructions
A naïve approach is to insert nops to wait out the dependence
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
Change to
    add $s0, $t0, $t1
    nop
    nop
    sub $t2, $s0, $t3

Data Hazards in ALU Instructions
Another naïve approach is to stall the 2nd instruction in the dependence
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Data Hazards in ALU Instructions
Observations on this scenario
  The first instruction, an ALU instruction, produces a register value
  The following instructions consume the register value
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Data Hazards in ALU Instructions
What exactly is the problem?
  A register value is written to the register file in the WB stage, two cycles after the EX stage
  The following instructions read the register value at the beginning of the ID stage
    IF    ID    EX    MEM   WB
    and   sub   …     …     …     sub reads $1, $3
    or    and   sub   …     …     and reads old $2
    add   or    and   sub   …     or reads old $2
    sw    add   or    and   sub   sub writes to $2; add reads new $2
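The timing on this slide can be sketched in Python. This is a minimal model, not part of the original slides: the tuple layout and the names `stale_cycles` and `stale_readers` are illustrative assumptions. It assumes no forwarding, and that a register written in WB can be read in ID during the same cycle.

```python
# Each entry: (assembly text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]


def stale_cycles(distance):
    """Cycles by which a consumer `distance` instructions later reads too early.

    The producer writes the register in WB, three cycles after its own ID;
    the consumer reads in ID, `distance` cycles after the producer's ID.
    Assumption: a WB write and an ID read in the same cycle see the new value.
    """
    return max(0, 3 - distance)


def stale_readers(program, producer=0):
    """List (text, stale cycles) for later instructions that read the producer's result."""
    dest = program[producer][1]
    found = []
    for i in range(producer + 1, len(program)):
        text, new_dest, sources = program[i]
        if dest in sources:
            found.append((text, stale_cycles(i - producer)))
        if new_dest == dest:
            break  # a later write to the same register ends this dependence
    return found


for text, cycles in stale_readers(PROGRAM):
    value = "old" if cycles else "new"
    print(f"{text:18} reads {value} $2 ({cycles} cycle(s) too early)")
```

In this model `and` and `or` read the old $2, while `add` and `sw` read the new value, which agrees with the table on the slide.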

Data Hazards in ALU Instructions

Forwarding (aka Bypassing)
Use the result as soon as it is computed
  The result is already in the pipeline
  Don’t wait for it to be stored in a register
  Requires extra connections in the datapath

Dependencies & Forwarding

Data Forwarding
To what place: the two ALU inputs in the EX stage datapath
  A forwarded register value may replace the values from ID
From where: the destination register value in the pipeline registers
  Source 1: EX/MEM register
  Source 2: MEM/WB register

Data Forwarding
When to forward: a data dependence is detected between
  Instructions at the EX and MEM stages
  Instructions at the EX and WB stages
How to detect: compare source and destination register numbers

Data Forwarding Example
    sub $2, $1, $3
    and $12, $2, $5    # MEM=>EX forwarding
    or  $13, $6, $2    # WB=>EX forwarding
    add $14, $2, $2
    sw  $15, 100($2)

    IF    ID    EX    MEM   WB
    or    and   sub   …     …
    add   or    and   sub   …     and gets the forwarded new $2 value
    sw    add   or    and   sub   or gets the forwarded new $2 value

Data Forwarding Logic Design
    sub $2, $1, $3
    and $12, $2, $5    # compare $2 with $2, $5
    or  $13, $6, $2    # compare $2 with $6, $2
Detection: compare rs and rt at EX with rd at MEM and rd at WB
Those register numbers are in the ID/EX, EX/MEM, and MEM/WB registers
  rs was not in the ID/EX register; we can add it

Data Forwarding Logic Design
Register numbers in the pipeline
  Source registers of the instruction at the EX stage
    ID/EX.RegisterRs, ID/EX.RegisterRt
  Destination register of the instruction at the MEM stage
    EX/MEM.RegisterRd
  Destination register of the instruction at the WB stage
    MEM/WB.RegisterRd
Potential data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs    (fwd from EX/MEM pipeline reg)
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt    (fwd from EX/MEM pipeline reg)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs    (fwd from MEM/WB pipeline reg)
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt    (fwd from MEM/WB pipeline reg)
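The four comparisons can be sketched as a small Python function. This is only an illustration: the name `potential_hazards` and its parameter names are assumptions, and it checks register numbers only, without looking at RegWrite or $zero.

```python
def potential_hazards(ex_mem_rd, mem_wb_rd, id_ex_rs, id_ex_rt):
    """Return the labels (1a, 1b, 2a, 2b) of the register-number matches."""
    labels = []
    if ex_mem_rd == id_ex_rs:
        labels.append("1a")
    if ex_mem_rd == id_ex_rt:
        labels.append("1b")
    if mem_wb_rd == id_ex_rs:
        labels.append("2a")
    if mem_wb_rd == id_ex_rt:
        labels.append("2b")
    return labels


# "and $12, $2, $5" at EX while "sub $2, $1, $3" is at MEM.
# Register 9 stands in for an unrelated instruction at WB (hypothetical).
print(potential_hazards(ex_mem_rd=2, mem_wb_rd=9, id_ex_rs=2, id_ex_rt=5))

# "or $13, $6, $2" at EX, "and" (rd = 12) at MEM, "sub" (rd = 2) at WB.
print(potential_hazards(ex_mem_rd=12, mem_wb_rd=2, id_ex_rs=6, id_ex_rt=2))
```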

Data Forwarding Logic Design
But only if the forwarding instruction will write to a register!
  EX/MEM.RegWrite = 1, MEM/WB.RegWrite = 1
  An instruction may have a matching rd field but not write to a register
And only if Rd for that instruction is not $zero
  EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
  An instruction is allowed to name $0 as its destination

Forwarding Paths
The forwarding unit accesses three pipeline registers
Note that rs is added to the ID/EX pipeline register

Forwarding Conditions (slide 18)
EX hazard: data forwarding from the EX/MEM register
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
MEM hazard: data forwarding from the MEM/WB register
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
This is not the final version (see slide 23)

Datapath with Forwarding

Caveats
Data forwarding happens at the beginning of the cycle
  The forwarding unit is in the EX stage, with its inputs from three pipeline stages
  It adds a small overhead to the critical path latency of the EX stage
For an EX hazard, data forwarding is from MEM to EX
  Precisely, the register value of the instruction being executed at the MEM stage is forwarded to the instruction being executed at the EX stage

Caveats
For a MEM hazard, the forwarding is from WB to EX
  From the instruction at WB to the instruction at EX
Data forwarding is to EX, not to ID
  An instruction may read obsolete register values at ID, and those values are latched in the ID/EX register
  The correct values may be at MEM (EX hazard) or WB (MEM hazard)
  Any obsolete values get replaced at EX
There is no WB hazard
  A register write at WB and a register read at ID, for the same register, may complete within one cycle

Double Data Hazard
Consider the sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
Both hazards occur
  Want to use the most recent result
Revise the MEM hazard condition
  Only forward if the EX hazard condition isn’t true

Revised Forwarding Condition (slide 23)
MEM hazard (revision from slide 18)
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
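The EX hazard conditions together with the revised MEM hazard conditions can be sketched as one Python function. This is a software model rather than the hardware itself: the function and parameter names are assumptions, and the "not EX hazard" term is expressed by checking the EX/MEM match first.

```python
def forwarding_unit(ex_mem_regwrite, ex_mem_rd,
                    mem_wb_regwrite, mem_wb_rd,
                    id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit mux selects.

    0b10 selects the EX/MEM value, 0b01 the MEM/WB value, and 0b00 the
    value read from the register file.
    """
    def select(source):
        # EX hazard: the most recent result wins, so it is tested first.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == source:
            return 0b10
        # MEM hazard: only reached when the EX hazard condition is false.
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == source:
            return 0b01
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)


# Double data hazard: "add $1, $1, $4" at EX while both earlier adds
# (each writing $1) sit at MEM and WB.
forward_a, forward_b = forwarding_unit(True, 1, True, 1, id_ex_rs=1, id_ex_rt=4)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

For the double-hazard sequence, ForwardA selects the EX/MEM value, which is the most recent $1.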

Load-Use Data Hazard
Load-use data hazard: a load instruction is followed immediately by an instruction that uses the loaded value
How is a load instruction different from an ALU instruction?
  ALU instruction: the destination register value is available at the end of the EX stage
  Load instruction: the destination register value is available at the end of the MEM stage
Note that the next instruction may need the value at the beginning of its EX stage

Load-Use Data Hazard
Can’t always avoid stalls by forwarding
  If the value is not computed when needed
  Can’t forward backward in time!

Load-Use Data Hazard
Need to stall for one cycle

Load-Use Hazard
How to insert a pipeline bubble (lost cycle)?
    lw  $2, 20($1)
    sub $4, $2, $5
    or  $8, $2, $6
When the load instruction is at the EX stage:
  Hold the instruction at the IF stage
    Do not update the PC
  Hold the instruction at the ID stage
    Do not change the IF/ID register
  Insert a nop at the EX stage
    Set all control signals in the ID/EX register to zero
    In particular, RegWrite = 0 and MemWrite = 0
  Let MEM and WB move forward

Load-Use Hazard Detection
To detect, check whether
  A load instruction is at the EX stage
    ID/EX.MemRead = 1
  The instruction at the ID stage reads the register being loaded
    ID/EX.RegisterRt = IF/ID.RegisterRs, or
    ID/EX.RegisterRt = IF/ID.RegisterRt (for R-type)
If detected, stall IF and ID, insert a bubble at EX, and let MEM and WB move forward

Pipeline Stall
The nop has all control signals set to zero
  It does nothing at EX, MEM and WB
Prevent update of the PC and the IF/ID register
  The using instruction is decoded again (OK)
  The following instruction is fetched again (OK)
  The 1-cycle stall allows MEM to read data for lw
    Can subsequently forward from WB to EX
Need to add new control lines
  PCWrite for holding or updating the PC
  IF/IDWrite for holding or updating the IF/ID register
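The load-use detection condition and the resulting control lines can be sketched in Python. This is an illustrative model: the function name, the `if_id_reads_rt` flag, and the `InsertBubble` output are assumptions, while PCWrite and IF/IDWrite are the control lines named on the slide.

```python
def hazard_detection(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt,
                     if_id_reads_rt=True):
    """Decide whether to stall for a load-use hazard.

    `if_id_reads_rt` is a hypothetical flag: set it to False when the
    instruction at ID does not actually read its rt field.
    """
    uses_loaded_value = (id_ex_rt == if_id_rs) or (
        if_id_reads_rt and id_ex_rt == if_id_rt
    )
    stall = bool(id_ex_memread) and uses_loaded_value
    return {
        "PCWrite": not stall,    # hold the PC while stalling
        "IFIDWrite": not stall,  # hold the IF/ID register while stalling
        "InsertBubble": stall,   # zero the control signals entering ID/EX
    }


# "lw $2, 20($1)" at EX (rt = 2), "sub $4, $2, $5" at ID (rs = 2, rt = 5).
print(hazard_detection(True, 2, 2, 5))
```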

Stall/Bubble in the Pipeline
Stall inserted here

Stall/Bubble in the Pipeline
Or, more accurately…

Datapath with Hazard Detection

Stalls and Performance
The BIG Picture
Stalls reduce performance
  But are required to get correct results
The compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure

Code Scheduling to Avoid Stalls
Reorder code to avoid use of a load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
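The two cycle counts can be reproduced with a small Python model. It is a sketch under simplifying assumptions: a 5-stage pipeline with full forwarding, where the only stall is one cycle whenever an instruction reads a register loaded by the instruction immediately before it. The tuple layout and the name `count_cycles` are illustrative.

```python
# Each entry: (opcode, destination register or None, source registers).
ORIGINAL = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

REORDERED = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]


def count_cycles(program, stages=5):
    """Total cycles: one per instruction, pipeline fill, plus load-use stalls."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "lw" and dest in current[2]:
            stalls += 1  # load-use hazard costs one bubble
    return len(program) + (stages - 1) + stalls


print(count_cycles(ORIGINAL), count_cycles(REORDERED))
```

Under these assumptions the original order takes 7 + 4 + 2 = 13 cycles and the reordered code takes 7 + 4 = 11 cycles, matching the slide.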

Control Hazards
A branch determines the flow of control
  Two branch outcomes: Taken or Not-Taken
  Fetching the next instruction depends on the branch outcome
  The pipeline can’t always fetch the correct instruction
    Still working on the ID stage of the branch
In the MIPS pipeline
  Need to compare registers and compute the target early in the pipeline
  Add hardware to do it in the ID stage

Control Hazards
Several caveats
  The CPU doesn’t recognize a branch until it reaches the end of the ID stage
  Every cycle, the CPU has to fetch one instruction
Cannot afford to wait and see
  Must predict the next PC every cycle
  The CPU may predict “always not-taken” (MIPS 5-stage pipeline)
  Alternatively, the CPU may predict the branch outcome dynamically (advanced CPU design)

Control Hazards
This MIPS pipeline always predicts Not-Taken
  Easy prediction: the next PC is the current PC plus 4
  No need to design a complex branch prediction unit
  However, most programs have more Taken than Not-Taken branches
What happens if the prediction is wrong?
  The pipeline will have mis-fetched instructions
  Flush those instructions before they take effect, i.e. before they write to memory or a register
  A Taken branch incurs a performance penalty

Performance Impact (§4.8 Control Hazards)
If the branch outcome is determined in MEM
  Flush these instructions (set control values to 0)
  Three cycles wasted on a taken branch

Performance Impact
The performance loss is 3 cycles per taken branch
  If the branch outcome is determined in MEM
Move execution of the branch to the ID stage!
  Only beq and bne are supported in original MIPS
  Testing equality and inequality is very fast; do it at the end of ID
Branch target can be calculated in ID
  Branch target = PC + extended offset
  PC and offset are known at the beginning of ID
At the end of ID, the CPU knows the branch outcome and branch target

Reducing Branch Delay
Move hardware to determine the outcome to the ID stage
  Target address adder
  Register comparator
Example code with branch taken
    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw  $4, 50($7)
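The target address in this example can be checked with a short calculation. The sketch below assumes the usual MIPS convention that the branch offset counts words relative to the instruction after the branch; `branch_target` is an illustrative helper name.

```python
def branch_target(branch_pc, offset_words):
    """Target = (address of the branch + 4) + (word offset shifted left by 2)."""
    return branch_pc + 4 + (offset_words << 2)


# "40: beq $1, $3, 7" from the example code.
print(branch_target(40, 7))
```

With the branch at address 40 and an offset of 7, the target is 40 + 4 + 28 = 72, the address of the lw instruction.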

Example: Branch Taken

Early Branch Outcome
Pipeline changes for early branch outcome
  2nd PC adder and the shifter moved to ID
  Comparator added to ID
  The Zero output of the ALU is no longer used
The CPU flushes one instruction for every taken branch
  The CPU detects a taken branch at ID
  The instruction at IF will be flushed
  1 lost cycle instead of 3 lost cycles per taken branch
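The saving can be made concrete with a small calculation. The per-branch penalties come from the slides; the count of 1000 taken branches is a hypothetical figure chosen only for illustration, and `lost_cycles` is an assumed helper name.

```python
# Cycles lost per taken branch, keyed by the stage that resolves the branch.
PENALTY = {"MEM": 3, "ID": 1}


def lost_cycles(taken_branches, resolve_stage):
    """Total cycles lost to flushed instructions for the taken branches."""
    return taken_branches * PENALTY[resolve_stage]


taken = 1000  # hypothetical number of taken branches in a program run
print(lost_cycles(taken, "MEM"), lost_cycles(taken, "ID"))
```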

Pipeline Flushing
When the CPU detects a taken branch at ID
  Update the PC with the branch target (already have)
  Flush the instruction in the IF stage
Add a flush signal to the IF/ID pipeline register
  When flushing, convert the instruction in the IF/ID register to 32-bit zeros
  0x00000000 is “sll $0, $0, 0”, effectively a nop
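The all-zero nop encoding can be checked by assembling R-type fields by hand. This is a minimal sketch assuming the standard MIPS32 R-type layout (opcode 0, then rs, rt, rd, shamt, funct); `encode_r_type` is a hypothetical helper. In that layout the all-zero word decodes as sll $0, $0, 0, while and $0, $0, $0 encodes as 0x00000024. Both leave the machine state unchanged, because $0 cannot be modified.

```python
FUNCT_SLL = 0x00  # funct field for sll
FUNCT_AND = 0x24  # funct field for and


def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack R-type fields: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


print(f"sll $0, $0, 0  -> 0x{encode_r_type(0, 0, 0, 0, FUNCT_SLL):08x}")
print(f"and $0, $0, $0 -> 0x{encode_r_type(0, 0, 0, 0, FUNCT_AND):08x}")
```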

Example: Branch Taken
Note: the branch does nothing in EX, MEM and WB

Pipeline Bubble on Branch
A taken branch incurs a pipeline bubble because of instruction flushing

Data Hazards for Branches
Moving branch execution to ID is not so easy
May need another forwarding unit
  The forwarding unit has to be in the ID stage
  The current forwarding unit, in the EX stage, cannot serve a branch that compares registers in ID
Need extensions to the hazard detection unit, and more pipeline stalls
  A branch uses register values at ID, while ALU and load instructions produce register values at EX and MEM

Data Hazards for Branches
If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction
    add $1, $2, $3        IF  ID  EX  MEM WB
    add $4, $5, $6            IF  ID  EX  MEM WB
    …                             IF  ID  EX  MEM WB
    beq $1, $4, target                IF  ID  EX  MEM WB
Can resolve using forwarding
  From MEM to ID, and from WB to ID

Data Hazards for Branches
If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction
  May need 1 stall cycle
  However, beq needs the value at the end of ID
    lw  $1, addr          IF  ID  EX  MEM WB
    add $4, $5, $6            IF  ID  EX  MEM WB
    beq stalled                   IF  ID
    beq $1, $4, target                ID  EX  MEM WB

Data Hazards for Branches
If a comparison register is a destination of the immediately preceding load instruction
  May need 2 stall cycles
  Again, beq needs the value at the end of ID, so it’s possible to reduce the stall to one cycle
    lw  $1, addr          IF  ID  EX  MEM WB
    beq stalled               IF  ID
    beq stalled                   ID
    beq $1, $0, target                ID  EX  MEM WB
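The stall counts in the three branch cases can be derived from one timing rule. The sketch below is a simplified model with assumed names (`READY_CYCLE`, `branch_stalls`). It counts cycles from the producer's IF as cycle 0 and assumes a value can be forwarded to the branch's ID stage in the cycle after it is produced: after EX for an ALU instruction, after MEM for a load.

```python
# First cycle in which the produced value can be forwarded to a branch at ID,
# counting the producer's IF as cycle 0 (ID = 1, EX = 2, MEM = 3, WB = 4).
READY_CYCLE = {"alu": 3, "load": 4}


def branch_stalls(producer_kind, distance):
    """Stall cycles for a branch `distance` instructions after its producer."""
    branch_id_cycle = 1 + distance  # the branch's ID stage if it never stalls
    return max(0, READY_CYCLE[producer_kind] - branch_id_cycle)


for kind in ("alu", "load"):
    for distance in (1, 2, 3):
        print(kind, distance, branch_stalls(kind, distance))
```

In this model an immediately preceding ALU instruction or a 2nd preceding load costs one stall, an immediately preceding load costs two, and a 2nd or 3rd preceding ALU instruction costs none.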

Mini-Project C
In Mini-Project C, implement
  The simple MIPS pipeline
  Data forwarding and hazard detection
  Not-taken branch prediction with pipeline flushing

Delayed Branch
Delayed branch may remove the one-cycle stall
  The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
  Put another way, the execution of beq is delayed by one cycle
    sub $10, $4, $8          beq $1, $3, 7
    beq $1, $3, 7      =>    sub $10, $4, $8
    and $12, $2, $5          and $12, $2, $5
Must find an independent instruction; otherwise
  May have to fill in a nop instruction, or
  Need two variants of beq, delayed and not delayed

Branch Prediction
We’ve actually studied one form of branch prediction: always not-taken
Longer pipelines can’t readily determine the branch outcome early
  Stall penalty becomes unacceptable
Predict the outcome of the branch
  Only stall if the prediction is wrong
In the MIPS pipeline
  Can predict branches not taken
  Fetch the instruction after the branch, with no delay