ECS 154B Computer Architecture II Spring 2009

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

ECE 445 – Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Part 2 - Data Hazards and Forwarding 3/24/04++
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Lecture 28: Chapter 4 Today’s topic –Data Hazards –Forwarding 1.
Control Hazards.1 Review: Datapath with Data Hazard Control Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
Chapter 4B: The Processor, Part B. Review: Why Pipeline? For Performance! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Pipelined Datapath and Control
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-2 Read Section 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
2/15/02CSE Data Hazzards Data Hazards in the Pipelined Implementation.
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
CMPE 421 Parallel Computer Architecture Part 3: Hardware Solution: Control Hazard and Prediction.
Designing a Pipelined Processor
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.
LECTURE 9 Pipeline Hazards. PIPELINED DATAPATH AND CONTROL In the previous lecture, we finalized the pipelined datapath for instruction sequences which.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Lecture 9 Pipeline Hazards.
Stalling delays the entire pipeline
Note how everything goes left to right, except …
CDA 3101 Spring 2016 Introduction to Computer Organization
Test 2 review Lectures 5-10.
Appendix C Pipeline implementation
Chapter 4 The Processor Part 4
ECS 154B Computer Architecture II Spring 2009
ECE232: Hardware Organization and Design
Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Csci 136 Computer Architecture II – Data Hazard, Forwarding, Stall
Chapter 6 Enhancing Performance with Pipelining
Morgan Kaufmann Publishers The Processor
EI 209 Computer Organization Fall 2017
Pipelining review.
Morgan Kaufmann Publishers Enhancing Performance with Pipelining
Computer Organization CS224
Pipelining in more detail
Data Hazards Data Hazard
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
The Processor Lecture 3.5: Data Hazards
CSC3050 – Computer Architecture
Pipelining (II).
Control unit extension for data hazards
Morgan Kaufmann Publishers The Processor
Introduction to Computer Organization and Architecture
Control unit extension for data hazards
Systems Architecture II
Pipelining - 1.
Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
©2003 Craig Zilles (derived from slides by Howard Huang)
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control §6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD

Data Forwarding Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle For ALU functional unit: the inputs can come from any pipeline register rather than just from ID/EX by adding multiplexors to the inputs of the ALU connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX’s stage Rs and Rt ALU mux inputs adding the proper control hardware to control the new muxes Other functional units may need similar forwarding logic With forwarding, the CPU can achieve a CPI of 1 even in the presence of data dependencies

Data Forwarding Conditions Only forward when state changes Use RegWrite control signal Don’t forward if destination is $0 Forward if previous destination current source Forwarding unnecessary in other cases Forward if either source register needs it

EX/MEM Forwarding Register value needed by next instruction Calculated by ALU this clock cycle Needed as input to ALU on next clock cycle ALU IM Reg DM add $4, $5, $6 add $8, $4, $7 or add $8, $7, $4 ALU IM Reg DM

EX/MEM Forwarding R[Rs] R[Rt] Immediate Rd Rt ID/EX EX/MEM MEM/WB RegWrite R[Rs] ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt RegDst

EX/MEM Forwarding R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB RegWrite ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs RegDst Forward Unit

MEM/WB Forwarding Register value needed two instructions later Calculated by ALU this clock cycle Needed as input to ALU in two clock cycles ALU IM Reg DM add $4, $5, $6 ALU IM Reg DM Unrelated Instruction add $8, $4, $7 or add $8, $7, $4 ALU IM Reg DM

MEM/WB Forwarding R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB RegWrite ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs RegDst Forward Unit

MEM/WB Forwarding R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB RegWrite ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs RegDst Forward Unit

Forwarding Complication Forward unit must forward most recent value It may appear necessary to do MEM/WB and EX/MEM forwarding simultaneously Only do EX/MEM forwarding this cycle Do EX/MEM forwarding again next cycle ALU IM Reg DM add $4, $5, $6 ALU IM Reg DM add $4, $4, $13 ALU IM Reg DM add $8, $4, $7

Complete ALU Input Forwarding ID/EX EX/MEM MEM/WB R[Rs] RegWrite ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs RegDst Forward Unit

Register Definition How can we specify a particular signal? Each state register has a copy May vary across stages Reference the register that contains the value RegWrite value in EX/MEM state register EX/MEM.RegWrite RegWrite value in MEM/WB state register MEM/WB.RegWrite

Forwarding Conditions We want to forward when Previous instruction updates state Previous destination used as current source Previous destination not $0 Data Hazard code add $4, $5, $6 sub $8, $4, $9 How do we do this in hardware?

Forwarding Unit R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB RegWrite ALU MemWrite MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs RegDst Forward Unit

Forwarding Unit Details EX/MEM.RegWrite Forward EX/MEM.RegisterRd[4] ID/EX.RegisterRs[4] … EX/MEM.RegisterRd = ID/EX.RegisterRs EX/MEM.RegisterRd[0] ID/EX.RegisterRs[0] EX/MEM.RegisterRd[4] … EX/MEM.RegisterRd[0] EX/MEM.RegisterRd ≠ 0

Other Forwarding Possible Forwarding to Data Memory Data memory to data memory copy ALU IM Reg DM add $4, $5, $6 ALU IM Reg DM sw $4, 40($7) ALU IM Reg DM lw $4, 16($7) ALU IM Reg DM sw $4, 40($7)

Forwarding to Memory What happens here? add $5, $6, $7 sw $5, 8($10) Forwarding must occur, but not through ALU

Forwarding to Memory R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB MemWrite R[Rs] RegWrite ALU MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs Forward Unit

Forwarding to Memory R[Rs] R[Rt] Immediate Rd Rt Rs ID/EX EX/MEM MEM/WB MemWrite R[Rs] RegWrite ALU MemtoReg R[Rt] ALU Cntrl Immediate Rd Rt Rs Forward Unit

Load Use Hazards Require Stalls No forwarding can help Requires a Hazard Detection Unit Detects hazards Inserts pipeline bubble ALU IM Reg DM lw $4, 16($5) nop ALU IM Reg DM add $8, $4, $7

Stalling The Pipeline Stalls occur by inserting pipeline bubble Hold some state registers (stage repeats) Allow other stages to continue processing ALU IM Reg DM lw $4, 16($5) ALU IM Reg DM add becomes nop Repeats ALU IM Reg DM add $8, $4, $7

Stalling The Pipeline Load Use Hazard code lw $4, 16($5) add $8, $4, $7 ALU IM Reg DM add lw

Stalling The Pipeline Load Use Hazard code lw $4, 16($5) add $8, $4, $7 ALU IM Reg DM Hazard Detected add lw

Stalling The Pipeline Load Use Hazard code lw $4, 16($5) add $8, $4, $7 Stage Repeated IM Reg DM Reg Bubble Inserted add nop lw

Stalling The Pipeline Load Use Hazard code lw $4, 16($5) add $8, $4, $7 Data Forwarded IM Reg ALU Reg add nop lw

Stalling The Pipeline Load Use Hazard code lw $4, 16($5) add $8, $4, $7 No Register Written IM Reg ALU DM add nop

How To Stall The Pipeline Two ways to stall the pipeline Set control signals to safe values Change instruction Set control signals All control signals become 0 Only MemWrite and RegWrite need to be 0 Change the instruction Make destination register $0 MIPS nop = 0x00000000 (sll $0, $0, 0)

How To Stall The Pipeline Control Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 Read Address IM PC IF/ID ID/EX

How To Stall The Pipeline Hazard Detection Control IF/IDWrite MemRead PCWrite Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 Read Address IM PC IF/ID Rt ID/EX

Hazard Detection Details Stall pipeline when all of the following occur ID/EX.MemRead ID/EX.RegisterRt ≠ 0 ID/EX.RegisterRt = IF/ID.RegisterRs or ID/EX.RegisterRt = IF/ID.RegisterRt

Control Hazard Review Caused when next instruction is unknown Conditional branch result unknown Unconditional branch before destination calculated ALU IM Reg DM ALU IM Reg DM ALU IM Reg DM ALU IM Reg DM

Control Hazards No simple way to handle control hazards Stalls often required Incorrect guesses lead to wasted cycles Forwarding handles most data hazards Decreases CPI to nearly 1 for data hazards Only specific situations require a stall Thankfully, control hazards occur less frequently

Stalling Control Hazards Stalling always possible, but affects CPI ALU IM Reg DM ALU IM Reg DM ALU IM Reg DM ALU IM Reg DM

CPU May Assume Branch Not Taken Always assume branch is not taken No delay if branch not taken ALU IM Reg DM beq $4, $5, label ALU IM Reg DM add $7, $8, $9 label:

Branch Delay A taken branch must flush instructions beq $4, $5, label add $7, $8, $9 or $10, $11, $12 … label: sub $7, $8, $9 ALU IM Reg DM Incorrect Branch or add beq

Branch Delay A taken branch must flush instructions beq $4, $5, label add $7, $8, $9 or $10, $11, $12 … label: sub $7, $8, $9 IM DM Reg sub nop nop beq

Adding nop To Decode Stage Clears Execute Stage Hazard Detection Control Read Address IM Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 PC IF/ID ID/EX Clears Decode Stage

Reducing Branch Delay Reduce delay by computing branch in Decode Comparison hardware required Stage longer: read Register File, then compare Branch destination adder moved to Decode Only removes one stall Forwarding logic required add $4, $5, $6 or $9, $10, $3 beq $4, $7, label ALU IM Reg DM ALU IM Reg DM ALU IM Reg DM

Branch Delay in Deep Pipelines Deeper pipelines incur larger delay Branch Decision Stage Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Four Stalls Required Stage 1 Stage 2 Stage 3 Stage 4 Stage 5

Dynamic Branch Prediction For deep pipelines or superscalar CPUs static prediction too costly Decreasing transistor size means more room for other functionality Have CPU guess branch result based on history of previous branches Branch Prediction Buffer: Table of instruction addresses and branch results Branch Target Buffer: Table of instruction addresses and branch destinations When instruction executed again, look at buffers to predict result and destination

Dynamic Branch Prediction Branch Prediction Buffer 0x4C T, N, T, N, N, N 0x84 T, T, T, N, T, N 0xF8 T, N, N, N, N, T Branch Target Buffer 0x4C 0x5023345C 0x84 0x501FF524 0xF8 0x500CD300 Only lower portion of address used Branch history depth depends on implementation Only lower portion of address used Branch destination

Dynamic Branch Prediction When prediction correct no delay occurs Pipeline remains full CPI remains near 1 When prediction incorrect Flush instructions currently in pipeline Fetch correct instruction CPI increases above 1

1-bit Branch Prediction CPU uses last branch result as prediction for: add $4, $5, $9 … addi $8, $8, -1 bne $8, $0, for Loop executes 10 times First time prediction incorrect (predicted not taken) 2-9 predictions correct (taken) Last time prediction incorrect (predicted taken) Short history causes incorrect predictions Branch almost always taken Predicted incorrect at least twice

2-bit Branch Prediction Require two consecutive wrong answers to change prediction Taken Not Taken Predict Taken Predict Taken Taken Taken Not Taken Not Taken Predict Not Taken Predict Not Taken Taken Not Taken

2-bit Branch Prediction Previous example for: add $4, $5, $9 … addi $8, $8, -1 bne $8, $0, for 2-bit Prediction better First prediction correct 2-9 predictions correct Last prediction incorrect Improves to one incorrect per loop Taken Not Taken Predict Taken Predict Taken Taken Taken Not Taken Not Taken Predict Not Taken Predict Not Taken Taken Not Taken

Other Branch Prediction Schemes Many other ways to predict branch results Correlating predictor Maintain local history (per branch instruction) Maintain global history (for all branches) Use combination of histories to make prediction Tournament predictor Maintain multiple predictors per branch instruction Use selector to choose most accurate predictor