CIS429/529 Winter 2007 Pipelining-1 1 Pipeling RISC/MIPS64 five stage pipeline Basic pipeline performance Pipeline hazards Branch hazards More pipeline.

Slides:



Advertisements
Similar presentations
1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
COMP 4211 Seminar Presentation Based On: Computer Architecture A Quantitative Approach by Hennessey and Patterson Presenter : Feri Danes.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
CIS429/529 Winter 2007 Pipelining II- 1 Additional pipelining topics Why pipelining is so hard: exception handling ILP techniques: loop unrolling.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
MIPS64 Instruction Format
CIS429.S00: Lec10- 1 Control Hazards Created by branch statements BEQZLOC ADDR1,R2,R3. LOCSUBR1,R2,R3 PC needs to be computed but it happens too late in.
Chapter 5 Pipelining and Hazards
Chapter Six Enhancing Performance with Pipelining
CIS629 Fall 2002 Pipelining 2- 1 Control Hazards Created by branch statements BEQZLOC ADDR1,R2,R3. LOCSUBR1,R2,R3 PC needs to be computed but it happens.
Computer Architecture Pipelines Diagrams are from Computer Architecture: A Quantitative Approach, 2nd, Hennessy and Patterson.
EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer.
DLX Instruction Format
The Basics: Pipelining J. Nelson Amaral University of Alberta 1.
Appendix A Pipelining: Basic and Intermediate Concepts
EECC551 - Shaaban #1 Lec # 4 winter Data Hazards Requiring Stall Cycles In some code sequence cases, potential data hazards cannot be handled.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
EECC551 - Shaaban #1 Lec # 2 Winter DLX Instruction Format 16 bits6 bits5 bits Immediate rdrs1Opcode 6 bits5 bits 11 bits Opcoders1rs2rdfunc.
Pipelining Basics Assembly line concept An instruction is executed in multiple steps Multiple instructions overlap in execution A step in a pipeline is.
-1.1- PIPELINING 2 nd week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM PIPELINING 2 nd week References Pipelining concepts The DLX.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
CSC 4250 Computer Architectures September 15, 2006 Appendix A. Pipelining.
Lecture 7: Pipelining Review Kai Bu
COMP381 by M. Hamdi 1 Pipelining Improving Processor Performance with Pipelining.
Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.
Memory/Storage Architecture Lab Computer Architecture Pipelining Basics.
Lecture 5: Pipelining Implementation Kai Bu
Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.
1 Pipelining Part I CS What is Pipelining? Like an Automobile Assembly Line for Instructions –Each step does a little job of processing the instruction.
Processor Design CT101 – Computing Systems. Content GPR processor – non pipeline implementation Pipeline GPR processor – pipeline implementation Performance.
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
Appendix A. Pipelining: Basic and Intermediate Concept
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
EE524/CptS561 Jose G. Delgado-Frias 1 Processor Basic steps to process an instruction IFID/OFEXMEMWB Instruction Fetch Instruction Decode / Operand Fetch.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining.
Data Hazards Dependent instructions add %g1, %g2, %g3 sub %l1, %g3, %o0 Forwarding helps, but not all hazards can be avoided.
Instruction-Level Parallelism
Exceptions Another form of control hazard Could be caused by…
Computer Organization CS224
Lecture 15: Pipelining: Branching & Complications
CMSC 611: Advanced Computer Architecture
Single Clock Datapath With Control
Appendix C Pipeline implementation
\course\cpeg323-08F\Topic6b-323
Pipelining: Implementation
Exceptions & Multi-cycle Operations
School of Computing and Informatics Arizona State University
Chapter 3: Pipelining 순천향대학교 컴퓨터학부 이 상 정 Adapted from
CMSC 611: Advanced Computer Architecture
Pipelining review.
The processor: Pipelining and Branching
Pipelining in more detail
\course\cpeg323-05F\Topic6b-323
The Processor Lecture 3.6: Control Hazards
An Introduction to pipelining
Instruction Execution Cycle
Project Instruction Scheduler Assembler for DLX
Overview What are pipeline hazards? Types of hazards
Pipeline Control unit (highly abstracted)
Pipelining Appendix A and Chapter 3.
MIPS Pipelining: Part I
Lecture 06: Pipelining Implementation
Pipelining Hazards.
Presentation transcript:

CIS429/529 Winter 2007 Pipelining-1 1 Pipeling RISC/MIPS64 five stage pipeline Basic pipeline performance Pipeline hazards Branch hazards More pipeline performance Challenges of pipelining

CIS429/529 Winter 2007 Pipelining-1 2 RISC Datapath 1. Instruction Fetch Cycle (IF) 2. Instruction decode/register fetch cycle (ID) 3. Execution/effective address cyclic (EX) 4. Memory Access (MEM) 5. Write-back (WB)

CIS429/529 Winter 2007 Pipelining-1 3 Classic RISC Five Stage Pipeline (Fig. A.1)

CIS429/529 Winter 2007 Pipelining-1 4 Classic RISC Five Stage Pipeline (Fig A.2)

CIS429/529 Winter 2007 Pipelining-1 5 Pipelining yields performance gains Pipelining increases instruction throughput, but does not reduce the execution time of any single instruction. Idealized maximum speedup = number of pipeline stages Idealized maximum speedup achieved when one instruction completes each clock cycle. CPI(unpipelined) = P CPI(pipelined) = 1 Speedup = (IC * P * clock) / (IC * 1 * clock) = P.

CIS429/529 Winter 2007 Pipelining-1 6 Limits on pipeline performance Imbalance among pipeline stages (limited by the slowest stage) Overhead due to pipeline register delays Overhead due to clock skew (max delay between when clock signal arrives at any two registers) Overhead to fill/drain the pipeline and PIPELINE HAZARDS

CIS429/529 Winter 2007 Pipelining-1 7 Pipeline Hazards Structural hazards - conflicts for HW resources Data hazards - conflicts for access to data Control hazards - uncertainty about the next instruction to enter the pipeline

CIS429/529 Winter 2007 Pipelining-1 8 Pipelining Performance w/ hazards Hazards are dealt with by stalling the pipeline. Stalls delay one or more stages for the instruct- ion in the pipeline, causing the CPI to increase. CPI(unpipelined) = P CPI(pipelined) = 1 + stall cycles per instruct. Speedup = P / (1 + stall cycles per instruct.).

CIS429/529 Winter 2007 Pipelining-1 9 Structural Hazards: insufficient HW resources

CIS429/529 Winter 2007 Pipelining-1 10 Dealing with structural hazards Add more hardware Stall the pipeline

CIS429/529 Winter 2007 Pipelining-1 11 Data Hazards: Incorrect access to data because pipelining changes the order of read/write accesses to data DADDR1,R2,R3 DSUBR4,R1,R5 ANDR6,R1,R7 ORR8,R1,R9 XORR10,R1,R11

CIS429/529 Winter 2007 Pipelining-1 12 Dealing with data hazards Forwarding - HW technique that gets needed data from the pipeline registers (rather than waiting for it to be WB in the regular registers) and moving it to where that data is needed in a timely manner. Stalling (with pipeline interlock)

CIS429/529 Winter 2007 Pipelining-1 13 Data hazard example #1 (Fig. A.6)

CIS429/529 Winter 2007 Pipelining-1 14 Overcoming data hazard example #1 with forwarding (Fig. A.7)

CIS429/529 Winter 2007 Pipelining-1 15 Data hazard example 2: forwarding doesn’t work (Fig. A.9)

CIS429/529 Winter 2007 Pipelining-1 16 Overcoming data hazard example #2 by stalling

CIS429/529 Winter 2007 Pipelining-1 17 Control Hazards Created by branch statements BEQZLOC ADDR1,R2,R3. LOCSUBR1,R2,R3 PC needs to be computed but it happens too late (at the end of ID, the 2nd stage)

CIS429/529 Winter 2007 Pipelining-1 18 Four Branch Hazard Alternatives #1 STALL: until branch direction is known #2 Predict Branch Not Taken: guess that the branch will not be taken and execute successor instruction. Cancel out instructions in the pipeline if wrong guess. #3 Predict Branch Taken: opposite of above. #4 Delayed Branch: insert useful instructions into the pipeline until the branch direction is known.

CIS429/529 Winter 2007 Pipelining-1 19 Solution #1: Stall PC is computed in ID stage Thus only a 1 cycle stall is incurred for branch hazards Solution #1 (STALL) incurs a penalty of 1 cycle.

CIS429/529 Winter 2007 Pipelining-1 20 #2 Predict Not Taken (Fig. A.12)

CIS429/529 Winter 2007 Pipelining-1 21 #2 Predict Not Taken Penalty: 0 if not taken, 1 if taken Assume 30% conditional branches of which 60% are not taken: CPI = (.70) ( 1) + (.30) ((.60)*(1) + (.40)(1+1))

CIS429/529 Winter 2007 Pipelining-1 22 #3 Predict Taken Similar situation and analysis as Predict Not Taken

CIS429/529 Winter 2007 Pipelining-1 23 #4 Delayed Branch Delay Slot = the slots in the pipeline that would be stalls (we are going to fill them with instructions and try to avoid stalls) (a) Non-cancelling Delayed Branch Useful instructions are inserted into the delay slots (b) Cancelling Delayed Branch Some instructions are inserted into the delay slots and cancelled if wrong guess (as in Predict Not Taken and Predict Taken)

CIS429/529 Winter 2007 Pipelining-1 24 #4a: Delayed Branch - non-cancelling Fill the slot with a useful instruction penalty = 0 Sometimes no useful instruction can be found by the compiler penalty = 1

CIS429/529 Winter 2007 Pipelining-1 25 #4b: Delayed Branch - cancelling with predict taken Try to fill with a useful instruction: since we predict taken, it can be chosen from instructions at the taken location If the prediction was right DLX penalty = 0 If the prediction was wrong, the instruction must be cancelled DLX penalty = 1

CIS429/529 Winter 2007 Pipelining-1 26 #4b: Delayed Branch - cancelling with predict taken

CIS429/529 Winter 2007 Pipelining-1 27 Where to get instructions to fill the branch delay slot(s)? From before the branch instruction From the target address (only OK if branch taken) From the fall through (only OK if branch not taken) Compiler effectiveness for single branch delay slot: –about 60% of slots are filled –about 80% of instructions in delay slots are useful (not cancelled)

CIS429/529 Winter 2007 Pipelining-1 28 Where to get instructions for the delay slots (Fig. A.14)

CIS429/529 Winter 2007 Pipelining-1 29 Details of the MIPS datapath (Fig. A.17 - does not include pipeline HW)

CIS429/529 Winter 2007 Pipelining-1 30 MIPS datapath details (p. A27-A28) Instruction Fetch (IF) IR <-- Mem[PC] NPC <-- PC + 4 Instruction Decode/Register Fetch (ID) A <-- Regs[rs] B <-- Regs[rt] Imm <-- sign extended immediate field of IR Execution/Effective address (EX) Memory access/Branch completion (MEM) Write-back (WB)

CIS429/529 Winter 2007 Pipelining-1 31 MIPS pipeline details (p. A27-A28) Execution/Effective address (EX) If memory reference: ALUOutput <-- A + Imm If Reg-Reg ALU: ALUOutput <-- A func B If Reg-Imm ALU: ALUOutput <-- A op Imm If Branch: ALUOutput <-- NPC + (Imm << 2) Memory access/Branch completion (MEM) If memory reference: LMD <-- Mem[ALUOUtput] or Mem[ALUOutput] <-- B If Branch: if (cond) PC <-- ALUOutput Write-back (WB)

CIS429/529 Winter 2007 Pipelining-1 32 MIPS pipeline details (p. A27-A28) Write-back (WB) If Reg-Reg ALU: Regs[rd] <-- ALUOutput If Reg-Imm ALU: Regs[rt] <-- ALUOutput If Load: Regs[rt] <-- LMD

CIS429/529 Winter 2007 Pipelining-1 33 Details of the MIPS pipeline HW

CIS429/529 Winter 2007 Pipelining-1 34 MIPS pipeline events compared to MIPS (non-pipelined) datapath (Fig. A.19) Non-pipelined Instruction Fetch (IF) IR <-- Mem[PC] NPC <-- PC + 4 Pipelined Instruction Fetch (IF) IF/EX.IR <-- Mem[PC] IF/ID.NPC,PC <-- ( if ((EX/MEM.opcode == branch) & EX/MEM) {EX/MEM.ALUOutput} else {PC + 4} (

CIS429/529 Winter 2007 Pipelining-1 35 How HW detects data hazards (See Fig. A.21) LW R1, 0(R2) in EX stage ADD R3, R1, R4 in ID stage Check opcode field of ID/EX (bits 0-5): LOAD Check opcode field of IF/ID (bits 0-5): ALU Check operands for RAW: ID/EX.IR[rt] == IF/ID.IR[rs]

CIS429/529 Winter 2007 Pipelining-1 36 How HW accomplishes forwarding (See Fig. A.22 which show all 10 needed comparisons) Example of first comparison: PipelineReg. source: EX/MEM Opcode of source: Reg-Reg ALU PipelineReg. destination: ID/EX Opcode of destination: Reg-Reg ALU, ALU imm, load,store, branch Destination of forward: Top ALU input Comparison test for forward: EX/MEM.IR[rd] == ID/EX.IR[rs] etc. etc. etc.

CIS429/529 Winter 2007 Pipelining-1 37 Forwarding hardware

CIS429/529 Winter 2007 Pipelining-1 38 Pipeline HW has to deal with exceptions/interrupts: (Fig. A.27) Resume? Terminate? I/O interrupt OS kernel call Trace instruction execution Breakpoint Integer overflow FP overflow or underflow Page fault Misaligned memory access Memory protection violation Undefined instruction Hardware malfunction Power failure

CIS429/529 Winter 2007 Pipelining-1 39 Precise exceptions: Are supported by a pipelined system that stops the pipeline just before the faulting instruction completes and after the fault is handled, restarts the following instructions from scratch