Machine Organization (CS 570), Lecture 4: Pipelining. Jeremy R. Johnson, Wed. Oct. 18, 2000


Machine Organization (CS 570)
Lecture 4: Pipelining*
Jeremy R. Johnson
Wed. Oct. 18, 2000
*This lecture was derived from material in the text (Chap. 3). All figures from Computer Architecture: A Quantitative Approach, Second Edition, by John Hennessy and David Patterson, are copyrighted material (COPYRIGHT 1996 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).

Introduction
Objective: To understand pipelining and the enhanced performance it provides.
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Instructions are broken down into stages, and while one instruction is executing one stage, another instruction can simultaneously execute another stage.
Topics
  – Review DLX
  – Simple Implementation of DLX
  – Basic Pipeline for DLX
  – Pipeline hazards
  – Floating point pipeline

Instruction Format
R-type Instruction (register format: add, sub, …): op | rs | rt | rd | func
I-type Instruction (immediate format: load, store, branch, immediate): op | rs | rt | Immediate
J-type Instruction (jump, jal): op | offset added to PC
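As an illustration of the three formats, the sketch below splits a 32-bit instruction word into its fields. The field widths (6-bit op, 5-bit register specifiers, 11-bit func, 16-bit immediate, 26-bit offset) follow the usual DLX layout and are an assumption here, since the slide only names the fields. The function names are hypothetical.

```python
def decode_r_type(word):
    """Split a 32-bit word into R-type fields: op, rs, rt, rd, func."""
    return {
        "op": (word >> 26) & 0x3F,    # top 6 bits
        "rs": (word >> 21) & 0x1F,    # first source register
        "rt": (word >> 16) & 0x1F,    # second source register
        "rd": (word >> 11) & 0x1F,    # destination register
        "func": word & 0x7FF,         # assumed 11-bit function code
    }


def decode_i_type(word):
    """Split a 32-bit word into I-type fields: op, rs, rt, immediate."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,         # raw 16-bit immediate, not yet sign-extended
    }


def decode_j_type(word):
    """Split a 32-bit word into J-type fields: op, offset."""
    return {
        "op": (word >> 26) & 0x3F,
        "offset": word & 0x3FFFFFF,   # 26-bit offset added to the PC
    }


# Build an R-type word by hand and decode it again.
example = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | 0x20
print(decode_r_type(example))
```

Because op and the register fields sit in the same positions in every format, the register file can be read before the format is known.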

Implementation Stages
Instruction Fetch Cycle (IF)
  – IR ← Mem[PC]
  – NPC ← PC + 4
Instruction Decode/Register Fetch Cycle (ID)
  – A ← Regs[IR_{6..10}]
  – B ← Regs[IR_{11..15}]
  – Imm ← ((IR_{16})^{16} ## IR_{16..31})
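The immediate computed in ID is the low 16 bits of IR with the sign bit replicated 16 times and concatenated in front. A minimal sketch of that sign extension follows; the helper names are hypothetical, and IR is modelled as a plain Python integer.

```python
def sign_extend_16(ir):
    """Return the 32-bit pattern formed by sign-extending the low 16 bits of ir."""
    low16 = ir & 0xFFFF
    sign_bit = (low16 >> 15) & 1
    upper = 0xFFFF if sign_bit else 0x0000   # sign bit replicated 16 times
    return (upper << 16) | low16             # concatenation (##)


def as_signed_32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


imm = sign_extend_16(0x0000FFFC)
print(hex(imm), as_signed_32(imm))
```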

Implementation Stages
Execution/Effective Address Cycle (EX)
  – Memory Reference: ALUOutput ← A + Imm;
  – Register-Register ALU Instruction: ALUOutput ← A func B;
  – Register-Immediate ALU Instruction: ALUOutput ← A op Imm;
  – Branch: ALUOutput ← NPC + Imm; Cond ← (A op 0);

Implementation Stages
Memory Access/Branch Completion Cycle (MEM)
  – Memory Reference: LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;
  – Branch: if (Cond) PC ← ALUOutput;
Write-back Cycle (WB)
  – Register-Register ALU Instruction: Regs[IR_{16..20}] ← ALUOutput;
  – Register-Immediate ALU Instruction: Regs[IR_{11..15}] ← ALUOutput;
  – Load Instruction: Regs[IR_{11..15}] ← LMD;

DLX Datapath

Simple DLX Pipeline
Each stage (clock cycle) becomes a pipeline stage
Overlap execution of instructions
Add registers between stages
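The overlap can be made concrete with a small timing chart. The sketch below assumes an ideal five-stage pipeline with no stalls, where instruction i enters IF in cycle i. The function name and the chart layout are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction; each row holds the stage occupied in every cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage        # instruction i is in stage s during cycle i + s
        chart.append(row)
    return chart


for i, row in enumerate(pipeline_chart(4)):
    print(f"i+{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading a column of the chart top to bottom shows which stages are busy in one clock cycle, each working on a different instruction.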

Overlap of Functional Units

Pipelined Datapath

Pipeline Performance
Expect speedup equal to the number of pipe stages
  – assumes equal-sized tasks
  – assumes no additional overhead due to pipelining
Speedup from pipelining (reduce CPI or decrease clock cycle time) = Avg. inst. exec. time unpipelined / Avg. inst. exec. time pipelined
Example: 10 ns clock without pipelining, 11 ns with pipelining (to account for overhead). ALU operations (40%) and branches (20%) take 4 cycles; memory operations (40%) take 5.
Speedup = 10 ns × ((0.4 + 0.2) × 4 + 0.4 × 5) / 11 ns = 44/11 = 4
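The arithmetic of this example can be checked with a few lines of Python. The sketch takes memory operations to be the remaining 40% of the instruction mix, which is what the 44 ns total implies, and assumes the pipelined machine completes one instruction per 11 ns clock. The function name is hypothetical.

```python
def average_instruction_time(clock_ns, mix):
    """mix is a list of (frequency, cycles) pairs whose frequencies sum to 1."""
    return clock_ns * sum(freq * cycles for freq, cycles in mix)


# ALU 40% at 4 cycles, branch 20% at 4 cycles, memory 40% at 5 cycles
unpipelined_ns = average_instruction_time(10, [(0.4, 4), (0.2, 4), (0.4, 5)])
pipelined_ns = 11          # one instruction per cycle, 1 ns of pipeline overhead
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined: {unpipelined_ns:.1f} ns, speedup: {speedup:.2f}")
```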

Pipeline Hazards
Situations in pipelining when the next instruction cannot execute in the following clock cycle
Structural hazards
  – hardware cannot support the combination of instructions that we want to execute in the same cycle
Control hazards
  – need to make a decision based on the results of one instruction while others are executing
Data hazards
  – an instruction depends on the result of a previous instruction still in the pipeline

Pipeline Performance II
Must account for hazards
  – Hazards introduce stall cycles in the pipeline
Speedup
  = Avg. inst. exec. time unpipelined / Avg. inst. exec. time pipelined
  = (CPI unpipelined × Clock cycle unpipelined) / (CPI pipelined × Clock cycle pipelined)
  = CPI unpipelined / (1 + Pipeline stall cycles per inst.) × Clock cycle unpipelined / Clock cycle pipelined
  = Pipeline depth / (1 + Pipeline stall cycles per inst.)
The last step assumes equal clock cycle times and an unpipelined CPI equal to the pipeline depth.
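A short sketch of the final formula, under the simplifying assumptions that the clock cycle is unchanged by pipelining and that the unpipelined CPI equals the pipeline depth. The function name is hypothetical.

```python
def pipeline_speedup(depth, stall_cycles_per_instruction):
    """Speedup over the unpipelined machine: depth / (1 + stalls per instruction)."""
    return depth / (1 + stall_cycles_per_instruction)


# With no stalls the speedup equals the pipeline depth; stalls erode it.
for stalls in (0.0, 0.25, 0.5, 1.0):
    print(f"stalls/inst = {stalls:.2f} -> speedup = {pipeline_speedup(5, stalls):.2f}")
```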

Structural Hazards
Problem: conflict in resources
Example: Suppose that a single memory were shared for instructions and data in the pipeline. A data access then conflicts with an instruction fetch in the same cycle.
Solution: remove conflicting stages, redesign to separate resources, or replicate resources
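To see where a shared memory causes trouble, the sketch below lists the cycles in which a load or store in MEM collides with an instruction fetch. It assumes an ideal five-stage pipeline where instruction i is fetched in cycle i and reaches MEM in cycle i + 3, and it ignores the knock-on effect of any stalls. The function name is hypothetical.

```python
def memory_conflicts(is_memory_op):
    """Return (cycle, memory_instruction, fetched_instruction) for each conflict.

    is_memory_op[i] is True when instruction i is a load or store.
    """
    conflicts = []
    for i, uses_memory in enumerate(is_memory_op):
        fetched = i + 3                 # instruction in IF while i is in MEM
        if uses_memory and fetched < len(is_memory_op):
            conflicts.append((i + 3, i, fetched))
    return conflicts


# A load followed by four ALU instructions: the load's MEM stage
# collides with the fetch of the fourth instruction (index 3).
print(memory_conflicts([True, False, False, False, False]))
```

Replicating the resource, for example with separate instruction and data memories, removes every entry from this list.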

Structural Hazard

Data Hazards
Problem: An instruction depends on the result of a previous instruction still in the pipeline
Example:
  – add R1, R2, R3
  – sub R5, R1, R4
Solutions:
  – forwarding or bypassing
  – instruction reordering to remove dependencies

Data Hazard Example
  – add R1, R2, R3
  – sub R4, R1, R5
  – and R6, R1, R7
  – or R8, R1, R9
  – xor R10, R1, R11
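The dependences in this sequence can be found mechanically. The sketch below flags read-after-write hazards between an instruction and the ones that follow it closely. The default window of 2 assumes the register file is written in the first half of a cycle and read in the second half, so an instruction three slots later already sees the new value; that assumption, the tuple encoding, and the function name are all illustrative.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    Each instruction is (opcode, destination, source1, source2).
    """
    hazards = []
    for i, producer in enumerate(program):
        dest = producer[1]
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            sources = program[j][2:]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("add", "R1", "R2", "R3"),
    ("sub", "R4", "R1", "R5"),
    ("and", "R6", "R1", "R7"),
    ("or", "R8", "R1", "R9"),
    ("xor", "R10", "R1", "R11"),
]
print(raw_hazards(program))
```

Passing `window=3` models a register file without the split-cycle write and read, in which case the `or` instruction is flagged as well.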

Data Dependencies

Data Forwarding

Implementing Forwarding
Detection
  – e.g. compare EX/MEM.IR (destination register field) with ID/EX.IR (source register fields)
Use a multiplexor to select forwarded results
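A minimal sketch of the selection logic in front of one ALU input. It assumes the destination register numbers of the two older instructions are available from the EX/MEM and MEM/WB pipeline registers (None when the instruction writes no register), that R0 is hard-wired to zero and never forwarded, and that the older instruction's result is already available, which is not true for a load still in EX/MEM. The function and label names are hypothetical.

```python
def forward_select(src_reg, ex_mem_dest, mem_wb_dest):
    """Choose the source for one ALU operand: 'EX/MEM', 'MEM/WB', or 'REGFILE'."""
    if src_reg == 0:
        return "REGFILE"      # R0 always reads as zero; nothing to forward
    if ex_mem_dest == src_reg:
        return "EX/MEM"       # most recent result takes priority
    if mem_wb_dest == src_reg:
        return "MEM/WB"
    return "REGFILE"


# sub R4, R1, R5 directly after add R1, R2, R3: R1 comes from EX/MEM.
print(forward_select(1, ex_mem_dest=1, mem_wb_dest=None))
```

Checking EX/MEM before MEM/WB matters when both older instructions write the same register, because the younger of the two holds the value the program expects.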

Data Hazard with Stall
  – lw R1, 0(R2)
  – sub R4, R1, R5
  – and R6, R1, R7
  – or R8, R1, R9

Compiler Scheduling for Data Hazards
Data hazards are naturally generated
  – C = A + B
      lw  R1, A
      lw  R2, B
      add R3, R1, R2
      sw  C, R3
Compiler can reorder instructions to remove the stalls these dependencies cause
  – a = b + c; d = e - f;
      lw  R1, b
      lw  R2, c
      lw  R3, e
      add R5, R1, R2
      lw  R4, f
      sw  a, R5
      sub R6, R3, R4
      sw  d, R6
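The benefit of reordering can be counted. The sketch below assumes full forwarding, so the only remaining stall is a one-cycle load delay when an instruction uses a register loaded by the instruction immediately before it. It conservatively treats every such use as a stall. The straightforward (unscheduled) ordering, the tuple encoding, and the function name are illustrative and not taken from the slide.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by using a loaded register in the next instruction.

    Each instruction is (opcode, destination_or_None, list_of_source_registers).
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        current_sources = current[2]
        if prev_op == "lw" and prev_dest in current_sources:
            stalls += 1
    return stalls


# a = b + c; d = e - f; in source order
unscheduled = [
    ("lw", "R1", []), ("lw", "R2", []), ("add", "R5", ["R1", "R2"]), ("sw", None, ["R5"]),
    ("lw", "R3", []), ("lw", "R4", []), ("sub", "R6", ["R3", "R4"]), ("sw", None, ["R6"]),
]
# The reordered sequence: every load is separated from its first use.
scheduled = [
    ("lw", "R1", []), ("lw", "R2", []), ("lw", "R3", []), ("add", "R5", ["R1", "R2"]),
    ("lw", "R4", []), ("sw", None, ["R5"]), ("sub", "R6", ["R3", "R4"]), ("sw", None, ["R6"]),
]
print(load_use_stalls(unscheduled), load_use_stalls(scheduled))
```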

Effectiveness of Scheduling

Control Hazards
Problem: The next instruction to enter the pipe may depend on a currently executing instruction, or we may have to wait until a stage is completed to determine what to fetch next
Example: branch instruction
Solutions:
  – Stall: operate sequentially until the decision can be made (wastes time)
  – Predict: guess what to do next. If the guess is correct, operate normally; if the guess is wrong, clear the pipe and begin again
  – Compute the address of the branch target earlier

Pipeline Stall for Branch
Stall pipeline until MEM stage, which determines new PC
Don’t stall until a branch is detected (ID)
3 cycles lost per branch is significant
  – 30% branch frequency + ideal CPI = 1 ⇒ CPI = 1 + 0.3 × 3 = 1.9, so a machine with branch stalls only achieves about 1/2 of the ideal speedup

Computing the Taken PC Earlier
Can detect branch condition (BEQZ, BNEZ) during ID
Need extra adder to compute branch target during ID
This reduces the stall to one cycle
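The effect of shortening the branch stall can be quantified with the same reasoning used for the three-cycle case. The sketch below assumes an ideal CPI of 1, a 30% branch frequency, and branch stalls as the only source of stall cycles. The function name is hypothetical.

```python
def cpi_with_branch_stalls(branch_frequency, stall_cycles, ideal_cpi=1.0):
    """Effective CPI when every branch costs a fixed number of stall cycles."""
    return ideal_cpi + branch_frequency * stall_cycles


for stall_cycles in (3, 1):
    cpi = cpi_with_branch_stalls(0.3, stall_cycles)
    fraction_of_ideal = 1.0 / cpi
    print(f"{stall_cycles}-cycle stall: CPI = {cpi:.2f}, "
          f"fraction of ideal speedup = {fraction_of_ideal:.2f}")
```

Moving the branch decision into ID lowers the effective CPI from about 1.9 to about 1.3 under these assumptions.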

Compile Time Branch Prediction
Assume either that the branch is taken or not taken
Proceed under this assumption; if wrong, “back out” and start over

Delayed Branch
Instruction after the branch (branch delay slot) is executed no matter what the outcome of the branch is
Requires that the instruction in the branch delay slot is safe to execute independent of the branch
Effectiveness depends on the compiler

Designing Instruction Sets (MIPS) for Pipelining
Want to break down instruction execution into a reasonable number of stages of roughly equal complexity
All instructions are the same length
  – easier to fetch and decode
Few instruction formats (source register fields are located in the same place)
  – can begin reading registers at the same time the instruction is decoded
Memory operands appear only in loads and stores
  – calculate the address during the execute stage and access memory in the following stage; otherwise the pipeline would expand to an address stage, a memory stage, and an execute stage
Operands must be aligned in memory
  – a single data transfer instruction never requires two data memory accesses; hence, the transfer fits in a single pipeline stage