CS 5513 Computer Architecture Pipelining Examples



Data Hazard with Stalls (1/2)
Consider the following code:
    DADD R1,R3,R3
    DSUB R4,R1,R5
    AND  R6,R1,R7
    OR   R8,R1,R9
    XOR  R10,R1,R11
Let's diagram the execution of this code.

Data Hazards with Stalls (2/2)
DSUB enters ID in cycle 3 and stalls there through cycle 5, when DADD writes R1 back and DSUB can finally read it.
The instruction fetched in cycle 3 (AND) is held in IF through cycle 5, because it cannot enter ID while DSUB still occupies that stage.
From then on, R1 is available to the remaining instructions in their ID stages.
11 cycles total.
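The 11-cycle total can be reproduced with a small timing model. This is only a sketch under stated assumptions: a five-stage in-order pipeline (IF ID EX MEM WB), no forwarding, and a register file that is written in the first half of WB and read in the second half of ID. The function name and the tuple encoding of instructions are made up for illustration.

```python
# Sketch: cycle count for an in-order 5-stage pipeline WITHOUT forwarding.
# Assumption: a consumer may read a register in ID during the same cycle
# in which its producer is in WB (write first half, read second half).

def cycles_without_forwarding(program):
    """program: list of (dest, sources) tuples in program order."""
    writer_wb = {}   # register -> WB cycle of its most recent writer
    prev_id = 1      # cycle in which the previous instruction finished ID
    wb = 0
    for index, (dest, sources) in enumerate(program):
        earliest_id = index + 2          # fetched in cycle index+1 at best
        id_cycle = max(earliest_id,
                       prev_id + 1,      # in-order: ID must be free
                       *(writer_wb.get(reg, 0) for reg in sources))
        wb = id_cycle + 3                # EX, MEM, WB follow ID
        if dest is not None:
            writer_wb[dest] = wb
        prev_id = id_cycle
    return wb                            # WB cycle of the last instruction

program = [
    ("R1",  ("R3", "R3")),    # DADD R1,R3,R3
    ("R4",  ("R1", "R5")),    # DSUB R4,R1,R5
    ("R6",  ("R1", "R7")),    # AND  R6,R1,R7
    ("R8",  ("R1", "R9")),    # OR   R8,R1,R9
    ("R10", ("R1", "R11")),   # XOR  R10,R1,R11
]
print(cycles_without_forwarding(program))
```

Under these assumptions the model places DSUB's register read in cycle 5 and the last write-back in cycle 11.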

Data Hazards with Forwarding
The EX stage in cycle 3 forwards to the EX stage in cycle 4.
The MEM stage in cycle 4 forwards to the EX stage in cycle 5.
The WB stage in cycle 5 "forwards" to the EX stage in cycle 6 (the value passes through the register file rather than a bypass path).
9 cycles total.

Another Example (1/2)
Without forwarding:
DSUB stalls in ID in cycles 4 and 5, waiting for R1 to be written back.
AND and OR must stall as well.
10 cycles total.

Another Example (2/2)
With forwarding:
A stall is still needed, because the EX stage of DSUB needs the result produced by the MEM stage of LD.
9 cycles total.
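The 10-cycle and 9-cycle totals can be checked with a rough timing model. The instruction sequence is not reproduced in the slide text, so the program below (a load into R1 followed by DSUB, AND and OR that all read R1) is an assumption, as are the function name and the instruction encoding. The model assumes a five-stage in-order pipeline in which an ALU result can be forwarded at the end of EX and a load result at the end of MEM.

```python
# Sketch: total cycles for an in-order 5-stage pipeline, with or without
# forwarding. Assumed, not taken from the slides: the program below and
# the rule that, without forwarding, a consumer reads its operand in ID
# during the producer's WB cycle.

def total_cycles(program, forwarding):
    """program: list of (op, dest, sources); op is 'load' or 'alu'."""
    producers = {}   # register -> (cycle its value exists, WB cycle)
    prev_ex = 2      # EX cycle of the previous instruction
    last_wb = 0
    for index, (op, dest, sources) in enumerate(program):
        ex = max(index + 3, prev_ex + 1)       # no-hazard, in-order timing
        for reg in sources:
            if reg in producers:
                result_cycle, wb_cycle = producers[reg]
                needed = result_cycle if forwarding else wb_cycle
                ex = max(ex, needed + 1)       # EX starts after the value
        result_cycle = ex + 1 if op == "load" else ex   # MEM vs. EX
        wb = ex + 2
        if dest is not None:
            producers[dest] = (result_cycle, wb)
        prev_ex = ex
        last_wb = wb
    return last_wb

load_use = [
    ("load", "R1", ("R2",)),       # LD   R1,0(R2)   (assumed)
    ("alu",  "R4", ("R1", "R5")),  # DSUB R4,R1,R5
    ("alu",  "R6", ("R1", "R7")),  # AND  R6,R1,R7
    ("alu",  "R8", ("R1", "R9")),  # OR   R8,R1,R9
]
print(total_cycles(load_use, forwarding=False))
print(total_cycles(load_use, forwarding=True))
```

With forwarding, the only remaining stall in this model is the single load-use bubble: the load's value exists at the end of MEM (cycle 4), so DSUB cannot enter EX before cycle 5.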

Multi-cycle latency
Until now, all instructions have had a latency of 1 cycle.
In the presence of floating point or slow memory, some instructions take longer than others.
Multi-cycle instructions have:
  An initiation interval: how long we must wait before starting another instruction on the same functional unit.
  A latency: how many extra cycles the instruction takes.
For the MIPS FP pipeline:
  Multiplication has an initiation interval of 1 and a latency of 6.
  FP addition has an initiation interval of 1 and a latency of 3.

Example: Multi-cycle latency
MUL.D stalls in ID, waiting for the forwarded result from the L.D.
MUL.D starts executing in cycle 5 and takes 6 extra cycles.
ADD.D stalls, waiting for the forwarded result from MUL.D.
ADD.D computes its result in 1+3=4 cycles.
S.D stalls, waiting for the result from ADD.D.
18 cycles total.
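A simplified schedule that arrives at the same 18 cycles is sketched below. The slide text names only the opcodes, so the register operands are assumed for illustration. The model also makes simplifying assumptions: every consumer needs its operands at the start of its first execute stage, structural hazards on MEM and WB are ignored, and the store is given a WB slot like any other instruction. Execute-stage counts are latency plus one (7 for MUL.D, 4 for ADD.D).

```python
# Sketch: in-order issue with forwarding and multi-cycle execute stages.
# EX_STAGES = latency + 1, following the latencies quoted in the slides.
EX_STAGES = {"L.D": 1, "S.D": 1, "MUL.D": 7, "ADD.D": 4}

def schedule(program):
    """Return (op, first_ex_cycle, last_ex_cycle, wb_cycle) per instruction."""
    ready = {}        # register -> cycle at whose end the value can be forwarded
    prev_start = 2    # first execute cycle of the previous instruction
    rows = []
    for index, (op, dest, sources) in enumerate(program):
        start = max(index + 3,                 # no-hazard timing
                    prev_start + 1,            # in-order issue
                    *(ready.get(reg, 0) + 1 for reg in sources))
        end = start + EX_STAGES[op] - 1
        mem = end + 1
        wb = mem + 1
        if dest is not None:
            # A load's value exists after MEM; others after the last EX stage.
            ready[dest] = mem if op == "L.D" else end
        prev_start = start
        rows.append((op, start, end, wb))
    return rows

program = [                          # operands are assumed, not from the slide
    ("L.D",   "F4", ("R2",)),        # L.D   F4,0(R2)
    ("MUL.D", "F0", ("F4", "F6")),   # MUL.D F0,F4,F6
    ("ADD.D", "F2", ("F0", "F8")),   # ADD.D F2,F0,F8
    ("S.D",   None, ("F2", "R2")),   # S.D   F2,0(R2)
]
rows = schedule(program)
for row in rows:
    print(row)
total = max(wb for _, _, _, wb in rows)
print("total cycles:", total)
```

In this model MUL.D occupies its execute stages in cycles 5 through 11, ADD.D in cycles 12 through 15, and S.D reaches its final stage in cycle 18.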

Strategies for Handling Branches
Execute branches in decode
  A good idea regardless of other ways of handling branches
Stall until branch is resolved
  Simple and slow
Predict branch taken
  Most backward branches are taken
Predict branch not taken
  Most forward branches are not taken

Example: Branch with Stall (1/2)
Consider the following code:
    Loop: LD    R6,0(R2)
          DADDI R2,R2,#4
          SD    R6,8(R2)
          DSUB  R4,R2,R3
          BNZ   R4,Loop
Assume R3 = R2 + 100 initially. R2 advances by 4 per iteration, so the loop iterates 100/4 = 25 times.

Example: Branch with Stall (2/2)
Execute the branch in the decode stage.
From one branch fetch to the next, there are 7 cycles, so the loop takes 7 × 25 = 175 cycles.
Adding another 5 cycles after the last fetch gives 180 cycles.
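The arithmetic can be written out as a short sketch. The 7 cycles per iteration and the 5 trailing cycles are taken from the slide as given rather than derived here, and the variable names are illustrative.

```python
# Sketch of the loop cycle count; per-iteration and trailing cycle
# figures are the slide's numbers, not derived by this code.
initial_gap = 100            # R3 - R2 before the loop starts
step = 4                     # DADDI R2,R2,#4 advances R2 by 4 each pass
iterations = initial_gap // step

cycles_per_iteration = 7     # branch fetch to next branch fetch
trailing_cycles = 5          # cycles after the last fetch

loop_cycles = cycles_per_iteration * iterations
total_cycles = loop_cycles + trailing_cycles
print(iterations, loop_cycles, total_cycles)
```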