Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.

Slides:



Advertisements
Similar presentations
Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.
Advertisements

Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Chapter Six Enhancing Performance with Pipelining
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
©UCB CS 162 Computer Architecture Lecture 2: Introduction & Pipelining Instructor: L.N. Bhuyan
Appendix A Pipelining: Basic and Intermediate Concepts
Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Automobile Manufacturing 1. Build frame. 60 min. 2. Add engine. 50 min. 3. Build body. 80 min. 4. Paint. 40 min. 5. Finish.45 min. 275 min. Latency: Time.
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
1 A pipeline diagram  A pipeline diagram shows the execution of a series of instructions. —The instruction sequence is shown vertically, from top to bottom.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
Computer Organization CS224 Chapter 4 Part b The Processor Spring 2010 With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
Electrical and Computer Engineering University of Cyprus LAB 2: MIPS.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
CDA 3101 Summer 2003 Introduction to Computer Organization Pipeline Control And Pipeline Hazards 17 July 2003.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Electrical and Computer Engineering University of Cyprus
Computer Organization
Stalling delays the entire pipeline
Note how everything goes left to right, except …
CDA 3101 Spring 2016 Introduction to Computer Organization
Performance of Single-cycle Design
CMSC 611: Advanced Computer Architecture
Single Clock Datapath With Control
ECE232: Hardware Organization and Design
ECS 154B Computer Architecture II Spring 2009
ECE232: Hardware Organization and Design
Review: MIPS Pipeline Data and Control Paths
Chapter 4 The Processor Part 2
Single-cycle datapath, slightly rearranged
ELEC / Computer Architecture and Design Spring Pipelining (Chapter 6)
A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID
The Processor Lecture 3.6: Control Hazards
An Introduction to pipelining
Pipelining Appendix A and Chapter 3.
Introduction to Computer Organization and Architecture
Pipelining - 1.
A relevant question Assuming you’ve got: One washer (takes 30 minutes)
Pipelined datapath and control
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING

Previous Labs: MIPS single-cycle with Memory System 2

In this Lab: Pipelining You are expected to:  Understand the concept  Be familiar with the 5 MIPS pipeline stages  Understand the three pipeline hazards and their solutions. 3

PIPELINING Technique in which the execution of several instructions is overlapped. Each instruction is broken into several stages. Stages can operate concurrently PROVIDED WE HAVE SEPARATE RESOURCES FOR EACH STAGE! => each step uses a different functional unit. Note: execution time for a single instruction is NOT improved. Throughput of several instructions is improved. 4

Major Pipeline Benefit = Performance Ideal Performance  Time/instruction = non-piped-time/#stages  This is an asymptote of course, but +10% is commonly achieved Two ways to view the performance mechanism Reduced CPI (i.e. non-piped to piped change)  Close to 1 instruction/cycle if you’re lucky Reduced cycle-time (i.e. increasing pipeline depth)  Work split into more stages  Simpler stages result in faster clock cycles 5

Other Pipeline Benefits Completely hardware mechanism All modern machines are pipelined  This was the key technique to advancing performance in the 80’s  In the 90’s the move was to multiple pipelines Beware, no benefit is totally free/good 6

7 Laundry analogy to pipelining

Implementation of a RISC (Unpipelined, Multicycle) Implementation of an integer subset of a RISC architecture that takes at most 5 clock cycles.  Instruction Fetch (IF)  Instruction Decode/Register Fetch (ID)  Execution/Effective Address Calculation (EX)  Memory Access (MEM)  Write-Back (WB) 8

Review: Single-cycle Datapath for MIPS Data Memory (Dmem) PCRegisters ALU Instruction Memory (Imem) Stage 1Stage 2Stage 3 Stage 4 Stage 5 IFtchDcdExecMemWB Use datapath figure to represent pipeline ALU IM Reg DMReg 9

Classic 5 Stage Pipeline for a RISC Processor 10

Pipelined Execution Representation To simplify pipeline, every instruction takes same number of steps, called stages One clock cycle per stage IFtchDcdExecMemWB IFtchDcdExecMemWB IFtchDcdExecMemWB IFtchDcdExecMemWB IFtchDcdExecMemWB Program Flow Time 11

Important Pipeline Characteristics Latency Latency  Time it takes an instruction to go through the pipe  Latency = # stages * stage-delay  Dominant feature if there are a lot of exceptions… Throughput Throughput  Determined by the rate at which instructions can start/finish  Dominant feature if there are few exceptions 12

Pipelining Lessons Pipelining doesn’t help latency (execution time) of single task, it helps throughput of entire workload Multiple tasks operating simultaneously using different resources Potential speedup = Number of pipe stages Time to “fill” pipeline and time to “drain” it reduces speedup Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages also reduces speedup 13

Single Cycle Datapath Regs Read Reg1 Read data1 ALUALU Read data2 Read Reg2 Write Reg Write Data Zero ALU- con RegWrite Address Read data Write Data Sign Extend Dmem MemRead MemWrite MuxMux MemTo- Reg MuxMux Read Addr Instruc- tion Imem 4 PCPC addadd addadd << 2 MuxMux PCSrc ALUOp ALU- src MuxMux 25:21 20:16 15:11 RegDst 15:0 31:0 14

Required Changes to Datapath Introduce registers to separate 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath. Next PC value is computed in the 3 rd step, but we need to bring in next instruction in the next cycle. Branch address is computed in 3 rd stage. With pipeline, the PC value has changed! Must carry the PC value along with instruction. 15

Changes to Datapath (cont.) For lw instruction, we need write register address at stage 5. But the IR is now occupied by another instruction! So, we must carry the IR destination field as we move along the stages. See connection in fig. 16

Pipelined Datapath (with Pipeline Regs) Address Add Add result Shift left 2 I n s t r u c t i o n M u x 0 1 Add PC 0 Address Write data M u x 1 Read data 1 Read data 2 Read register 1 Read register 2 16 Sign extend Write register Write data Read data 1 ALU result M u x ALU Zero Imem Dmem Regs IF/ID ID/EX EX/MEM MEM/WB 64 bits 133 bits 102 bits 69 bits 5 Fetch Decode Execute Memory Write Back 17

Pipelined Datapath (with Control Signals) 18

Control for Pipelined Datapath RegDst ALUOp[1:0] ALUSrc MemRead MemWrite Branch RegWrite MemtoReg  Basic approach: build on single-cycle control  Place control unit in ID stage  Pass control signals to following stages 19

Control for Pipelined Datapath 20

Datapath and Control Unit 21

Tracking Control Signals - Cycle 1 LW 22

Tracking Control Signals - Cycle 2 SWLW 23

Tracking Control Signals - Cycle 3 ADDSWLW

Tracking Control Signals - Cycle 4 SUBADD SW LW

1 1 ADD Tracking Control Signals - Cycle 5 SUB SW LW 26

Can pipelining get us into trouble? Yes: Pipeline Hazards Hazards occur because data required for executing the current instruction may not be available.  Structural hazards: attempt to use the same resource two different ways at the same time  Control hazards: attempt to make a decision before condition is evaluated branch instructions interrupts 27

Pipelining troubles (cont.) Data hazards occur when an instruction needs register contents for an arithmetic/ logical/memory instruction, before they are ready.  instruction depends on result of prior instruction still in the pipeline Can always resolve hazards by waiting:  pipeline control must detect the hazard  take action (or delay action) to resolve hazards 28

Hazard detection & Forwarding Units 29

Forwarding: A solution to Data Hazards 30

Stalls Forwarding will not always solve the problems of data hazards. For example, suppose an add instruction follows a load word (lw), and the add involves the register that receives the memory data. In this case, forwarding will not work. The reason is that the data must be read from memory, and so it will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction. if it proceeds, will process the wrong data. A solution to this problem is the stall. A stall halts the instruction awaiting data, while the key instruction (a lw in this case) proceeds to the end of the MEM cycle, after which the desired data is available to the add. 31

Other Problems With Branches A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:  The following instructions were for the case of “branch taken” and the branch was not taken.  The following instructions were for “branch not taken” and it was taken. In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”). How can this problem be prevented? 32

Control Hazard Approach One approach is to always assume the branch is(or is not) taken: Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).  It this assumption is correct, the pipeline will continue to flow without delay.  When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe. 33

Summary Pipelining is a fundamental concept in computers/nature  Multiple instructions in flight  Limited by length of longest stage  Latency vs.Throughput Hazards gum up the works 34