Appendix C Pipeline implementation

Presentation transcript:

Appendix C Pipeline implementation
Pipeline hazards, detection and forwarding
Multiple-cycle operations
MIPS R4000
CDA5155 Fall 2014, Peir / University of Florida

Limits of Pipelining
Increasing the number of pipeline stages in a given logic block by a factor of n generally allows clock speed and throughput to increase by a factor of almost n. The gain is usually less than n because of overheads such as latch delay and imperfect balance of delay across stages.
But pipelining has a natural limit: each pipeline stage needs at least 1 layer of logic gates. The practical minimum is usually several gates (2-10). Commercial designs are approaching this point.
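The effect of latch overhead on the achievable speedup can be illustrated with a small calculation. This is only a sketch: the function name, the 10 ns logic delay and the 0.2 ns latch overhead are assumed values chosen for illustration, and the stages are assumed to be perfectly balanced.

```python
def pipelined_speedup(logic_delay_ns, stages, latch_overhead_ns):
    """Throughput speedup of an n-stage pipeline over the unpipelined block.

    Assumption: the logic splits into perfectly balanced stages, and every
    stage pays the same fixed latch overhead.
    """
    unpipelined_cycle = logic_delay_ns
    pipelined_cycle = logic_delay_ns / stages + latch_overhead_ns
    return unpipelined_cycle / pipelined_cycle


# Illustrative numbers only: 10 ns of logic, 0.2 ns of latch overhead per stage.
for n in (2, 5, 10, 20):
    print(n, "stages: speedup", round(pipelined_speedup(10.0, n, 0.2), 2))
```

With these assumed numbers the speedup stays below n and falls further behind as n grows, because the fixed latch overhead becomes a larger share of each cycle.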

Simple RISC Datapath

Basic RISC Pipelining
Basic idea: Each instruction spends 1 clock cycle in each of the 5 execution stages. During 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions.
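The overlap can be made concrete with a small chart generator. This is a sketch under the assumption of an ideal pipeline with no stalls; the stage names IF, ID, EX, MEM, WB and the helper name are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; column c holds the stage occupied in cycle c+1.

    Assumption: an ideal pipeline, one instruction issued per cycle, no stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading any single column of the printed chart shows which stage each instruction occupies in that cycle; in cycle 5 all five stages are busy with different instructions.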

Adding Pipeline Registers

Pipeline Hazards
Hazards are circumstances that may lead to stalls (delays, “bubbles”) in the pipeline if not addressed. Three major types:
Structural hazards: Lack of hardware resources to keep all instructions moving.
Data hazards: Results of earlier instructions are not yet available when needed.
Control hazards: Control decisions resulting from earlier instructions (branches) are not yet made, so the pipeline does not know which new instructions to execute.

Structural Hazard Example
Suppose you had a combined instruction+data memory with only 1 read port.

Hazards Produce “Bubbles”

Another View

Example Data Hazard

Forwarding for Data Hazards

Another Forwarding Example

Three Types of Data Hazards
Let i be an earlier instruction, j a later one.
RAW (read after write): j tries to read a value before i writes it.
WAW (write after write): i and j write to the same place, but in the wrong order. Only occurs if more than 1 pipeline stage can write.
WAR (write after read): j writes a new value to a location before i has read the old one. Only occurs if writes can happen before reads in the pipeline.
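The three definitions can be expressed as register-name comparisons between i and j. The sketch below only finds potential hazards (dependences); whether one actually causes a problem depends on the pipeline, as noted above. The `Instr` tuple and the function name are hypothetical, introduced for illustration.

```python
from collections import namedtuple

# Hypothetical minimal instruction record: one destination, a list of sources.
Instr = namedtuple("Instr", ["dest", "srcs"])


def data_hazards(i, j):
    """Potential hazards between earlier instruction i and later instruction j."""
    hazards = set()
    if i.dest is not None and i.dest in j.srcs:
        hazards.add("RAW")  # j reads what i writes
    if i.dest is not None and i.dest == j.dest:
        hazards.add("WAW")  # both write the same register
    if j.dest is not None and j.dest in i.srcs:
        hazards.add("WAR")  # j overwrites something i still has to read
    return hazards


add = Instr(dest="r1", srcs=["r2", "r3"])  # add r1, r2, r3
sub = Instr(dest="r4", srcs=["r1", "r5"])  # sub r4, r1, r5
print(data_hazards(add, sub))
```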

An Unavoidable Stall - Load

Stalling for Load Dependent

Data Hazard Prevention
A clever compiler can often reschedule instructions (code motion) to avoid a stall. A simple example:
Original code:
    lw  r2, 0(r4)
    add r1, r2, r3    <- Note: Stall happens here!
    lw  r5, 4(r4)
Transformed code:
    lw  r2, 0(r4)
    lw  r5, 4(r4)
    add r1, r2, r3    <- No stall needed!
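The benefit of the rescheduling can be checked by counting load-use stalls in both orderings. This is a minimal sketch assuming a 5-stage pipeline with full forwarding, where only an instruction that immediately follows a load and reads its result stalls (for one cycle); the tuple encoding and function name are illustrative.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls.

    program: list of (opcode, dest, srcs) tuples.
    Assumption: 5-stage pipeline with full forwarding, so the only stall is
    a load immediately followed by an instruction that reads the loaded register.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original = [
    ("lw", "r2", ["r4"]),
    ("add", "r1", ["r2", "r3"]),
    ("lw", "r5", ["r4"]),
]
transformed = [
    ("lw", "r2", ["r4"]),
    ("lw", "r5", ["r4"]),
    ("add", "r1", ["r2", "r3"]),
]

print("original stalls:   ", load_use_stalls(original))
print("transformed stalls:", load_use_stalls(transformed))
```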

MIPS Instruction Format

5-Stage Pipeline

Operations of Pipe Stages

Data Hazard Detection

Hazard Detection Logic for Load
Example: Detecting whether an instruction that has just been fetched needs to be stalled because of a dependence on a preceding load.
Note: The right-hand side of the equation should be IF/ID.IR (Fig. C.25).
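A load interlock check of this kind can be sketched as a comparison between the load's destination field in ID/EX and the source fields of the instruction in IF/ID. This is a simplified sketch, not the exact logic of Fig. C.25: it assumes the fetched instruction reads both its rs and rt fields, whereas the figure distinguishes cases by opcode. The parameter names are illustrative.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction in IF/ID must stall behind a load in ID/EX.

    Simplifying assumption: the instruction in IF/ID reads both rs and rt.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw r2, 0(r4) followed by add r1, r2, r3: the add must stall.
print(must_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=3))
# lw r2, 0(r4) followed by lw r5, 4(r4) (rs = r4, rt = r5): no stall.
print(must_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=4, if_id_rt=5))
```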

Forwarding Situations in MIPS
Same as Figure C.26.

Forwarding to the ALU
Provide multiple paths to the input of the ALU.

Datapath with Forwarding Hardware
[Figure: pipelined datapath (PC, instruction memory, register file, sign extend, shift left 2, ALU and ALU control) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, the control unit, and a Forward Unit.]
Forward Unit inputs: EX/MEM.RegisterRd, MEM/WB.RegisterRd, ID/EX.RegisterRt, ID/EX.RegisterRs. The control line inputs to the Forward Unit, EX/MEM.RegWrite and MEM/WB.RegWrite, are not shown on the diagram.
For lecture: How many bits wide is each pipeline register now? ID/EX: 9 + 32x4 + 10 = 147; 147 + 10 = 157.
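The decision made by a forward unit with these inputs can be sketched as follows. This follows the commonly taught MIPS forwarding conditions (prefer the newest result in EX/MEM, never forward register 0) and is an assumption about the logic rather than a transcription of the figure; the function name and the string return values are illustrative.

```python
def forward_select(id_ex_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Choose the source for one ALU operand.

    id_ex_reg is ID/EX.RegisterRs or ID/EX.RegisterRt, depending on the operand.
    Returns 'EX/MEM', 'MEM/WB', or 'REGFILE' (no forwarding).
    """
    # EX hazard: the immediately preceding instruction writes the register.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "EX/MEM"
    # MEM hazard: the instruction two ahead writes it, and no newer result exists.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "MEM/WB"
    return "REGFILE"


# Both older instructions write r1: the newer (EX/MEM) value wins.
print(forward_select(1, True, 1, True, 1))
```

Checking EX/MEM first matters when two in-flight instructions write the same register: the operand must come from the more recent one.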

Adding the Hazard Hardware
[Figure: the forwarding datapath with a Hazard Unit added; signals labeled at the Hazard Unit include ID/EX.MemRead and ID/EX.RegisterRt.]
For lecture: In reality, only the signals RegWrite and MemWrite need to be 0; the other control signals can be don’t cares. Another consideration is energy, where clock gating is called for.

Branch Hazard
Suppose the new PC value is not computed until the MEM stage. Then we must stall 3 clocks after every branch!

Early Branch Resolution
Branch resolution at ID stage.
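The cost of branch stalls can be estimated with the usual relation CPI = base CPI + branch frequency x branch penalty. The sketch below compares a 3-cycle penalty (branch resolved in MEM) with a 1-cycle penalty (assumed here for resolution in ID). The 20% branch frequency is an assumed figure for illustration only, not a number from the slides.

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Effective CPI when every branch stalls the pipeline for penalty_cycles."""
    return base_cpi + branch_fraction * penalty_cycles


BRANCH_FRACTION = 0.2  # assumed share of branches in the instruction mix

late = cpi_with_branch_stalls(BRANCH_FRACTION, 3)   # resolved in MEM
early = cpi_with_branch_stalls(BRANCH_FRACTION, 1)  # resolved in ID (assumed 1-cycle penalty)
print(f"resolve in MEM: CPI = {late:.2f}")
print(f"resolve in ID:  CPI = {early:.2f}")
```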

Predict-Not-Taken (Branch resolves in ID)
Same as Fig. C.12.

Delayed Branches
Machine code sequence:
    Branch instruction
    Delay slot instruction(s)
    Post-branch instructions (the branch is taken, if taken, at this point)
Same as Fig. C.13.

Filling the Branch-Delay Slot
For (b) and (c), the delay-slot instruction must have no side effects.
Note: dynamic branch prediction will be covered in Chap. 3.

Multi-Cycle Execution
Figure C.33: The MIPS pipeline with three additional unpipelined, floating-point, functional units.

Latency & Initiation Interval
Latency: Extra delay cycles before the result is available.
Initiation interval: Minimum number of cycles before a new input can be given to that functional unit.

Pipelined Multiple-FP Operations
Figure C.35: A pipeline that supports multiple outstanding FP operations.

Pipelining FP Instructions
Notice instructions may complete out-of-order:
MULTD  IF ID M1 M2 M3 M4 M5 M6 M7 ME WB
ADDD   IF ID A1 A2 A3 A4 ME WB
LD     IF ID EX ME WB
SD     IF ID EX ME WB
This raises the possibility of WAW hazards, and of structural hazards in the MEM & WB stages. Structural hazards may occur especially often with the non-pipelined DIV unit. Out-of-order completion impacts exception handling.
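The out-of-order completion can be verified by computing the cycle in which each instruction reaches WB. The sketch assumes the four instructions are fetched in consecutive cycles with no stalls; the execute-stage counts are read off the stage lists above (M1-M7, A1-A4, a single EX), and the helper names are illustrative.

```python
# Number of execute stages per operation, taken from the stage lists above.
EXEC_STAGES = {"MULTD": 7, "ADDD": 4, "LD": 1, "SD": 1}


def wb_cycle(issue_index, op):
    """Cycle in which the instruction is in WB.

    Assumption: instruction k is fetched in cycle k+1 and never stalls.
    Stages: IF, ID, the execute stages, ME, WB.
    """
    if_cycle = issue_index + 1
    return if_cycle + 1 + EXEC_STAGES[op] + 2


program = ["MULTD", "ADDD", "LD", "SD"]
wb = {op: wb_cycle(k, op) for k, op in enumerate(program)}
completion_order = sorted(program, key=wb.get)

print(wb)
print("completion order:", completion_order)
```

Under these assumptions the last-issued short instructions finish first and the MULTD finishes last, which is the out-of-order completion the slide points out. No two instructions share a WB cycle in this particular sequence, but other sequences can collide.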

Issues in Multi-Cycle Operations
Stalls for RAW hazards are longer and more frequent (Fig. C.37).
WAW is possible; WAR is not (why?).
Structural hazards are possible for a non-pipelined unit.
Multiple WBs are likely (Fig. C.38).
Handling hazards:
    At the issue (ID) stage:
        Check structural hazards: functional unit, WB port.
        Check RAW hazards: issue with forwarding.
        Check WAW hazards: do not issue, to make sure writes happen in order.
    Detect and stall the instruction before the MEM and WB stages.
More uniform handling is given in Chapter 3.

Maintaining Precise Exception
Settle for imprecise exceptions.
Buffer and complete in order:
    Requires large buffers and comparators.
    History file, future file approaches.
Software trap handling when an exception occurs.
Hybrid scheme: issue when certain that no earlier instruction will cause an exception.
    All instructions before can be completed.
    No instructions after can be completed.

Real MIPS R4000 Pipeline
IF, IS - Instruction cache fetch, first & second halves.
RF - Instruction decode, register fetch, hazard check…
EX - Execution (EA calc, ALU op, target calc…)
DF, DS - Data cache access, first & second halves.
TC - Tag check: did the cache access hit? Note: data is used before the hit/miss is resolved.
WB - Write-back for loads & register-register ops.
Read through C.43 – C.51.

2-Cycle Load Delay

Branch Delay