CSC 4250 Computer Architectures September 26, 2006 Appendix A. Pipelining.


Checks before Instruction Issue
Check for structural hazards ─ FP divide unit, register write port
Check for RAW hazards ─ check whether a source register of the instruction in ID is listed as a destination in ID/A1, A1/A2, A2/A3, ID/M1, M1/M2, M2/M3, …, M5/M6, or D
Check for WAW hazards ─ check whether the instruction in ID has the same destination register as any instruction in A1, …, A4, M1, …, M7, or D
Have we forgotten about WAR hazards?
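The three checks can be sketched in Python as a rough illustration. This is a simplified model, not the actual control logic: the `InFlight` record, the `can_issue` function, and the unit labels are hypothetical names. The RAW check is conservative (it treats every pending destination as unavailable and ignores forwarding), and the register write port check is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InFlight:
    """An instruction already issued and still inside a multicycle FP unit."""
    unit: str  # hypothetical labels: "add", "mul", or "div"
    dest: str  # destination register, e.g. "F0"


def can_issue(unit: str, sources: List[str], dest: Optional[str],
              in_flight: List[InFlight]) -> Tuple[bool, str]:
    """Return (ok, reason) for the instruction currently in ID."""
    # Structural hazard: the divide unit is not pipelined, so only one
    # divide may be in progress (write-port conflicts are not modeled).
    if unit == "div" and any(i.unit == "div" for i in in_flight):
        return False, "structural"

    pending = {i.dest for i in in_flight}

    # RAW hazard: a source register is still a pending destination.
    if any(src in pending for src in sources):
        return False, "RAW"

    # WAW hazard: an earlier instruction will write the same destination.
    if dest is not None and dest in pending:
        return False, "WAW"

    return True, "ok"


busy = [InFlight("mul", "F0")]
print(can_issue("add", ["F0", "F8"], "F2", busy))  # source F0 is pending
print(can_issue("add", ["F4", "F6"], "F2", busy))  # no conflict
```

No WAR check appears because, in this in-order pipeline, registers are read in ID before any later instruction can write them.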

Example on why no WAR hazards
Modify Figure A.33:
1. L.D F4,0(R2)
2. MUL.D F0,F4,F6
3. ADD.D F4,F0,F8
4. S.D F4,0(R2)
Fill in the blanks:
Instruction   Clock cycle number
1.            IF ID EX ME WB
2.               IF
3.                  IF
4.                     IF

How an Imprecise Exception May Arise
Consider the code:
DIV.D F0,F2,F4
ADD.D F10,F10,F8
SUB.D F12,F12,F14
The instructions complete out of order.
Suppose SUB.D causes an FP exception after ADD.D completes (but before DIV.D finishes).
Next, suppose DIV.D causes an FP exception.
The state cannot be restored to what it was before DIV.D, because ADD.D has already overwritten one of its own operands (F10).
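The out-of-order completion in this example can be reproduced with a small Python sketch. The EX cycle counts below are assumed values for illustration (a long, unpipelined divide and a four-stage adder), the function name is hypothetical, and the model ignores stalls, which is safe here because the three instructions are independent.

```python
# Assumed number of cycles each operation spends in its EX unit.
EX_CYCLES = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}


def completion_order(program):
    """Return the operations sorted by the cycle in which they leave EX."""
    finish = []
    for i, op in enumerate(program):
        # Instruction i is fetched in cycle i + 1 and decoded in cycle i + 2,
        # so its last EX cycle is (i + 2) + EX_CYCLES[op].
        last_ex_cycle = (i + 2) + EX_CYCLES[op]
        finish.append((last_ex_cycle, op))
    return [op for _, op in sorted(finish)]


order = completion_order(["DIV.D", "ADD.D", "SUB.D"])
print(order)  # ADD.D and SUB.D finish long before DIV.D
```

Because ADD.D has already written F10 by the time DIV.D could raise its exception, the pre-DIV.D state is gone.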

Performance of MIPS FP Pipeline
The MIPS FP pipeline generates both structural stalls for the divide unit and stalls for RAW hazards (WAW hazards are also possible, but they rarely occur in practice). Figure A.35 shows the number of stall cycles for each type of FP operation. The stall cycles per operation track the latency of the FP operations, varying from 46% to 59% of the latency of the functional unit.

Fig. A.35. Stalls per FP operation for FP SPEC89

Stalls for SPEC89 FP Benchmarks
Consider the data in Figure A.35.
Except for the divide structural hazards, these data do not depend on the frequency of an operation, only on its latency and the number of cycles before the result is used.
The number of stalls from RAW hazards roughly tracks the latency of the FP unit. For example, the average number of stalls per FP add, subtract, or convert is 1.7 cycles, or 56% of the latency (3 cycles). Likewise, the average numbers of stalls for multiplies and divides are 2.8 and 14.2 cycles, respectively, or 46% and 59% of the corresponding latency.
Structural hazards for divides are rare, since the divide frequency is low.
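The percentages quoted above are simply stalls divided by latency. A minimal Python check follows; the stall averages and the 3-cycle add latency come from the slide, while the multiply and divide latencies of 6 and 24 cycles are assumed from the standard MIPS FP pipeline and are not stated here.

```python
# (average stall cycles per operation, functional-unit latency in cycles)
# Latencies 6 and 24 are assumptions; only the 3-cycle add latency is given.
DATA = {
    "add/subtract/convert": (1.7, 3),
    "multiply": (2.8, 6),
    "divide": (14.2, 24),
}


def stall_fraction(stalls: float, latency: int) -> float:
    """Fraction of the unit's latency that shows up as stall cycles."""
    return stalls / latency


fractions = {op: stall_fraction(*pair) for op, pair in DATA.items()}
for op, frac in fractions.items():
    print(f"{op}: {frac:.1%} of latency")
```

Truncated to whole percentages, the three results agree with the 56%, 46%, and 59% figures.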

Fig. A.36. Stalls per instr. for FP SPEC89

MIPS R4000 Pipeline
Implements MIPS64 but uses a deeper pipeline
Achieves higher clock rates by decomposing the five-stage integer pipeline into eight stages
The extra pipeline stages come from decomposing memory access
Sometimes called superpipelining

Eight-stage Pipeline Structure of R4000
It uses pipelined instruction and data caches.

Functions of the Eight Stages (1)
1. IF ─ First half of instruction fetch; PC selection; initiation of instruction cache access
2. IS ─ Second half of instruction fetch; completion of instruction cache access
3. RF ─ Instruction decode and register fetch; hazard check; instruction cache hit detection
4. EX ─ Execution (including effective address calculation, ALU operation, branch-target calculation, and condition evaluation)

Functions of the Eight Stages (2)
5. DF ─ Data fetch; first half of data cache access
6. DS ─ Second half of data fetch; completion of data cache access
7. TC ─ Tag check; determines whether the data cache access is a hit
8. WB ─ Write back for loads and register-register operations

Two-cycle Load Delay for R4000

Example of Load 2-cycle Stall
                    Clock number
Instruction         1    2    3    4    5    6    7    8    9
LD R1,…             IF   IS   RF   EX   DF   DS   TC   WB
DADD R2,R1,…             IF   IS   RF   st   st   EX   DF   DS
DSUB R3,R1,…                  IF   IS   st   st   RF   EX   DF
OR R4,R1,…                         IF   st   st   IS   RF   EX
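The stall count in the load example depends only on how far the first dependent instruction sits from the load. A small Python sketch of that relationship follows; the function name and the `distance` parameter are hypothetical, and the model assumes forwarding and no earlier stall between the load and its first use.

```python
LOAD_DELAY = 2  # cycles, the R4000 load delay stated on the slide


def load_use_stalls(distance: int) -> int:
    """Stall cycles for the first instruction that uses a load result.

    distance = 1 means the instruction immediately after the load.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, LOAD_DELAY - (distance - 1))


for d in (1, 2, 3):
    print(d, load_use_stalls(d))
```

A distance of 1 corresponds to the DADD in the example, which waits two cycles before entering EX.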

Three-cycle Branch Delay for R4000
The branch condition is evaluated during EX.

Example of Taken Branch
                      Clock number
Instruction           1    2    3    4    5    6    7    8    9
Branch instruction    IF   IS   RF   EX   DF   DS   TC   WB
Delay slot                 IF   IS   RF   EX   DF   DS   TC   WB
Stall                           st   st   st   st   st   st   st
Stall                                st   st   st   st   st   st
Branch target                             IF   IS   RF   EX   DF

Example of Untaken Branch
                      Clock number
Instruction           1    2    3    4    5    6    7    8    9
Branch instruction    IF   IS   RF   EX   DF   DS   TC   WB
Delay slot                 IF   IS   RF   EX   DF   DS   TC   WB
Branch instruction+2            IF   IS   RF   EX   DF   DS   TC
Branch instruction+3                 IF   IS   RF   EX   DF   DS
The R4000 uses a predicted-not-taken strategy for the remaining two cycles of the branch delay. What is the advantage over a predicted-taken strategy?
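The taken and untaken cases can be summarized as a per-branch cost. The Python sketch below is a simplified model under stated assumptions: the function name and its two flags are hypothetical, a taken branch costs the two stall cycles shown in the taken-branch example, and a delay slot that is unfilled or cancelled costs one more cycle.

```python
def branch_penalty(taken: bool, delay_slot_useful: bool) -> int:
    """Cycles lost on one branch in this simplified R4000 model."""
    # Predicted-not-taken covers the two cycles after the delay slot,
    # so only a taken branch pays them.
    stalls = 2 if taken else 0
    # An unfilled or cancelled delay slot wastes one issue cycle.
    wasted_slot = 0 if delay_slot_useful else 1
    return stalls + wasted_slot


print(branch_penalty(taken=True, delay_slot_useful=False))
print(branch_penalty(taken=False, delay_slot_useful=True))
```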

R4000 FP Pipeline
FP instruction    Latency   Initiation interval
Add, subtract     4         3
Multiply          8         4
Divide            36        35
Square root
Negate            2         1
Absolute value    2         1
FP compare        3         2
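The initiation interval limits how soon a second operation of the same type can start. The Python sketch below uses the add, multiply, and divide intervals from the table above. It is a deliberate simplification: it assumes in-order issue, treats each operation type as having its own independent unit, and ignores data hazards, so it will understate stalls wherever units share hardware. The function name and operation labels are hypothetical.

```python
# Initiation intervals in cycles, taken from the R4000 FP table.
INTERVAL = {"add": 3, "mul": 4, "div": 35}


def earliest_issue(ops):
    """Earliest issue cycle of each op under in-order issue."""
    next_free = {}  # op type -> first cycle its unit accepts a new op
    cycle = 0
    issued = []
    for op in ops:
        # Issue at least one cycle after the previous instruction, and
        # not before the unit for this op type is free again.
        cycle = max(cycle + 1, next_free.get(op, 0))
        issued.append(cycle)
        next_free[op] = cycle + INTERVAL[op]
    return issued


print(earliest_issue(["div", "add", "div"]))  # second divide waits for the first
```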

Major Causes of R4000 Pipeline Stalls
Load stalls ─ use of a load result 1 or 2 cycles after the load
Branch stalls ─ 2-cycle stall on every taken branch, plus unfilled or cancelled branch delay slots
FP result stalls ─ RAW hazards on an FP operand
FP structural stalls ─ conflicts for functional units
WAW stalls are not common

Pipeline CPI for 10 SPEC92 Benchmarks
The pipeline CPI varies from 1.2 to 2.8. The left five programs are integer programs, and branch delays are the major CPI contributor. The right five programs are FP, and FP result stalls are the major contributors.

Pipeline CPI and Major Sources of Stalls
Columns: Benchmark, CPI, Load, Branch, Result, Structure
Rows: Gcc, Int. average, Hydro2d, FP average, Overall average
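The table's columns relate to each other by simple addition: the pipeline CPI is an ideal CPI of 1 plus the stall cycles per instruction from each source. A hedged Python sketch follows; the function name is hypothetical and the numbers in the usage line are made-up illustrative values, not measurements from the table.

```python
def pipeline_cpi(load: float, branch: float, fp_result: float,
                 fp_structural: float, base: float = 1.0) -> float:
    """Pipeline CPI = ideal CPI + stall cycles per instruction by source."""
    return base + load + branch + fp_result + fp_structural


# Hypothetical stall contributions for an integer-style program,
# where branch stalls dominate and FP stalls are zero.
example = pipeline_cpi(load=0.10, branch=0.30, fp_result=0.0, fp_structural=0.0)
print(round(example, 2))
```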

MIPS R4300 Pipeline
Manufactured by NEC
64-bit processor that implements the MIPS64 instruction set
Embedded applications: Nintendo-64 game processor, high-end color laser printers
Multiple EX stages for FP operations
Instructions complete out of order
An FP instruction can generate an exception after a following integer instruction has completed, leading to an imprecise exception