Pipelining.

Pipelining

Pipelining: a task is divided into stages s1, s2, s3. [Figure: stage-versus-time diagrams comparing execution without a pipeline and with a pipeline.]

Pipelining: execution time without and with a pipeline

  s = number of stages, n = number of tasks, t = time per stage

  Without pipeline:  $T_1 = s \cdot t \cdot n$
  With pipeline:     $T_s = s \cdot t + (n-1) \cdot t$
  Speedup:           $T_1 / T_s = \dfrac{s \cdot n}{s + (n-1)} = \dfrac{s}{s/n + (1 - 1/n)}$, so Speedup $\to s$ as $n \to \infty$
  Throughput:        $n / T_s$

  [Figure: stage-versus-time diagrams without a pipeline and with a pipeline.]

Pipelining: worked example with s = 3 stages, using T1 = s.t.n, Ts = s.t + (n-1).t, Speedup = T1 / Ts, Throughput = n / Ts

  n          T1          Ts                         Speedup                  Throughput
  1          3t          3t                         1                        1/(3t)
  10         30t         3t + 9t = 12t              30/12 = 2.5              10/(12t)
  100        300t        3t + 99t = 102t            300/102 ≈ 2.94           100/(102t)
  1000000    3000000t    3t + 999999t = 1000002t    ≈ 2.999994               ≈ 1/t

  As n grows, the speedup approaches s = 3 and the throughput approaches 1/t.
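The formulas on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `pipeline_times` is chosen here for illustration, and t defaults to 1 so results are in units of the stage time.

```python
def pipeline_times(s, n, t=1):
    """Return (T1, Ts, speedup, throughput) for s stages, n tasks, stage time t."""
    t1 = s * t * n              # without pipeline: every task takes all s stages in turn
    ts = s * t + (n - 1) * t    # with pipeline: fill once, then one task per stage time
    speedup = t1 / ts
    throughput = n / ts         # tasks completed per unit of time
    return t1, ts, speedup, throughput


# Reproduce the s = 3 example for the task counts used on the slide.
for n in (1, 10, 100, 1_000_000):
    t1, ts, speedup, throughput = pipeline_times(3, n)
    print(f"n={n:>7}  T1={t1}t  Ts={ts}t  speedup={speedup:.6f}  throughput={throughput:.6f}/t")
```

The speedup never exceeds s, because the first task still has to pass through every stage before the pipeline is full.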

Pipelining: the slowest stage determines the pipeline performance. Example with stage times s1 = 10, s2 = 30, s3 = 20. [Figure: stage-versus-time diagrams without a pipeline and with a pipeline; every pipeline step is as long as the slowest stage.]

Pipelining: deep pipeline. The 3-stage pipeline (s1 = 10, s2 = 30, s3 = 20) is split into 6 stages of 10 each: s1, s21, s22, s23, s31, s32. [Figure: stage-versus-time diagrams for the 3-stage and the 6-stage pipeline.]
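To see why splitting the slow stages helps, the two pipelines can be compared directly. This is a sketch under the assumption that every pipeline step lasts as long as the slowest stage and that latches add no delay; `pipeline_timing` is an illustrative name, and the task count of 100 is arbitrary.

```python
def pipeline_timing(stage_delays, n):
    """Return (cycle, unpipelined_time, pipelined_time) for n tasks."""
    cycle = max(stage_delays)                          # step time set by the slowest stage
    unpipelined = sum(stage_delays) * n                # each task runs start to finish alone
    pipelined = (len(stage_delays) + n - 1) * cycle    # fill the pipeline, then one task per step
    return cycle, unpipelined, pipelined


three_stage = [10, 30, 20]   # s1, s2, s3 from the slide
six_stage = [10] * 6         # s1, s21, s22, s23, s31, s32

for name, delays in (("3 stages", three_stage), ("6 stages", six_stage)):
    cycle, serial, piped = pipeline_timing(delays, n=100)
    print(f"{name}: cycle={cycle}  serial={serial}  pipelined={piped}  speedup={serial / piped:.2f}")
```

Both versions do the same total work per task (60 time units), but the balanced 6-stage version steps three times as fast.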

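Computational Pipelines: combinatorial logic followed by a clocked register. [Figure: a single block of combinatorial logic feeding a register (Reg) driven by the clock, and a pipelined version with Comb. logic A, B, and C, each followed by a register R, all sharing one clock.]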

Limitations of Pipelining

  Nonuniform partitioning
    Stage delays may be nonuniform.
    Throughput is limited by the slowest stage.
    [Figure: Comb. logic A, B, and C with delays of 50 ps, 150 ps, and 100 ps, each followed by a 20 ps register R.]

  Deep pipelining
    Large number of stages.
    Modern processors have deep pipelines (15 or more stages) to increase the clock rate.
    [Figure: a long chain of stages, each with 50 ps of combinatorial logic and a 20 ps register R.]
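Both limitations can be illustrated with the delays shown in the figures. The sketch below assumes the clock period is the slowest logic delay plus one register delay, and, for the deep-pipeline case, that 300 ps of logic can be divided perfectly evenly; neither assumption is stated on the slide, and the function names are illustrative.

```python
def clock_period_ps(logic_delays_ps, register_delay_ps=20):
    """Clock period: slowest stage's logic delay plus the register delay."""
    return max(logic_delays_ps) + register_delay_ps


def even_split_period_ps(total_logic_ps, num_stages, register_delay_ps=20):
    """Clock period if the logic could be divided evenly (an idealization)."""
    return total_logic_ps / num_stages + register_delay_ps


# Nonuniform partitioning: the 150 ps stage sets the clock for all three stages.
print("unpipelined :", clock_period_ps([50 + 150 + 100]), "ps per result")
print("nonuniform  :", clock_period_ps([50, 150, 100]), "ps per result")
print("uniform     :", clock_period_ps([100, 100, 100]), "ps per result")

# Deep pipelining: the fixed 20 ps register delay takes a growing share of each cycle.
for stages in (1, 3, 6, 15):
    period = even_split_period_ps(300, stages)
    print(f"{stages:>2} stages: period={period:.0f} ps  register share={20 / period:.0%}")
```

Under these assumptions, going from 6 to 15 stages shortens the period from 70 ps to 40 ps, well short of a 2.5x gain, because the register delay does not shrink.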

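Parallel Adder: four full adders (FA) compute the 4-bit sum x4 x3 x2 x1 = a4 a3 a2 a1 + b4 b3 b2 b1. [Figure: a chain of four FAs; the first produces x1, the second takes a2,b2 and produces x2, the third takes a3,b3 and produces x3, and the fourth takes a4,b4 and produces x4.]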

Pipelined Parallel Adder: four 4-bit additions flow through the chain of four full adders (FA), one bit position per stage.

  a4 a3 a2 a1 + b4 b3 b2 b1 = x4 x3 x2 x1
  c4 c3 c2 c1 + d4 d3 d2 d1 = y4 y3 y2 y1
  e4 e3 e2 e1 + f4 f3 f2 f1 = z4 z3 z2 z1
  g4 g3 g2 g1 + h4 h3 h2 h1 = w4 w3 w2 w1

Pipelined Parallel Adder, step 1: the operand pairs a4,b4 a3,b3 a2,b2 a1,b1 enter the first FA.

Pipelined Parallel Adder, step 2: the pairs c4,d4 c3,d3 c2,d2 c1,d1 enter. After the first FA: x1 is ready, with a4,b4 a3,b3 a2,b2 still to be added.

Pipelined Parallel Adder, step 3: the pairs e4,f4 e3,f3 e2,f2 e1,f1 enter. After the first FA: y1, with c4,d4 c3,d3 c2,d2 remaining. After the second FA: x2 x1, with a4,b4 a3,b3 remaining.

Pipelined Parallel Adder, step 4: the pairs g4,h4 g3,h3 g2,h2 g1,h1 enter. After the first FA: z1, with e4,f4 e3,f3 e2,f2 remaining. After the second FA: y2 y1, with c4,d4 c3,d3 remaining. After the third FA: x3 x2 x1, with a4,b4 remaining.

Pipelined Parallel Adder, step 5: no new operands. After the first FA: w1, with g4,h4 g3,h3 g2,h2 remaining. After the second FA: z2 z1, with e4,f4 e3,f3 remaining. After the third FA: y3 y2 y1, with c4,d4 remaining. After the fourth FA: x4 x3 x2 x1, the first complete sum.
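The snapshots above can be reproduced with a small simulation. This is a sketch rather than the circuit from the slides: it assumes each FA stage handles one bit position per step and passes the carry along with the operands, and it appends the final carry-out as an extra top bit so the results can be compared with ordinary addition. The operand values are arbitrary.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: return (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def pipelined_add(pairs, width=4):
    """Add each (a, b) pair on a chain of `width` FA stages; return (sums, steps)."""
    pending = list(pairs)
    stages = [None] * width          # stages[k] holds the job currently at FA k
    results, steps = [], 0
    # Jobs in the last stage are already finished, so only earlier stages keep the loop alive.
    while pending or any(job is not None for job in stages[:-1]):
        new_job = None
        if pending:
            a, b = pending.pop(0)
            new_job = {"a": a, "b": b, "carry": 0, "sum": 0}
        stages = [new_job] + stages[:-1]          # every job advances one stage
        for k, job in enumerate(stages):
            if job is None:
                continue
            bit, job["carry"] = full_adder((job["a"] >> k) & 1, (job["b"] >> k) & 1, job["carry"])
            job["sum"] |= bit << k                # stage k produces bit k of the sum
        steps += 1
        if stages[-1] is not None:                # the job leaving the last FA is complete
            done = stages[-1]
            results.append(done["sum"] | (done["carry"] << width))
    return results, steps


sums, steps = pipelined_add([(3, 5), (9, 6), (15, 1), (7, 7)])
print(sums, "in", steps, "steps")
```

Four additions on four stages take 4 + (4 - 1) = 7 steps, matching Ts = s.t + (n-1).t with s = n = 4.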

Floating-point Arithmetic Pipeline: pipelined floating-point addition

  Subtract exponents (E): compare the exponents by subtracting them, to check whether they are equal.
  Align mantissas (M): shift a mantissa until the exponents are equal.
  Add mantissas (A).
  Normalize result (N).

  [Figure: operands n1 and n2 pass through the four stages E, M, A, N.]
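The four stages can be written as four small functions, one per pipeline stage. This is a simplified sketch, not IEEE 754: it assumes positive numbers represented as (mantissa, exponent) pairs with value mantissa * 2**exponent, and a hypothetical 8-bit mantissa; bits shifted out during alignment are simply dropped.

```python
MANTISSA_BITS = 8  # hypothetical width, chosen only for this illustration


def stage_e(x, y):
    """E: subtract the exponents to see which operand must be shifted."""
    (mx, ex), (my, ey) = x, y
    return mx, ex, my, ey, ex - ey


def stage_m(mx, ex, my, ey, diff):
    """M: shift the mantissa with the smaller exponent right until exponents match."""
    if diff >= 0:
        return mx, my >> diff, ex
    return mx >> -diff, my, ey


def stage_a(mx, my, exponent):
    """A: add the aligned mantissas."""
    return mx + my, exponent


def stage_n(mantissa, exponent):
    """N: renormalize so the mantissa fits in MANTISSA_BITS with its top bit set."""
    while mantissa >= (1 << MANTISSA_BITS):
        mantissa >>= 1
        exponent += 1
    while mantissa and mantissa < (1 << (MANTISSA_BITS - 1)):
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


def fp_add(x, y):
    """Run one addition through E, M, A, N in order."""
    return stage_n(*stage_a(*stage_m(*stage_e(x, y))))


print(fp_add((192, 0), (128, 2)))   # 192 + 512
print(fp_add((200, 0), (200, 0)))   # 200 + 200
```

Because each stage only needs the output of the previous one, a hardware pipeline can hold a different pair of operands in each of the four stages at once.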

Instruction Execution Pipeline (IF, ID, EX, MEM, WB)

  Instruction fetch cycle (IF)
    Fetch the current instruction from memory.
    Increment the PC.

  Instruction decode / register fetch cycle (ID)
    Decode the instruction.
    Compute the possible branch target.
    Read registers from the register file.

  Execution / effective address cycle (EX)
    Form the effective address.
    The ALU performs the operation specified by the opcode.

  Memory access (MEM)
    Memory read for a load instruction.
    Memory write for a store instruction.

  Write-back cycle (WB)
    Write the result into the register file.

Instruction Execution Pipeline: [Figure: stage-versus-time diagram of instructions overlapping in the IF, ID, EX, MEM, and WB stages.]
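The stage-versus-time diagram can be regenerated as text. The sketch below assumes an ideal pipeline with no stalls, where a new instruction enters IF on every cycle; `pipeline_chart` is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c, or '-'."""
    total_cycles = len(stages) + num_instructions - 1
    chart = []
    for i in range(num_instructions):
        row = ["-"] * total_cycles
        for k, name in enumerate(stages):
            row[i + k] = name        # instruction i reaches stage k in cycle i + k
        chart.append(row)
    return chart


for i, row in enumerate(pipeline_chart(4), start=1):
    print(f"instr {i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading a column of the output shows which instructions share a cycle; from the fifth cycle onward all five stages are busy.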

Pipeline Hazards: Control (Branch) Hazards

  Control hazards arise from pipelining instructions that change the PC (e.g. branches).

  Example: the loop "for i = n to 1: c_i = a_i + b_i" as instructions:

    LOOP: LOAD  100,X
          ADD   200,X
          STORE 300,X
          DECX
          BNE   LOOP
          ...

  [Figure: stage-versus-time diagram (IF, ID, EX, MEM, WB) of the loop instructions around the branch.]
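The cost of the branch in this loop can be estimated with a rough cycle count. The sketch below rests on assumptions the slide does not state: a 5-stage pipeline, a fixed penalty of 2 wasted cycles for every taken BNE (the real value depends on where the branch is resolved), and no other stalls. The function name and defaults are illustrative.

```python
def loop_cycles(iterations, body_len=5, stages=5, branch_penalty=2):
    """Return (ideal_cycles, cycles_with_branch_penalty) for the loop.

    branch_penalty is a hypothetical number of wasted cycles per taken branch.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    instructions = iterations * body_len     # LOAD, ADD, STORE, DECX, BNE per pass
    taken = iterations - 1                   # BNE falls through on the final pass
    ideal = stages + instructions - 1        # fill the pipeline, then one instruction per cycle
    return ideal, ideal + taken * branch_penalty


for n in (1, 10, 1000):
    ideal, actual = loop_cycles(n)
    print(f"n={n:>4}: ideal={ideal}  with penalty={actual}  slowdown={actual / ideal:.2f}")
```

With one branch in every five instructions, even a small per-branch penalty adds a noticeable fraction to the run time, which is the motivation for branch prediction.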

A Modern Processor: Intel Core i7