Pipelining.

Presentation transcript:

Pipelining

Pipelining
[Figure: a task divided into three stages s1, s2, s3, shown as two stages-versus-time diagrams, one without pipeline and one with pipeline.]

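Pipelining
[Figure: stages-versus-time diagrams for stages s1, s2, s3, without pipeline and with pipeline.]

s = number of stages, n = number of tasks, t = time per stage

Without pipeline: $T_1 = s \cdot t \cdot n$
With pipeline: $T_s = s \cdot t + (n-1) \cdot t$

$\text{Speedup} = \frac{T_1}{T_s} = \frac{s \cdot n}{s + (n-1)} = \frac{s}{s/n + (1 - 1/n)}$

$\text{Speedup} \to s$ as $n \to \infty$

$\text{Throughput} = \frac{n}{T_s}$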
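As a small worked step, the throughput can be taken to its limit in the same way as the speedup. The sketch below uses the slide's symbols (s stages, n tasks, t time per stage); the numeric example with s = 4 and n = 8 is an illustration chosen here, not a value from the slides.

```latex
\begin{align*}
\text{Throughput} &= \frac{n}{T_s} = \frac{n}{(s + n - 1)\,t}
  = \frac{1}{(s/n + 1 - 1/n)\,t} \;\to\; \frac{1}{t} \quad (n \to \infty) \\[4pt]
\text{Example } (s = 4,\ n = 8):\quad
T_1 &= 4 \cdot t \cdot 8 = 32t, \qquad
T_s = 4t + 7t = 11t, \qquad
\text{Speedup} = \frac{32t}{11t} \approx 2.91
\end{align*}
```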

Pipelining
Example with s = 3, using $T_1 = s \cdot t \cdot n$, $T_s = s \cdot t + (n-1) \cdot t$, Speedup = $T_1 / T_s$, Throughput = $n / T_s$:

n         T1          Ts                        Speedup                        Throughput
1         3t          3t                        1                              1/(3t)
10        30t         3t + 9t = 12t             30/12 = 2.5                    10/(12t)
100       300t        3t + 99t = 102t           300/102 ≈ 2.94                 100/(102t)
1000000   3000000t    3t + 999999t = 1000002t   3000000/1000002 ≈ 2.999994     → 1/t

As n grows, the speedup approaches s and the throughput approaches 1/t.
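The table can be reproduced with a few lines of Python. This is a minimal sketch of the slide's formulas; the function name `pipeline_metrics` and the choice t = 1 are assumptions made for illustration.

```python
def pipeline_metrics(s, n, t=1.0):
    """Return (T1, Ts, speedup, throughput) for s stages, n tasks, stage time t."""
    t1 = s * t * n                # without pipeline: every task runs all s stages alone
    ts = s * t + (n - 1) * t      # with pipeline: fill once, then one task per t
    return t1, ts, t1 / ts, n / ts


if __name__ == "__main__":
    # t = 1, so times are in units of t and throughput in units of 1/t.
    for n in (1, 10, 100, 1_000_000):
        t1, ts, speedup, throughput = pipeline_metrics(3, n)
        print(f"n={n:>7}  T1={t1:.0f}t  Ts={ts:.0f}t  "
              f"speedup={speedup:.6f}  throughput={throughput:.6f}/t")
```

With s = 3 the speedup column climbs towards 3 and the throughput towards 1/t, matching the limits stated on the slide.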

Pipelining
Slowest stage determines the pipeline performance.
Stage times: s1 = 10, s2 = 30, s3 = 20.
[Figure: stages-versus-time diagrams for s1, s2, s3, without pipeline and with pipeline.]

Pipelining
Deep pipeline
3 stages: s1 = 10, s2 = 30, s3 = 20.
6 stages: s1, s21, s22, s23, s31, s32, each taking 10 (s2 is split into s21, s22, s23 and s3 into s31, s32).
[Figure: stages-versus-time diagrams for the 3-stage pipeline (s1 to s3) and the 6-stage pipeline (s1 to s6).]
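The effect of the slowest stage, and of splitting it into shorter stages, can be checked numerically. The sketch below assumes that all stages advance in lockstep, so the cycle time equals the slowest stage time; the function names and the task count of 100 are illustrative choices, not part of the slides.

```python
def serial_time(stage_times, n):
    """Time for n tasks without pipelining: each task runs all stages alone."""
    return sum(stage_times) * n


def pipelined_time(stage_times, n):
    """Time for n tasks with pipelining, assuming lockstep stages.

    The cycle time is set by the slowest stage; the first task needs
    len(stage_times) cycles and each further task one more cycle.
    """
    cycle = max(stage_times)
    return (len(stage_times) + n - 1) * cycle


if __name__ == "__main__":
    three_stage = [10, 30, 20]   # stage times from the slide
    six_stage = [10] * 6         # s2 and s3 split into stages of 10
    n = 100                      # illustrative task count
    print("serial:          ", serial_time(three_stage, n))
    print("3-stage pipeline:", pipelined_time(three_stage, n))
    print("6-stage pipeline:", pipelined_time(six_stage, n))
```

For 100 tasks the 3-stage pipeline is held back by the stage of 30, while the balanced 6-stage version runs at a cycle time of 10.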

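Computational Pipelines
[Figure: an unpipelined circuit consisting of a block of combinatorial logic followed by a register (Reg) driven by the clock, and a pipelined circuit in which three blocks, Comb. log. A, Comb. log. B and Comb. log. C, are each followed by a register (R) driven by the same clock.]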

Limitations of Pipelining
Nonuniform partitioning
  - Stage delays may be nonuniform
  - Throughput is limited by the slowest stage
Deep pipelining
  - Large number of stages
  - Modern processors have deep pipelines (15 or more stages) to increase the clock rate
[Figure, nonuniform partitioning: Comb. log. A 50 ps, register 20 ps, Comb. log. B 150 ps, register 20 ps, Comb. log. C 100 ps, register 20 ps, one clock.]
[Figure, deep pipelining: a chain of stages, each with 50 ps of combinatorial logic followed by a 20 ps register, one clock.]
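The delays in the two diagrams can be turned into clock periods and throughput figures. This is a rough sketch under two assumptions that the slides do not state explicitly: the clock period is the slowest logic delay plus one register delay, and one operation completes per clock. The six-stage count for the deep pipeline and the unit "giga-operations per second" are also illustrative choices.

```python
def clock_period_ps(logic_delays_ps, reg_delay_ps):
    """Clock period: slowest combinational block plus one register delay (assumed model)."""
    return max(logic_delays_ps) + reg_delay_ps


def throughput_gops(logic_delays_ps, reg_delay_ps):
    """Operations per second in units of 10^9, assuming one operation per clock."""
    return 1000.0 / clock_period_ps(logic_delays_ps, reg_delay_ps)


def latency_ps(logic_delays_ps, reg_delay_ps):
    """Time for one operation to pass through every stage."""
    return len(logic_delays_ps) * clock_period_ps(logic_delays_ps, reg_delay_ps)


if __name__ == "__main__":
    nonuniform = [50, 150, 100]   # delays from the nonuniform-partitioning figure
    deep = [50] * 6               # six 50 ps stages: an assumed depth for illustration
    for name, stages in (("nonuniform", nonuniform), ("deep", deep)):
        print(f"{name}: period={clock_period_ps(stages, 20)} ps, "
              f"throughput={throughput_gops(stages, 20):.2f} Gops, "
              f"latency={latency_ps(stages, 20)} ps")
```

In this model the 150 ps block sets a 170 ps clock for the nonuniform design, while each register adds a fixed 20 ps to every stage of the deep design, which is why adding stages gives diminishing returns.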

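Parallel Adder
Four full adders (FA) are chained, one per bit position:

      a4 a3 a2 a1
    + b4 b3 b2 b1
    -------------
      x4 x3 x2 x1

[Figure: a chain of four FAs. The first FA produces x1, the second takes a2,b2 and produces x2, the third takes a3,b3 and produces x3, and the fourth takes a4,b4 and produces x4.]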

Pipelined Parallel Adder
Four additions pass through the chain of four full adders (FA), one bit position per stage:

    a4 a3 a2 a1 + b4 b3 b2 b1 = x4 x3 x2 x1
    c4 c3 c2 c1 + d4 d3 d2 d1 = y4 y3 y2 y1
    e4 e3 e2 e1 + f4 f3 f2 f1 = z4 z3 z2 z1
    g4 g3 g2 g1 + h4 h3 h2 h1 = w4 w3 w2 w1

[Figure sequence, one slide per step:]
Step 0: a4,b4 a3,b3 a2,b2 a1,b1 are at the inputs; all four FAs are idle.
Step 1: c4,d4 c3,d3 c2,d2 c1,d1 are at the inputs. First FA: x1 produced; a2,b2 a3,b3 a4,b4 wait for the later stages.
Step 2: e4,f4 e3,f3 e2,f2 e1,f1 are at the inputs. First FA: y1 produced (c2,d2 c3,d3 c4,d4 waiting). Second FA: x2 produced, x1 carried along (a3,b3 a4,b4 waiting).
Step 3: g4,h4 g3,h3 g2,h2 g1,h1 are at the inputs. First FA: z1 (e2,f2 e3,f3 e4,f4 waiting). Second FA: y2, y1 (c3,d3 c4,d4 waiting). Third FA: x3, x2, x1 (a4,b4 waiting).
Step 4: no new inputs. First FA: w1 (g2,h2 g3,h3 g4,h4 waiting). Second FA: z2, z1 (e3,f3 e4,f4 waiting). Third FA: y3, y2, y1 (c4,d4 waiting). Fourth FA: x4 x3 x2 x1, the first sum is complete.
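The flow of additions through the four full adders can be simulated cycle by cycle. The sketch below is an illustrative software model, not the circuit from the slides: each pipeline stage handles one bit position, and an addition moves one stage per clock, carrying its partial sum and carry with it. The names `full_adder` and `pipelined_add` and the dictionary layout are assumptions made for this example.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def pipelined_add(pairs, width=4):
    """Add each (a, b) pair with a width-stage pipeline of full adders.

    Returns (results, cycles), where results is a list of
    (sum modulo 2**width, carry_out) and cycles is the number of clock cycles.
    """
    pending = list(pairs)
    stages = [None] * width   # stages[i] holds the addition whose bit i is being added
    results = []
    cycles = 0
    while pending or any(op is not None for op in stages):
        # Clock edge: every addition moves on one stage and a new one enters.
        new_op = None
        if pending:
            a, b = pending.pop(0)
            new_op = {"a": a, "b": b, "carry": 0, "sum": 0}
        stages = [new_op] + stages[:-1]
        cycles += 1
        # All stages work in the same cycle, each on a different addition.
        for i, op in enumerate(stages):
            if op is None:
                continue
            bit, op["carry"] = full_adder((op["a"] >> i) & 1,
                                          (op["b"] >> i) & 1,
                                          op["carry"])
            op["sum"] |= bit << i
        # The addition in the last stage is finished after this cycle.
        if stages[-1] is not None:
            done = stages[-1]
            results.append((done["sum"], done["carry"]))
            stages[-1] = None
    return results, cycles


if __name__ == "__main__":
    sums, cycles = pipelined_add([(3, 5), (7, 8), (15, 1), (9, 6)])
    print(sums, "in", cycles, "cycles")
```

Four 4-bit additions finish in 4 + 3 = 7 cycles in this model, instead of the 16 cycles needed if each addition had to pass through all four stages before the next one started.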

Floating-point Arithmetic Pipeline
Pipelined Floating-point Addition
  - Subtract exponents (E): compare the exponents by subtracting them, to check whether they are equal
  - Align mantissas (M): shift mantissas until the exponents are equal
  - Add mantissas (A)
  - Normalize result (N)
[Figure: operands n1 and n2 enter a four-stage pipeline E, M, A, N.]
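The four stages can be sketched on a toy number format. The example below assumes a value is stored as an integer mantissa m and a base-2 exponent e, meaning m * 2**e; this format, the function name `fp_add`, and the normalization rule (strip trailing zero bits) are simplifications chosen for illustration. Real hardware typically shifts the mantissa of the smaller operand right and may lose low-order bits, whereas this sketch shifts left to stay exact.

```python
def fp_add(x, y):
    """Add two toy floating-point numbers given as (mantissa, exponent) pairs."""
    (m1, e1), (m2, e2) = x, y

    # E: subtract exponents to see whether (and by how much) they differ.
    diff = e1 - e2

    # M: align mantissas so that both operands share the smaller exponent.
    if diff > 0:
        m1, e1 = m1 << diff, e2
    elif diff < 0:
        m2, e2 = m2 << -diff, e1

    # A: add the aligned mantissas.
    m, e = m1 + m2, e1

    # N: normalize (toy rule: remove trailing zero bits from the mantissa).
    while m != 0 and m % 2 == 0:
        m //= 2
        e += 1
    return m, e


if __name__ == "__main__":
    print(fp_add((3, 2), (5, 0)))   # 3 * 2**2 + 5 * 2**0
    print(fp_add((1, 3), (1, 3)))   # 1 * 2**3 + 1 * 2**3
```

In a pipelined unit each of the four commented steps would be a separate stage, so a new pair of operands could enter stage E while earlier pairs are still in M, A and N.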

Instruction Execution Pipeline
Instruction fetch cycle (IF)
  - Fetch current instruction from memory
  - Increment PC
Instruction decode / register fetch cycle (ID)
  - Decode instruction
  - Compute possible branch target
  - Read registers from the register file
Execution / effective address cycle (EX)
  - Form the effective address
  - ALU performs the operation specified by the opcode
Memory access (MEM)
  - Memory read for load instruction
  - Memory write for store instruction
Write-back cycle (WB)
  - Write result into register file
[Figure: five-stage pipeline IF, ID, EX, MEM, WB.]

Instruction Execution Pipeline
[Figure: stages-versus-time diagram of the five-stage pipeline IF, ID, EX, MEM, WB, with successive instructions overlapped.]
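A stages-versus-time diagram of this kind can be generated with a short script. The sketch below assumes an ideal pipeline with no hazards or stalls, so instruction i enters IF in cycle i; the function name `pipeline_diagram` and the instruction labels I1, I2, ... are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c.

    Assumes an ideal pipeline: one instruction enters per cycle, no stalls.
    Cycles in which an instruction is not in the pipeline are marked ".".
    """
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for k, name in enumerate(stages):
            row[i + k] = name   # instruction i is in stage k during cycle i + k
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(4)):
        print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

Each printed row is shifted one cycle to the right of the previous one, so from the fifth cycle onward all five stages are busy at once.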

Pipeline Hazards
Control (Branch) Hazards
  - Arise from pipelining of instructions (e.g. branch) that change the PC.

Example loop:

    for i = n to 1
        ci = ai + bi

    LOOP: LOAD  100,X
          ADD   200,X
          STORE 300,X
          DECX
          BNE   LOOP
          ...

[Figure: stages-versus-time diagram of the five-stage pipeline (IF, ID, EX, MEM, WB) executing the loop.]
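The cost of a control hazard can be estimated with a simple cycle count. The sketch below is a rough model, not taken from the slides: it assumes a five-stage pipeline, a five-instruction loop body as in the example, and a fixed number of lost cycles for every taken branch. The parameter `branch_penalty` and its default of 2 are hypothetical values chosen for illustration.

```python
def loop_cycles(n_iterations, body_len=5, stages=5, branch_penalty=2):
    """Estimate cycles to run a loop of body_len instructions n_iterations times.

    Assumed model: the ideal pipeline needs stages + instructions - 1 cycles,
    and every taken branch adds branch_penalty lost cycles. The closing branch
    is taken on every iteration except the last one.
    """
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")
    instructions = n_iterations * body_len
    taken_branches = n_iterations - 1
    ideal = stages + instructions - 1
    return ideal + taken_branches * branch_penalty


if __name__ == "__main__":
    n = 10
    print("ideal cycles:       ", loop_cycles(n, branch_penalty=0))
    print("with branch penalty:", loop_cycles(n))
```

Under these assumptions the lost cycles grow in proportion to the number of iterations, which is why branches in short loops are a significant source of pipeline inefficiency.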

A Modern Processor
Intel Core i7