Pipelining Appendix A and Chapter 3.

Pipelining: It's Natural! Laundry example: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold. The washer takes 30 minutes, the dryer takes 40 minutes, and the "folder" takes 20 minutes.

Sequential Laundry [Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes back to back.] Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?

Pipelined Laundry: start work ASAP. [Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, overlapped so that each load starts as soon as the washer is free.] Pipelined laundry takes 3.5 hours for 4 loads.
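The two laundry totals can be checked with a short calculation. The sketch below is only an illustration: the function names are made up here, and it assumes each machine handles one load at a time and that a finished load can wait between machines.

```python
# Stage times in minutes: washer, dryer, "folder" (from the laundry example).
STAGE_MINUTES = [30, 40, 20]


def sequential_minutes(stage_minutes, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load enters a stage as soon as both the load and the stage are free."""
    stage_free_at = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            ready = start + minutes
            stage_free_at[s] = ready
    return stage_free_at[-1]


print(sequential_minutes(STAGE_MINUTES, 4) / 60)  # 6.0 hours
print(pipelined_minutes(STAGE_MINUTES, 4) / 60)   # 3.5 hours
```

The dryer is the slowest stage, so after the first load every later load finishes 40 minutes after the previous one.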

Key Definitions Pipelining is a key implementation technique used to build fast processors. It allows the execution of multiple instructions to overlap in time. A pipeline within a processor is similar to a car assembly line. Each assembly station is called a pipe stage or a pipe segment. The throughput of an instruction pipeline is the measure of how often an instruction exits the pipeline.

Pipeline Stages: We can divide the execution of an instruction into the following 5 "classic" stages. IF: Instruction Fetch; ID: Instruction Decode and register fetch; EX: Execution; MEM: Memory Access; WB: register Write Back.

Pipeline Throughput and Latency [Figure: pipeline IF, ID, EX, MEM, WB annotated with stage delays; the values 5 ns, 4 ns, and 10 ns survive in the transcript.] Consider the pipeline above with the indicated delays. We want to know the pipeline throughput and the pipeline latency. Pipeline throughput: instructions completed per second. Pipeline latency: how long it takes to execute a single instruction in the pipeline.

Pipeline Throughput and Latency [Figure: same pipeline and stage delays as on the previous slide.] Pipeline throughput: how often an instruction is completed. Pipeline latency: how long it takes to execute an instruction in the pipeline. Is this right?

Pipeline Throughput and Latency [Figure: instructions I1 to I4 overlapped in the same pipeline.] Simply adding the stage latencies to compute the pipeline latency would only work for an isolated instruction: L(I1) = 28 ns, L(I2) = 33 ns, L(I3) = 38 ns, L(I4) = 43 ns. We are in trouble! The latency is not constant. This happens because this is an unbalanced pipeline. The solution is to make every stage the same length as the longest one.
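The growing latencies can be reproduced with a small timing model. The sketch below rests on assumptions that are not stated in the transcript: the stage delays are taken to be IF = 5, ID = 4, EX = 5, MEM = 10, WB = 4 ns (an assignment that is consistent with the 28, 33, 38, 43 ns figures), a new instruction enters IF as soon as IF is free, and an instruction may wait between stages without blocking the stage behind it.

```python
def instruction_latencies(stage_ns, n_instr):
    """Latency of each instruction, from entering the first stage to leaving the last."""
    stage_free_at = [0] * len(stage_ns)
    latencies = []
    for _ in range(n_instr):
        start = stage_free_at[0]  # enter IF as soon as it is free
        t = start
        for s, delay in enumerate(stage_ns):
            t = max(t, stage_free_at[s]) + delay  # wait for the stage, then use it
            stage_free_at[s] = t
        latencies.append(t - start)
    return latencies


# Assumed stage delays in ns: IF, ID, EX, MEM, WB.
unbalanced = [5, 4, 5, 10, 4]
print(instruction_latencies(unbalanced, 4))   # [28, 33, 38, 43]

# Every stage stretched to the longest one (10 ns): the latency becomes constant.
balanced = [max(unbalanced)] * len(unbalanced)
print(instruction_latencies(balanced, 4))     # [50, 50, 50, 50]
```

Under these assumptions each instruction queues 5 ns longer in front of the 10 ns MEM stage than the one before it. Balancing the stages raises the latency of a single instruction to 50 ns but makes it the same for every instruction.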

Food for thought: What is the impact on latency when we have synchronous pipelines? A synchronous pipeline is one where, even if the stages are non-uniform, each stage has to wait until all the stages have finished. Assess the impact of clock skew on synchronous pipelines, if any.

Pipelining Lessons: Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload. The pipeline rate is limited by the slowest pipeline stage. Multiple tasks operate simultaneously. Potential speedup = number of pipe stages. Unbalanced lengths of pipe stages reduce speedup. Time to "fill" the pipeline and time to "drain" it reduce speedup. [Figure: pipelined laundry timeline for loads A, B, C, D.]
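These lessons can be put into numbers for the laundry example. The sketch below is an illustration under one assumption: all tasks are identical and can wait between stages, in which case n tasks take the sum of the stage times plus (n - 1) times the slowest stage. The function name is hypothetical.

```python
def laundry_speedup(stage_minutes, tasks):
    """Sequential time divided by pipelined time for identical tasks."""
    sequential = tasks * sum(stage_minutes)
    # The first task fills the pipeline; each later task finishes one
    # slowest-stage time after the previous one.
    pipelined = sum(stage_minutes) + (tasks - 1) * max(stage_minutes)
    return sequential / pipelined


unbalanced = [30, 40, 20]  # washer, dryer, "folder"
balanced = [30, 30, 30]    # same total work, evenly split

for tasks in (4, 100, 10_000):
    print(tasks,
          round(laundry_speedup(unbalanced, tasks), 3),
          round(laundry_speedup(balanced, tasks), 3))
```

With 4 loads the speedup is 360 / 210, about 1.71, because fill and drain time matter for a short run. With many loads the unbalanced pipeline approaches 90 / 40 = 2.25, and only the balanced one approaches the potential speedup of 3, the number of stages.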

Other Definitions. Pipe stage or pipe segment: a decomposable unit of the fetch-decode-execute paradigm. Pipeline depth: number of stages in a pipeline. Machine cycle: clock cycle time. Latch: per phase/stage local information storage unit.

Design Issues. Balance the length of each pipeline stage: ideally, time per instruction on the pipelined machine = (time per instruction on unpipelined machine) / (depth of the pipeline). Problems: usually, stages are not balanced; pipelining overhead; hazards (conflicts). Performance (throughput, via the CPU performance equation): decrease of the CPI; decrease of cycle time.
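The CPU performance equation (CPU time = instruction count × CPI × clock cycle time) shows why pipelining can be read as either a CPI decrease or a cycle-time decrease, depending on the baseline. The numbers below are hypothetical and chosen only to illustrate this; the pipelined case is idealized and ignores fill time, hazards, and overhead.

```python
def cpu_time_ns(instruction_count, cpi, cycle_ns):
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_ns


N = 1000  # hypothetical instruction count

multi_cycle = cpu_time_ns(N, cpi=5, cycle_ns=10)   # unpipelined, 5 short cycles each
single_cycle = cpu_time_ns(N, cpi=1, cycle_ns=50)  # unpipelined, 1 long cycle each
pipelined = cpu_time_ns(N, cpi=1, cycle_ns=10)     # ideal 5-stage pipeline

print(multi_cycle / pipelined)   # 5.0, seen as a decrease of the CPI
print(single_cycle / pipelined)  # 5.0, seen as a decrease of cycle time
```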

MIPS Instruction Formats (bit ranges as drawn, left to right)
I: opcode [0-5], rs1 [6-10], rd [11-15], immediate [16-31]
R: opcode [0-5], rs1 [6-10], rs2 [11-15], rd [16-20], shamt/function [21-31]
J: opcode [0-5], address [6-31]
Fixed-field decoding

1st and 2nd Instruction cycles. Instruction fetch (IF): IR ← Mem[PC]; NPC ← PC + 4. Instruction decode & register fetch (ID): A ← Regs[IR6..10]; B ← Regs[IR11..15]; Imm ← ((IR16)^16 ## IR16..31), that is, the 16-bit immediate sign-extended to 32 bits.
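The field selections used in the ID step can be sketched in a few lines. This is a minimal illustration that assumes bit 0 is the most significant bit of a 32-bit instruction word, which matches the opcode sitting in bits 0 to 5; the helper names and the opcode value 35 are hypothetical.

```python
def field(word, first, last):
    """Bits first..last of a 32-bit word, where bit 0 is the most significant bit."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def sign_extend16(value):
    """Interpret a 16-bit field as a signed number (copies of bit 16 fill the top)."""
    return (value ^ 0x8000) - 0x8000


# Build an I-format word by hand: opcode = 35, rs1 = 2, rd = 3, immediate = -4.
word = (35 << 26) | (2 << 21) | (3 << 16) | (-4 & 0xFFFF)

print(field(word, 0, 5))                    # 35  (opcode)
print(field(word, 6, 10))                   # 2   (register read into A)
print(field(word, 11, 15))                  # 3   (register read into B)
print(sign_extend16(field(word, 16, 31)))   # -4  (Imm)
```

Because every field sits at a fixed position, the register numbers can be extracted and the registers read before the opcode has been fully decoded, which is the point of fixed-field decoding.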

3rd Instruction cycle. Execution & effective address (EX). Memory reference: ALUOutput ← A + Imm. Register-register ALU instruction: ALUOutput ← A func B. Register-immediate ALU instruction: ALUOutput ← A op Imm. Branch: ALUOutput ← NPC + Imm; Cond ← (A op 0).

4th Instruction cycle. Memory access & branch completion (MEM). Memory reference: PC ← NPC; LMD ← Mem[ALUOutput] (load) or Mem[ALUOutput] ← B (store). Branch: if (cond) PC ← ALUOutput; else PC ← NPC.

5th Instruction cycle. Write-back (WB). Register-register ALU instruction: Regs[IR16..20] ← ALUOutput. Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput. Load instruction: Regs[IR11..15] ← LMD.
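The five instruction cycles can be put together in a toy interpreter. The sketch below is an illustration, not the datapath from the slides: the opcode numbers, the helper names, the use of addition for every ALU operation, and "equals zero" as the branch test are all assumptions, and memory is modeled as a dictionary from byte address to word.

```python
# Hypothetical opcode numbers, chosen only for this sketch.
OP_ALU_RR, OP_BEQZ, OP_ADDI, OP_LOAD, OP_STORE = 0, 4, 8, 35, 43


def field(word, first, last):
    """Bits first..last of a 32-bit word, bit 0 being the most significant."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def enc_i(op, rs1, rd, imm):
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def enc_r(op, rs1, rs2, rd):
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11)


def step(pc, regs, mem):
    """Run one instruction through IF, ID, EX, MEM, WB and return the next PC."""
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir, npc = mem[pc], pc + 4
    # ID: read both register operands and sign-extend the immediate
    op = field(ir, 0, 5)
    a, b = regs[field(ir, 6, 10)], regs[field(ir, 11, 15)]
    imm = (field(ir, 16, 31) ^ 0x8000) - 0x8000
    # EX: ALU operation, effective address, or branch target and condition
    cond = False
    if op == OP_ALU_RR:
        alu_output = a + b            # "func" fixed to add in this sketch
    elif op == OP_BEQZ:
        alu_output, cond = npc + imm, (a == 0)
    else:
        alu_output = a + imm          # ADDI, or address of a load/store
    # MEM: memory access and branch completion
    lmd = None
    if op == OP_LOAD:
        lmd = mem[alu_output]
    elif op == OP_STORE:
        mem[alu_output] = b
    next_pc = alu_output if cond else npc
    # WB: write the result to the destination register
    if op == OP_ALU_RR:
        regs[field(ir, 16, 20)] = alu_output
    elif op == OP_ADDI:
        regs[field(ir, 11, 15)] = alu_output
    elif op == OP_LOAD:
        regs[field(ir, 11, 15)] = lmd
    return next_pc


regs = [0] * 32
mem = {
    0: enc_i(OP_ADDI, 0, 1, 5),       # r1 <- r0 + 5
    4: enc_i(OP_STORE, 0, 1, 100),    # Mem[r0 + 100] <- r1
    8: enc_i(OP_LOAD, 0, 2, 100),     # r2 <- Mem[r0 + 100]
    12: enc_r(OP_ALU_RR, 1, 2, 3),    # r3 <- r1 + r2
    16: enc_i(OP_BEQZ, 0, 0, 8),      # r0 == 0, so branch to 20 + 8
}
pc = 0
for _ in range(5):
    pc = step(pc, regs, mem)
print(regs[1], regs[2], regs[3], mem[100], pc)  # 5 5 10 5 28
```

Each instruction here runs through all five steps before the next one starts, so this is the unpipelined behavior that the pipelined datapath overlaps.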

5 Steps of MIPS Datapath [Figure: unpipelined datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Components shown: Next PC mux, adder (+4), Next SEQ PC, instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory, LMD, WB data mux.]

5 Steps of MIPS Datapath [Figure: the same five-step datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the steps; Next SEQ PC and RD are carried along through the pipeline registers.] Data stationary control: local decode for each instruction phase / pipeline stage.

Steps to Execute Each Instruction Type

Detailed Implementation [Figure: datapath with PC, memory, instruction register (fields [31-26], [25-21], [20-16], [15-11], [15-0], [25-0]), register file, sign extend, shift left 2, ALU, and multiplexers, including one that selects the jump target.]

Control [Figure: control flow by step. Step 1 and Step 2 appear once; Step 3 and Step 4 are repeated for each instruction type (Load, RR ALU, Store, Imm); Step 5 appears once.]

Basic Pipeline

Instr #   Clock number
          1    2    3    4    5    6    7    8    9
i         IF   ID   EX   MEM  WB
i+1            IF   ID   EX   MEM  WB
i+2                 IF   ID   EX   MEM  WB
i+3                      IF   ID   EX   MEM  WB
i+4                           IF   ID   EX   MEM  WB
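The staircase pattern of the basic pipeline follows one rule: instruction i + k occupies stage number s in clock cycle k + s + 1 (counting k and s from 0). The sketch below generates the diagram from that rule; it assumes an ideal pipeline with no stalls, and the function names are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_rows(n_instr, stages=STAGES):
    """One row per instruction; each row lists the stage occupied in every clock cycle."""
    total_cycles = n_instr + len(stages) - 1
    rows = []
    for k in range(n_instr):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[k + s] = name  # one stage later in each successive cycle
        rows.append(row)
    return rows


def format_diagram(rows):
    header = "instr ".ljust(8) + "".join(str(c + 1).ljust(5) for c in range(len(rows[0])))
    lines = [header]
    for k, row in enumerate(rows):
        label = "i" if k == 0 else f"i+{k}"
        lines.append(label.ljust(8) + "".join(cell.ljust(5) for cell in row))
    return "\n".join(lines)


print(format_diagram(pipeline_rows(5)))
```

Five instructions through five stages need 5 + 5 - 1 = 9 clock cycles, which is why the clock numbers run from 1 to 9.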

Pipeline Resources [Figure: overlapped instructions, each using IM, Reg, ALU, DM, and Reg in successive clock cycles.]

Pipelined Datapath [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Components shown: PC, adder (+4), instruction cache, register file, sign extend, Zero? test, ALU, data cache, multiplexers.]

Performance limitations. Imbalance among pipe stages limits the cycle time to that of the slowest stage. Pipelining overhead: pipeline register delay; clock skew. Clock cycle > clock skew + latch overhead.
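The effect of these limits on speedup can be estimated with a simple model: the pipelined clock cycle is taken to be the slowest stage plus latch overhead plus clock skew. All numbers below are hypothetical (the stage delays and the 1 ns overhead values are assumptions for illustration), and the model ignores hazards.

```python
def pipelined_cycle_ns(stage_ns, latch_overhead_ns, clock_skew_ns):
    """Clock cycle = slowest stage + pipeline register delay + clock skew."""
    return max(stage_ns) + latch_overhead_ns + clock_skew_ns


def speedup_with_overhead(stage_ns, latch_overhead_ns, clock_skew_ns):
    """Unpipelined time per instruction divided by the pipelined clock cycle."""
    unpipelined_ns = sum(stage_ns)
    return unpipelined_ns / pipelined_cycle_ns(stage_ns, latch_overhead_ns, clock_skew_ns)


stage_ns = [5, 4, 5, 10, 4]  # assumed delays for IF, ID, EX, MEM, WB

print(len(stage_ns))                                        # 5, the potential speedup
print(speedup_with_overhead(stage_ns, 0.0, 0.0))            # 2.8, imbalance only
print(round(speedup_with_overhead(stage_ns, 1.0, 1.0), 2))  # 2.33, with overhead
```

In this model the overhead terms set a floor on the clock cycle: however finely the stages are divided, the cycle cannot drop below the clock skew plus the latch overhead.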