The Basics: Pipelining
J. Nelson Amaral, University of Alberta


The Pipeline Concept Bauer p. 32

Pipeline Throughput and Latency
Stage delays: IF = 5 ns, ID = 4 ns, EX = 5 ns, MEM = 10 ns, WB = 4 ns.
Consider the pipeline above with the indicated delays. We want to know the pipeline throughput and the pipeline latency. Pipeline throughput: the number of instructions completed per second. Pipeline latency: the time it takes to execute a single instruction in the pipeline.

Pipeline Throughput and Latency
Pipeline throughput: how often is an instruction completed? Pipeline latency: how long does it take to execute an instruction in the pipeline? Is this right?

Pipeline Throughput and Latency
Simply adding the stage delays to compute the pipeline latency only works for an isolated instruction. When instructions follow one another through IF, ID, EX, MEM, and WB:
L(I1) = 28 ns
L(I2) = 33 ns
L(I3) = 38 ns
L(I4) = 43 ns
We are in trouble! The latency is not constant. This happens because the pipeline is unbalanced: a new instruction is fetched every 5 ns, but MEM can accept only one instruction every 10 ns, so each instruction waits 5 ns longer than the one before it. The solution is to make every stage the same length as the longest one.
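The growing latencies above can be reproduced with a small timing model. This is a sketch under an assumption the slide does not state: a finished instruction can wait in a buffer between stages. An instruction therefore starts a stage as soon as it has left the previous stage and the instruction ahead of it has left this stage. The function name `latencies` is hypothetical.

```python
def latencies(delays, n):
    """Latency of each of n back-to-back instructions in an unclocked pipeline.

    delays: per-stage delays in ns, in pipeline order.
    Assumption of this sketch: instructions can wait between stages, so
    stage s of instruction k starts when instruction k has left stage s-1
    and instruction k-1 has left stage s.
    """
    prev_end = [0] * len(delays)   # when the previous instruction left each stage
    result = []
    for _ in range(n):
        t = prev_end[0]            # fetch starts when the previous fetch finishes
        start_time = t
        end = []
        for s, d in enumerate(delays):
            t = max(t, prev_end[s]) + d   # wait for the stage to be free, then use it
            end.append(t)
        result.append(t - start_time)     # time from entering IF to leaving WB
        prev_end = end
    return result

print(latencies([5, 4, 5, 10, 4], 4))   # [28, 33, 38, 43]
```

With `[10] * 5` (every stage stretched to the slowest one), the same model gives 50 ns for every instruction, so the latency becomes constant.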

Pipeline Throughput and Latency
The slowest pipeline stage also limits the latency! When every stage is made as long as the slowest one (10 ns), every instruction has the same latency: L(I1) = L(I2) = L(I3) = L(I4) = 5 × 10 ns = 50 ns.

Pipeline Throughput and Latency
How long does it take to execute a given number of instructions in this pipeline (disregarding bubbles caused by branches, cache misses, and hazards)? How long would it take using the same modules without pipelining? What is the speedup due to pipelining?

Pipeline Throughput and Latency
The speedup that we got from the pipeline is the ratio of the execution time without pipelining to the execution time with pipelining. How can we improve this pipeline design? We need to reduce the imbalance between the stages so that we can increase the clock rate.
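A clocked pipeline with no bubbles can be analysed by counting cycles. The first instruction needs one cycle per stage, and every later instruction completes one cycle after the previous one. This sketch makes two assumptions. First, the unpipelined machine runs each instruction through the same modules back to back, taking 28 ns per instruction with no latch overhead. Second, the count of 1000 instructions is hypothetical. The function names are illustrative.

```python
def pipelined_time(n, delays):
    """Time in ns for n instructions on a clocked pipeline with no stalls.

    Every stage takes one clock of max(delays); the first instruction needs
    len(delays) cycles and each later one completes one cycle after it.
    """
    clock = max(delays)
    return (len(delays) + n - 1) * clock

def unpipelined_time(n, delays):
    """Time in ns if each instruction uses all modules before the next starts."""
    return n * sum(delays)

delays = [5, 4, 5, 10, 4]             # IF, ID, EX, MEM, WB
n = 1000                              # hypothetical instruction count
print(pipelined_time(n, delays))      # 10040
print(unpipelined_time(n, delays))    # 28000
print(unpipelined_time(n, delays) / pipelined_time(n, delays))
```

As the instruction count grows, the speedup approaches 28 ns / 10 ns = 2.8. This is the ratio of the unpipelined instruction time to the clock period set by the slowest stage.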

Pipeline Throughput and Latency
Stage delays: IF = 5 ns, ID = 4 ns, EX = 5 ns, MEM1 = 5 ns, MEM2 = 5 ns, WB = 4 ns.
The 10 ns MEM stage has been split into MEM1 and MEM2, so now we have one more pipeline stage. What is the throughput now? What is the new latency for a single instruction?

Pipeline Throughput and Latency
Pipeline diagram (one clock cycle = 5 ns): instructions I1 through I7 enter the pipeline one per cycle. Each instruction passes through IF, ID, EX, MEM1, MEM2, and WB in consecutive cycles, so I1 occupies cycles 1 to 6 and I7 occupies cycles 7 to 12.

Pipeline Throughput and Latency
How long does it take to execute a given number of instructions in this pipeline (disregarding bubbles caused by branches, cache misses, etc., for now)? What is the speedup that we get from pipelining?

Pipeline Throughput and Latency
What have we learned from this example?
1. It is important to balance the delays in the stages of the pipeline.
2. The throughput of a pipeline is 1/max(delay).
3. The latency is N × max(delay), where N is the number of stages in the pipeline.
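The three rules above can be checked on both designs from these slides. This minimal sketch assumes an ideal clock with no latch overhead, so the clock period equals the slowest stage delay. `throughput_and_latency` is a hypothetical helper name.

```python
def throughput_and_latency(delays):
    """Apply rules 2 and 3 to a list of stage delays given in ns."""
    clock = max(delays)              # the slowest stage sets the clock period (ns)
    throughput = 1e9 / clock         # one instruction completes per clock, per second
    latency = len(delays) * clock    # N stages x max(delay), in ns
    return throughput, latency

five_stage = [5, 4, 5, 10, 4]        # IF, ID, EX, MEM, WB
six_stage = [5, 4, 5, 5, 5, 4]       # IF, ID, EX, MEM1, MEM2, WB
print(throughput_and_latency(five_stage))   # (100000000.0, 50)
print(throughput_and_latency(six_stage))    # (200000000.0, 30)
```

Splitting MEM doubles the throughput and cuts the single-instruction latency from 50 ns to 30 ns, even though the pipeline gains a stage.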

Execution Snapshot Bauer p

Pipeline with Control Unit Bauer p

Data Hazards and Forwarding
Example 1:
i: R7 ← R12 + R15
i+1: R8 ← R7 – R12
i+2: R15 ← R8 + R7
Read-After-Write (RAW) dependencies (true dependencies): R7 from i to i+1 and from i to i+2; R8 from i+1 to i+2.
Write-After-Read (WAR) dependencies (anti-dependencies): R15, read by i and written by i+2.
Bauer p
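These dependences can be found mechanically by comparing the registers each instruction writes and reads. The following is a minimal sketch using a hypothetical tuple encoding (name, destination, sources). It checks every pair of instructions independently, so it ignores an intermediate instruction that redefines the register. It also reports Write-After-Write (output) dependences, which the slide does not list; this example has none.

```python
from itertools import combinations

# Hypothetical encoding of Example 1: (name, destination, sources).
program = [
    ("i",   "R7",  ("R12", "R15")),
    ("i+1", "R8",  ("R7", "R12")),
    ("i+2", "R15", ("R8", "R7")),
]

def dependencies(instrs):
    """Return (kind, earlier, later, register) for every pair of instructions."""
    deps = []
    for (n1, d1, s1), (n2, d2, s2) in combinations(instrs, 2):
        if d1 in s2:
            deps.append(("RAW", n1, n2, d1))   # later reads what earlier writes
        if d2 in s1:
            deps.append(("WAR", n1, n2, d2))   # later writes what earlier reads
        if d1 == d2:
            deps.append(("WAW", n1, n2, d1))   # both write the same register
    return deps

for dep in dependencies(program):
    print(dep)
```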

Data Hazards and Forwarding Bauer p

Forwarding Bauer p

Load-ALU RAW Dependency
Example 2:
i: R6 ← Mem[R2]
i+1: R7 ← R6 + R4
The data from the load is not available until it reaches the Mem/WB pipeline register of instruction i. However, instruction i+1 needs it from the ID/EX register at the start of EX, while instruction i is still in the Mem stage. We cannot forward it back in time!
Bauer p
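The situation above is usually caught by a hazard check while the dependent instruction is still in ID. This is a minimal sketch of a classic 5-stage load-use check. The `Instr` record and `load_use_stall` are hypothetical names, and real control logic compares register-number fields rather than strings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    dest: Optional[str]      # register written (None for stores and branches)
    srcs: tuple              # registers read
    is_load: bool = False

def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """True if the instruction in ID must stall for one cycle.

    The instruction in EX (held in ID/EX) is a load, and the instruction
    in ID reads the register that the load will write.
    """
    return in_ex.is_load and in_ex.dest is not None and in_ex.dest in in_id.srcs

load = Instr(dest="R6", srcs=("R2",), is_load=True)   # i:   R6 <- Mem[R2]
add = Instr(dest="R7", srcs=("R6", "R4"))             # i+1: R7 <- R6 + R4
print(load_use_stall(load, add))   # True: insert one bubble, then forward from Mem/WB
```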

Bubble because of load Bauer p

Priority on Forwarding
Example:
i: R10 ← R4 + R5
i+1: R10 ← R4 – R10
i+2: R8 ← R10 + R7
The RAW from i+1 to i+2 must take priority over the RAW from i to i+2, because i+1 produces the most recent value of R10.
Bauer p
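The priority rule can be expressed as the selection logic in front of an ALU input. In this minimal sketch, the newer result in EX/MEM is checked before the older one in MEM/WB. The name `forward_source` is hypothetical. The "register writes enabled" and "register zero" checks that real designs also need are omitted here.

```python
from typing import Optional

def forward_source(src: str, ex_mem_dest: Optional[str], mem_wb_dest: Optional[str]) -> str:
    """Choose where the EX stage reads operand `src` from.

    When instruction i+2 is in EX, instruction i+1 sits in EX/MEM and
    instruction i sits in MEM/WB, so EX/MEM holds the newer value.
    """
    if ex_mem_dest == src:
        return "EX/MEM"          # most recent producer (i+1) wins
    if mem_wb_dest == src:
        return "MEM/WB"          # older producer (i)
    return "register file"       # no in-flight producer

# i: R10 <- R4 + R5 (in MEM/WB); i+1: R10 <- R4 - R10 (in EX/MEM); i+2 reads R10 and R7
print(forward_source("R10", ex_mem_dest="R10", mem_wb_dest="R10"))   # EX/MEM
print(forward_source("R7", ex_mem_dest="R10", mem_wb_dest="R10"))    # register file
```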

Forwarding from Mem/WB to Mem
Example:
i: R5 ← Mem[R6]
i+1: Mem[R8] ← R5
After the load, the contents of the Mem/WB register must be forwarded to the Mem stage so that instruction i+1 writes them to memory (they are not only written to R5).
Bauer p
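The condition for this forwarding path can be written as a one-line check. This is a minimal sketch with hypothetical parameter names. It assumes the store's data register is known while the store is in Mem and the load has just moved into Mem/WB.

```python
from typing import Optional

def forward_store_data(mem_wb_is_load: bool, mem_wb_dest: Optional[str],
                       store_data_reg: str) -> bool:
    """True if the store now in Mem must take its data from the Mem/WB register.

    The load (i) has just left Mem and its value sits in Mem/WB; the store
    (i+1) is in Mem and is about to write that same register to memory.
    """
    return mem_wb_is_load and mem_wb_dest == store_data_reg

# i: R5 <- Mem[R6] (now in Mem/WB);  i+1: Mem[R8] <- R5 (now in Mem)
print(forward_store_data(True, "R5", "R5"))   # True
```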

Pipelining with Forwarding and Stall Bauer p

Control Hazards (branches) Bauer p

Control Hazards: Exceptions and Interruptions
Exceptions can occur in any stage (except WB):
– IF: page faults
– ID: illegal opcodes
– EX: arithmetic exceptions
– Mem: illegal addresses, page faults
Interruptions:
– I/O termination, time-outs
– power failures
Bauer p

Handling Exceptions/Interruptions
[Flowchart: Save the Process State; Schedule Process; Restart; Clear Exception Condition; Abort Program; “Correct” Exception; Perform Unrelated Task]
Bauer p

Precise Exceptions in a Pipeline
If an exception occurs in instruction i:
– Instructions i-1, i-2, … complete normally and contribute to the saved state of the process.
– Instructions i, i+1, i+2, … become no-ops.
– After the exception is handled, execution restarts at instruction i; the PC saved is the PC of instruction i.
Bauer p

Implementing Precise Exceptions in the Pipeline
1. Flag the pipeline register to the right of the stage where the exception was detected; this flag moves along the pipeline.
2. Set all control lines in a stage holding a flagged instruction so that the instruction becomes a no-op.
3. Stop instruction fetching.
4. When the flag reaches the Mem/WB register, save the PC of that instruction as the exception PC.
Bauer p

Program Order vs. Temporal Order
[Figure: a divide-by-zero exception and a page-fault exception in the pipeline]
Which exception occurs first in time? Which exception should be handled first?
Bauer p

Design Issues:
– The Load/ALU instruction bubble cannot be avoided.
– Branch resolution in the EX stage → two-cycle branch penalty.
– The Mem stage is unused by ALU instructions.
Bauer p

Alternative Pipelining Design: Avoiding the load latency penalty
Example:
i: R4 ← Mem[R8]
i+1: R7 ← R4 + R5
Bauer p

Avoiding the load latency penalty
Example:
i: R4 ← Mem[R8]
i+1: R7 ← R4 + R5
Bauer p

Address Generation Latency Penalty
Example:
i: R5 ← R6 + R7
i+1: R9 ← Mem[R5]
The address of i+1 depends on R5, which i has not computed yet when i+1 needs it. We cannot forward from the future, so i+1 has to stall.
Bauer p

Other changes
– AG is used for branch resolution.
– AG is unused for ALU operations.
Bauer p

Tradeoffs:
– Avoids the load/ALU bubble, at the cost of an additional ALU unit.
– Moving branch resolution to AG → same branch penalty.
– The AG stage is unused for ALU operations.
– Stalls for ALU/Store instruction dependencies.
Bauer p

Which one is better? MIPS vs. Intel 486 Bauer p

Pipelining Functional Units: the EX stage
Parameters of interest:
– number of stages
– minimum number of cycles before two independent (no RAW) instructions of the same type can enter the functional unit
Bauer p
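The two parameters above combine into a simple cycle count for a run of independent operations. The second parameter is often called the initiation interval. This minimal sketch assumes each operation enters the unit as early as the interval allows and that nothing else stalls. The 4-stage unit and the count of 10 operations are hypothetical.

```python
def cycles_for_independent_ops(n: int, stages: int, interval: int) -> int:
    """Cycles until n independent operations of one type all leave the unit.

    stages:   number of stages of the functional unit (its latency in cycles)
    interval: minimum cycles between two such operations entering the unit
    The last operation enters at cycle (n - 1) * interval and then needs
    `stages` more cycles to leave the unit.
    """
    return (n - 1) * interval + stages

# Hypothetical 4-stage unit: fully pipelined vs. accepting one op every 2 cycles
print(cycles_for_independent_ops(10, stages=4, interval=1))   # 13
print(cycles_for_independent_ops(10, stages=4, interval=2))   # 22
```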

Single-Precision Floating Point Representation
Most standard floating point representations use:
– 1 bit for the sign (positive or negative)
– 8 bits for the range (exponent field)
– 23 bits for the precision (fraction field)
Layout: sign S (1 bit) | exponent E (8 bits) | fraction F (23 bits)
From: Patt and Patel, p. 33; P-H p. 245; Bauer p. 45
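The three fields can be pulled out of a real single-precision value with Python's standard `struct` module. The value of a normalized number can then be rebuilt from them. This sketch relies on two IEEE 754 facts not stated on the slide: the exponent bias is 127, and normalized numbers have a hidden leading 1. The helper names are hypothetical, and `normalized_value` covers only exponents 1 to 254.

```python
import struct

def fields(x: float):
    """Split x, rounded to IEEE 754 single precision, into (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

def normalized_value(sign, exponent, fraction):
    """(-1)^S x 1.F x 2^(E - 127), valid only for 1 <= E <= 254."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

s, e, f = fields(-6.25)
print(s, e, f)                    # 1 129 4718592
print(normalized_value(s, e, f))  # -6.25
```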

Special Floating Point Representations
In the 8-bit exponent field we can represent numbers from 0 to 255. We studied how to read numbers with exponents from 0 to 254. What value is represented when the exponent is 255 (i.e., 11111111 in binary)?
An exponent equal to 255 = 11111111 (binary) in a floating point representation indicates a special value:
– When the exponent is 255 and the fraction is 0, the value represented is ±infinity (the sign bit gives the sign).
– When the exponent is 255 and the fraction is non-zero, the value represented is Not a Number (NaN).
Hen/Patt, p. 301; P-H p. 246; Bauer p
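The rules above can be checked against the bit patterns Python produces when packing a value as single precision with the standard `struct` module. This is a minimal sketch; `classify` is a hypothetical name. Every value with an exponent field below 255 is lumped together as "finite", which includes zero.

```python
import struct

def classify(x: float) -> str:
    """Classify x, packed as IEEE 754 single precision, by its exponent field."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, exponent, fraction = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exponent == 255:                       # all ones: a special value
        if fraction == 0:
            return "-infinity" if sign else "+infinity"
        return "NaN"                          # non-zero fraction
    return "finite"                           # exponent 0 to 254

for x in (1.0, float("inf"), float("-inf"), float("nan")):
    print(x, classify(x))
```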

Floating Point Addition (flowchart divided into Stage 1, Stages 2-3, and Stage 4)
Inputs: (S1, E1, F1) and (S2, E2, F2).
1. If E1 < E2, swap the operands.
2. Insert a 1 to the left of F1 and to the left of F2.
3. If S1 ≠ S2, replace F2 by its 2's complement.
4. Compute D = E1 – E2 and align the smaller operand: F2 ← F2 >> D (shift right by D).
5. Add the mantissas.
6. Normalize and round off.
Bauer p
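The flowchart steps can be traced on real single-precision bit patterns. This is a sketch under stated simplifications. Inputs must be normalized (no zeros, subnormals, infinities, or NaNs), and there are no overflow or underflow checks. "Round off" is done by truncation instead of IEEE round-to-nearest-even with guard, round, and sticky bits. Subtracting the aligned mantissa stands in for adding its 2's complement. All function names are hypothetical.

```python
import struct

HIDDEN = 1 << 23     # the 1 inserted to the left of the 23-bit fraction

def to_fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(s, e, f):
    return struct.unpack(">f", struct.pack(">I", (s << 31) | (e << 23) | f))[0]

def fp_add(a, b):
    """Add two normalized (S, E, F) triples following the flowchart steps."""
    (s1, e1, f1), (s2, e2, f2) = a, b
    if e1 < e2:                          # step 1: operand 1 gets the larger exponent
        (s1, e1, f1), (s2, e2, f2) = (s2, e2, f2), (s1, e1, f1)
    m1, m2 = HIDDEN | f1, HIDDEN | f2    # step 2: insert the leading 1
    m2 >>= e1 - e2                       # step 4: align, F2 >> D with D = E1 - E2
    if s1 == s2:
        m, s = m1 + m2, s1               # step 5: same signs, add magnitudes
    else:
        m, s = m1 - m2, s1               # steps 3 and 5: add the 2's complement of F2
        if m < 0:                        # only possible when D == 0 and F2 > F1
            m, s = -m, s2
    if m == 0:
        return (0, 0, 0)                 # exact cancellation gives +0
    e = e1
    while m >= 2 * HIDDEN:               # step 6: normalize a carry out (truncating)
        m >>= 1
        e += 1
    while m < HIDDEN:                    # step 6: normalize leading zeros
        m <<= 1
        e -= 1
    return (s, e, m & (HIDDEN - 1))      # remove the leading 1 again

print(from_fields(*fp_add(to_fields(1.5), to_fields(2.25))))   # 3.75
print(from_fields(*fp_add(to_fields(3.0), to_fields(-1.0))))   # 2.0
```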