1 Chapter Six: Enhancing Performance with Pipelining
© 1998 Morgan Kaufmann Publishers

2 Definition
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. We'll use a laundry analogy to explain the main concepts. There are four stages in doing the laundry:
–put the dirty clothes into the washer (wash)
–place the washed clothes in the dryer (dry)
–place the dry load on the table and fold it (fold)
–put the clothes away (store)
What are the corresponding stages for a MIPS instruction?

3 Single-Cycle vs. Pipelined Performance
Consider lw, sw, add, sub, and, or, slt, and beq. Operation times for the major functional components:
–2 ns for a memory access
–2 ns for an ALU operation
–1 ns for a register file read or write
Total execution time for 3 instructions:
–3 x 8 ns = 24 ns for a single-cycle, non-pipelined processor
–14 ns (see the figure on the next slide) for a pipelined processor
Total execution time for 1003 instructions:
–1000 x 8 ns + 24 ns = 8024 ns for a single-cycle, non-pipelined processor
–1000 x 2 ns + 14 ns = 2014 ns for a pipelined processor
The speedup is less than the number of stages because:
–stages may be imperfectly balanced
–pipelining involves some overhead
(The totals above are checked in the short sketch below.)
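As a sanity check on the arithmetic above, here is a minimal Python sketch (not from the original slides) that reproduces the 24 ns / 14 ns and 8024 ns / 2014 ns figures, assuming a 5-stage pipeline with a 2 ns clock and an 8 ns single-cycle clock.

    # Minimal sketch, assuming an 8 ns single-cycle clock and a 5-stage
    # pipeline with a 2 ns clock (the figures used on this slide).
    def single_cycle_time(n, cycle_ns=8):
        return n * cycle_ns

    def pipelined_time(n, stages=5, stage_ns=2):
        # The first instruction needs stages * stage_ns; each later one adds stage_ns.
        return stages * stage_ns + (n - 1) * stage_ns

    for n in (3, 1003):
        t_single = single_cycle_time(n)
        t_pipe = pipelined_time(n)
        print(n, t_single, t_pipe, round(t_single / t_pipe, 2))
    # 3 instructions: 24 ns vs. 14 ns; 1003 instructions: 8024 ns vs. 2014 ns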

4 Pipelining
Improve performance by increasing instruction throughput.
Each instruction still takes the same time to execute.
The ideal speedup equals the number of stages in the pipeline. Do we achieve this?
[Figure: pipelined execution of three instructions; the stages (instruction fetch, register read, ALU, data access, register write) of successive instructions overlap, with a new instruction starting every 2 ns, program execution order shown top to bottom]

5 Pipelining in MIPS: What Makes It Easy
All instructions are the same length: instruction fetch (1st pipeline stage) and decoding (2nd stage) are much easier.
MIPS has just a few instruction formats, with the source register fields in the same location ==> register file read and instruction decoding can be done at the same time.
Memory operands appear only in loads and stores (unlike the 80x86, which can operate on operands in memory).
Operands must be aligned in memory: we need not worry about a single data transfer instruction requiring two data memory accesses.

6 Pipelining in MIPS: What Makes It Hard?
Structural hazards: suppose we had only one memory.
Control hazards: we need to worry about branch instructions.
Data hazards: an instruction depends on the result of a previous instruction.

7 Structural Hazards
What if we add a fourth instruction to the following figure? What happens between 6 and 8 ns? (A small sketch of the timing clash follows below.)
[Figure: pipeline diagram for three instructions with 2 ns stages (instruction fetch, register read, ALU, data access, register write), program execution order shown top to bottom]
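To make the question concrete, here is a small Python sketch (not from the slides) that tabulates which instructions need the memory in each 2 ns slot, assuming a single memory shared by instruction fetch and data access; the clash shows up in the 6-8 ns slot.

    # Sketch: with one shared memory, instruction 4's fetch (IF) and
    # instruction 1's data access (MEM) both want the memory in the
    # fourth 2 ns slot (6-8 ns) -- a structural hazard.
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]
    USES_MEMORY = {"IF", "MEM"}

    def memory_users(cycle, num_instructions):
        users = []
        for i in range(num_instructions):
            stage_index = cycle - i          # instruction i+1 starts in cycle i
            if 0 <= stage_index < len(STAGES) and STAGES[stage_index] in USES_MEMORY:
                users.append((i + 1, STAGES[stage_index]))
        return users

    print(memory_users(cycle=3, num_instructions=4))   # [(1, 'MEM'), (4, 'IF')]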

8 Control Hazards
One possible solution is to stall:
–stall: pause before continuing the pipeline; not efficient if we have a long pipeline
–a pipeline stall is also known as a bubble
[Figure: pipeline diagram with a bubble inserted after the branch, program execution order shown top to bottom]
The figure assumes we have extra hardware in place to resolve the branch in the second stage; otherwise the pause would be longer than 4 ns.

9 Control Hazards
Another solution: predict. Assume the branch is not taken and keep fetching.
[Figure: two pipeline diagrams; when the prediction is correct, instructions proceed 2 ns apart, and when it is wrong, the wrongly fetched instruction becomes a bubble and the next fetch is delayed to 4 ns]

10 Control Hazards
A third solution: the delayed branch. The instruction in the delayed branch slot, immediately after the branch, is always executed; the branch takes effect afterward.
[Figure: pipeline diagram showing the instruction in the delayed branch slot executing while the branch is resolved]

11 Data Hazards
Look at the following example:
add $s0, $t0, $t1
sub $t2, $s0, $t3
We need the result $s0 from the add instruction to do the subtraction. Is the data ready in time? The compiler cannot remove this dependence.
Solution: forwarding, or bypassing, i.e., getting the missing operand early from internal resources rather than waiting for it to be written to the register file.

12 Graphical Representation of the Instruction Pipeline
IF: instruction fetch
ID: instruction decode
EX: execution
MEM: memory access
WB: write back
Shading means the element is used by the instruction; white means it is not used. Shading on the right half means the element is read; on the left half, written.
[Figure: add $s0, $t0, $t1 drawn as its five stages IF, ID, EX, MEM, WB along the time axis]

13 Forwarding
As soon as the ALU finishes the add, forward the result:
add $s0, $t0, $t1
sub $t2, $s0, $t3
[Figure: the two instructions drawn as overlapping IF, ID, EX, MEM, WB stages, with the add's EX result forwarded directly to the sub's EX stage]
(A sketch of the forwarding conditions follows below.)
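The slides describe forwarding in hardware rather than code, so here is a minimal Python sketch of the usual EX-stage forwarding checks for the ALU's first operand; the pipeline registers are modeled as dicts with illustrative field names (reg_write, rd, rs), not the book's exact signal names.

    # Sketch: pick the source for the ALU's first input by comparing the
    # destination registers in EX/MEM and MEM/WB against ID/EX's rs field.
    def forward_a(id_ex, ex_mem, mem_wb):
        rs = id_ex["rs"]
        if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == rs:
            return "EX/MEM.ALUResult"   # result of the immediately preceding ALU op
        if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == rs:
            return "MEM/WB.WriteData"   # result being written back this cycle
        return "ID/EX.ReadData1"        # no hazard: use the register file value

    # add $s0,$t0,$t1 followed by sub $t2,$s0,$t3 ($s0 is register 16):
    print(forward_a({"rs": 16}, {"reg_write": True, "rd": 16},
                    {"reg_write": False, "rd": 0}))   # EX/MEM.ALUResult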

14 Forwarding with a Stall
When an R-format instruction following a load tries to use the loaded data, a load-use data hazard occurs. Forwarding alone cannot help; we need to stall in this case.
[Figure: pipeline diagram with a bubble inserted between the load and the dependent instruction]

15 Reordering Code to Avoid Pipeline Stalls
Original code (it swaps v[k] and v[k+1]):
# register $t1 has the address of v[k]
lw $t0, 0($t1)   # reg $t0 = v[k]
lw $t2, 4($t1)   # reg $t2 = v[k+1]
sw $t2, 0($t1)   # v[k] = reg $t2
sw $t0, 4($t1)   # v[k+1] = reg $t0
A data hazard occurs on register $t2 between the second lw and the first sw.
Modified code removes the hazard:
# register $t1 has the address of v[k]
lw $t0, 0($t1)   # reg $t0 = v[k]
lw $t2, 4($t1)   # reg $t2 = v[k+1]
sw $t0, 4($t1)   # v[k+1] = reg $t0
sw $t2, 0($t1)   # v[k] = reg $t2

16 A Pipelined Datapath
What do we need to add to actually split the datapath into stages?
[Figure: single-cycle datapath divided into five sections labeled IF: instruction fetch, ID: instruction decode/register file read, EX: execute/address calculation, MEM: memory access, WB: write back]

17 Pipelined Datapath
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?
[Figure: pipelined datapath with pipeline registers (e.g., ID/EX), the register file, sign extend, ALU, and data memory]

18 Corrected Datapath

19 Graphically Representing Pipelines
These diagrams can help answer questions like:
–how many cycles does it take to execute this code?
–what is the ALU doing during cycle 4?
They also help us understand datapaths.
[Figure: multiple-clock-cycle pipeline diagram for a short instruction sequence]

20 Pipeline Control

21 Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
–Instruction Fetch and PC Increment
–Instruction Decode / Register Fetch
–Execution
–Memory Stage
–Write Back
How would control be handled in an automobile plant?
–a fancy control center telling everyone what to do?
–should we use a finite state machine?

22 Pipeline Control
Pass the control signals along the pipeline registers just like the data; each stage uses its own group of signals and hands the rest forward. (A sketch of this idea follows below.)
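To make the idea concrete, here is a small Python sketch, with illustrative field names rather than the book's exact signal names, of control bits generated in ID riding along in the pipeline registers while each stage consumes its own group.

    # Sketch: control is generated once in ID, then carried forward.
    # EX uses the "ex" group, MEM the "mem" group, WB the "wb" group.
    id_ex_control = {
        "ex":  {"reg_dst": 1, "alu_op": 0b10, "alu_src": 0},
        "mem": {"branch": 0, "mem_read": 0, "mem_write": 0},
        "wb":  {"reg_write": 1, "mem_to_reg": 0},
    }

    # After EX consumes its group, only the MEM and WB control moves on.
    ex_mem_control = {"mem": id_ex_control["mem"], "wb": id_ex_control["wb"]}

    # After MEM, only the WB group is left for the write-back stage.
    mem_wb_control = {"wb": ex_mem_control["wb"]}

    print(mem_wb_control)   # {'wb': {'reg_write': 1, 'mem_to_reg': 0}}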

23 Datapath with Control

24 Dependencies
The problem with starting the next instruction before the first has finished:
–dependencies that go backward in time are data hazards

25 Software Solution
Have the compiler guarantee no hazards. Where do we insert the nops?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem: this really slows us down!

26 Forwarding
Use temporary results; don't wait for them to be written back:
–register file forwarding to handle a read and a write of the same register in the same cycle
–ALU forwarding
(Annotation on the figure: what if this $2 were $13?)

27 Forwarding

28 Can't Always Forward
A load word can still cause a hazard:
–an instruction tries to read a register right after a load instruction that writes the same register.
Thus, we need a hazard detection unit to stall the pipeline in this case. (A sketch of the detection condition follows below.)
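The slides do not spell the condition out, so here is a minimal Python sketch of the usual load-use check performed in the ID stage, with illustrative field names for the IF/ID and ID/EX pipeline registers.

    # Sketch: stall when the instruction in ID reads a register that the
    # load currently in EX is about to write (the load-use hazard).
    def must_stall(if_id, id_ex):
        return (id_ex["mem_read"] and
                id_ex["rt"] in (if_id["rs"], if_id["rt"]))

    # lw $t0, 20($s2) followed by addu $t1, $t0, $t2 ($t0 is register 8):
    print(must_stall({"rs": 8, "rt": 10}, {"mem_read": True, "rt": 8}))   # True -> insert a bubble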

29 Stalling
We can stall the pipeline by keeping an instruction in the same stage for an extra cycle.

30 Hazard Detection Unit
Stall by letting an instruction that won't write anything (a bubble) go forward while the instructions behind it are held back.

31 Branch Hazards
When we decide to branch, other instructions are already in the pipeline!
We are predicting the branch as not taken:
–we need to add hardware for flushing the fetched instructions if the prediction is wrong (a small sketch follows below)
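As an illustration only (the slides describe this as hardware, not code), here is a tiny Python sketch of predict-not-taken with flushing, using a hypothetical pipeline-register dict; a real design would also redirect the PC to the branch target.

    # Sketch: if the branch turns out to be taken, the instruction fetched
    # behind it was fetched in error and is turned into a bubble (nop).
    NOP = {"op": "nop"}

    def resolve_branch(taken, if_id):
        if taken:
            if_id["instr"] = NOP      # flush the wrongly fetched instruction
            if_id["flushed"] = True   # (PC redirect to the branch target not shown)
        return if_id

    print(resolve_branch(True, {"instr": {"op": "and"}, "flushed": False}))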

32 Flushing Instructions

33 Improving Performance
Try to avoid stalls! E.g., reorder these instructions (as on slide 15):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
Add a branch delay slot:
–the next instruction after a branch is always executed
–rely on the compiler to fill the slot with something useful

34 More on Improving Performance
Superpipelining: decompose the pipeline stages further (not always practical).
Superscalar: start more than one instruction in the same cycle (extra coordination required)
–CPI can be less than 1
–IPC: instructions per clock cycle
Dynamic pipelining: consider
lw $t0, 20($s2)
addu $t1, $t0, $t2
sub $s4, $s4, $t3
slti $t5, $s4, 20
–use extra hardware resources so later, independent instructions (here sub and slti) can proceed while addu waits for the load
–more complicated pipeline control
–more complicated instruction execution model

35 Superscalar MIPS
Assume two instructions are issued per clock cycle: one integer ALU operation or branch, and one load or store.
We need to fetch and decode 64 bits of instructions per cycle, and extra datapath resources are required. (A sketch of the pairing rule follows below.)
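Here is a small, purely illustrative Python sketch of the static issue restriction described on this slide: per cycle, at most one ALU/branch instruction and one load/store may issue together. The opcode sets are limited to the instructions used in this chapter.

    # Sketch of the dual-issue pairing rule (opcode sets are illustrative).
    ALU_OR_BRANCH = {"add", "addu", "sub", "and", "or", "slt", "slti", "beq"}
    LOAD_STORE = {"lw", "sw"}

    def can_issue_together(op1, op2):
        return op1 in ALU_OR_BRANCH and op2 in LOAD_STORE

    print(can_issue_together("addu", "lw"))   # True: an ALU op paired with a load
    print(can_issue_together("addu", "sub"))  # False: two ALU ops cannot issue together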

36 Dynamic Scheduling
The hardware performs the scheduling:
–hardware tries to find instructions to execute
–out-of-order execution is possible
–speculative execution and dynamic branch prediction
All modern processors are very complicated:
–DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
–PowerPC and Pentium: branch history table
–compiler technology remains important