CSCE 212 Chapter 6 Enhancing Performance with Pipelining Instructor: Jason D. Bakos.

Presentation transcript:

CSCE 212 Chapter 6 Enhancing Performance with Pipelining Instructor: Jason D. Bakos

CSCE Pipelining

CSCE MIPS Pipeline
Basic idea:
– Execute multiple instructions in parallel
– Split instruction execution into 5 stages
– Instructions execute in “assembly-line” fashion
[Figure: pipelined datapath with stages fetch, decode, execute, memory, write back. Labeled components: PC, RegFile, control, ALU, instruction register, A and B registers, R register, MDR register, memory (Address, MemoryIn, MemoryOut, MemRead, MemWrite). The pipeline registers carry control for execute/memory/wb, then memory/wb, then wb, together with rs/rt/rd.]

CSCE Pipelined MIPS

CSCE Pipelined MIPS

CSCE Pipelined Control

CSCE Pipelined Control

CSCE Pipelined Control

CSCE MIPS ISA
MIPS pipeline stages
– Fetch (F)
    read next instruction from memory, increment address counter
    assume 1 cycle to access memory
– Decode (D)
    read register operands, resolve instruction into control signals, compute branch target
– Execute (E)
    execute arithmetic/resolve branches
– Memory (M)
    perform load/store accesses to memory, take branches
    assume 1 cycle to access memory
– Write back (W)
    write arithmetic results to register file
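
As a rough illustration of why splitting execution into five stages helps, the sketch below counts cycles for an ideal pipeline with no hazards, assuming one cycle per stage as the slide does. The function names are made up for this example, and the unpipelined count simply assumes each instruction runs all five stages before the next one starts.

```python
from typing import List


def unpipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles if each instruction finishes every stage before the next begins."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles for an ideal pipeline: fill it once, then finish one per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def stage_chart(n_instructions: int, stages: str = "FDEMW") -> List[str]:
    """One row per instruction; each row starts one cycle after the previous."""
    return [" " * i + stages for i in range(n_instructions)]
```

For three instructions, `pipelined_cycles(3)` gives 7 against 15 for `unpipelined_cycles(3)`, and `stage_chart(3)` produces the familiar staircase of F D E M W rows.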

CSCE Hazards
Hazards are data flow problems that arise as a result of pipelining
– They limit the amount of parallelism and sometimes induce “penalties” that prevent completing one instruction per clock cycle
– Structural hazards
    Two operations require a single piece of hardware
    Structural hazards can be overcome by adding additional hardware
– Control hazards
    Conditional control instructions are not resolved until late in the pipeline, requiring subsequent instruction fetches to be predicted
    – Flushed if prediction does not hold (make sure no state change)
    Branch hazards can use dynamic prediction/speculation, branch delay slot
– Data hazards
    An instruction in one pipeline stage is “dependent” on data computed in another pipeline stage

CSCE Hazards
Data hazards
– Register values are “read” in decode and written during write-back
    A RAW hazard occurs when dependent instructions are separated by fewer than 2 slots
    Examples (three cases: the consumer is 1, 2, or 3 instructions behind the producer):
        Producer:  ADD $2,$X,$X (E)    ADD $2,$X,$X (M)    ADD $2,$3,$4 (W)
        Consumer:  ADD $X,$2,$X (D)    ADD $X,$2,$X (D)    ADD $X,$2,$3 (D)
– In most cases, data is generated in the same stage as it is required (EX)
    Data forwarding (the same three cases one cycle later):
        Producer:  ADD $2,$X,$X (M)    ADD $2,$X,$X (W)    ADD $2,$3,$4 (out-of-pipe)
        Consumer:  ADD $X,$2,$X (E)    ADD $X,$2,$X (E)    ADD $X,$2,$3 (E)
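
The forwarding cases above can be sketched as a small selection function: for a source register needed in E, take the value from the instruction currently in M if it writes that register, otherwise from the one in W, otherwise from the register file. This is a simplified model rather than the course's actual control logic; `forward_source` and its parameters are hypothetical names, and it assumes the producer in M is an arithmetic instruction whose result is already available.

```python
from typing import Optional

ZERO_REG = 0  # MIPS register $0 always reads as zero, so it is never forwarded


def forward_source(src_reg: int,
                   dest_in_mem: Optional[int],
                   dest_in_wb: Optional[int]) -> str:
    """Pick where an instruction in E should read src_reg from.

    dest_in_mem / dest_in_wb are the destination registers of the
    instructions currently in M and W (None if they write no register).
    """
    if src_reg != ZERO_REG:
        # The instruction in M is the most recent writer, so it has priority.
        if dest_in_mem == src_reg:
            return "M"
        if dest_in_wb == src_reg:
            return "W"
    return "regfile"
```

Checking M before W matters when both older instructions write the same register: the consumer must see the newer value.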

CSCE “Load” Hazards
Stalls are required when data is not produced in the same stage as it is needed for a subsequent instruction
– Example:
    LW  $2, 0($X)  (M)
    ADD $X, $2     (E)
When this occurs, insert a “bubble” into the EX stage, stall F and D
    LW  $2, 0($X)  (W)
    NOOP           (M)
    ADD $X, $2     (E)
– Forward from W to E
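
A minimal sketch of the load-use check follows, modeled at the level of the instruction stream rather than individual pipeline registers. The `Instr` tuple, `needs_load_stall`, and `insert_bubbles` are illustrative names, not part of the slides, and the model assumes that a single bubble is enough because the loaded value is then forwarded from W to E.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "lw" or "add"
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


NOOP = Instr("noop", None, ())


def needs_load_stall(prev: Instr, curr: Instr) -> bool:
    """True if curr reads the register that the immediately preceding load writes."""
    return (prev.op == "lw"
            and prev.dest is not None
            and prev.dest != 0          # $0 is hardwired to zero
            and prev.dest in curr.srcs)


def insert_bubbles(program: List[Instr]) -> List[Instr]:
    """Return the stream seen by E, with a NOOP after each load-use pair."""
    out: List[Instr] = []
    for instr in program:
        if out and needs_load_stall(out[-1], instr):
            out.append(NOOP)
        out.append(instr)
    return out
```

Applied to a load of `$2` followed directly by an instruction that reads `$2`, `insert_bubbles` returns three entries with `NOOP` in the middle, matching the LW / NOOP / ADD picture above.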

CSCE Data Hazards: Forwarding

CSCE Data Hazards: Stalling for Load Hazard

CSCE Control Hazards
Need to make a branch decision based on data that has yet to be produced:
– add $2,$3,$4
– beqz $2,loop
In which stage is the branch resolved?
Approaches:
– stall
    insert bubbles after all branches
– always predict untaken
    if taken, instructions entering DEC and EX (and MEM?) transfer as NOOPs
– branch delay slot
    instruction following branch is always executed
– dynamic branch predictors

CSCE Control Hazards
Instructions are fetched every clock cycle
Branch decisions happen in the EX stage
Solutions:
– Assume branch not taken (performs a flush of IF, ID, EX by inserting a nop into the pipeline registers on the clock edge)
– Reduce the delay by moving the branch decision up
    Requires additional hardware (comparators, etc.)
    – Might increase cycle time, since register read and resolution are now in series and must be performed in half a cycle to allow for parallel register writes!
    Requires forwarding and stall hardware for new data hazards
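
To see how much the branch penalty matters, a steady-state CPI estimate can be sketched as below. It assumes an ideal CPI of 1, a predict-not-taken scheme that pays a fixed penalty only on taken branches, and one lost cycle per load stall; it ignores pipeline fill time. The function name and all numbers in the usage note are hypothetical.

```python
from fractions import Fraction


def effective_cpi(n_instructions: int,
                  load_stalls: int,
                  taken_branches: int,
                  branch_penalty: int) -> Fraction:
    """Cycles per instruction = (instructions + stall cycles) / instructions."""
    if n_instructions <= 0:
        raise ValueError("n_instructions must be positive")
    stall_cycles = load_stalls + taken_branches * branch_penalty
    return Fraction(n_instructions + stall_cycles, n_instructions)
```

With made-up counts of 100 instructions, 10 load stalls, and 5 taken branches, a 3-cycle penalty gives `effective_cpi(100, 10, 5, 3)` = 5/4, while moving the decision earlier so the penalty drops to 1 cycle gives 23/20.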

CSCE Example
    add  $6,$5,$2      FDEMW
    lw   $7,0($6)      FDEMW
    addi $7,$7,10      FD EMW
    add  $6,$4,$2      FDEMW
    sw   $7,0($6)      FDEMW
    addi $2,$2,4       FDEMW
    blt  $2,$3,loop    FDEMW
    add  $6,$5,$2      13 FDEMW
instructions, cycles, CPI = 11/8

CSCE Moving up Branch Resolution

CSCE Moving up Branch Resolution

CSCE Scheduling the Branch Delay Slot

CSCE Dynamic Branch Prediction
Assume taken/not-taken (static)
– Loops have branches that are usually taken
When wrong, we flush pipeline stages
Deeper pipelines have higher branch penalties (misprediction penalty)
Solution:
– Look up address of branch to check if branch was previously taken
– One-bit schemes
– Two-bit schemes (must be wrong twice to change prediction)
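
The two-bit scheme can be sketched as a table of saturating counters indexed by branch address. This is an illustrative model under stated assumptions, not a description of any particular processor: the table size, the indexing by word address, and the initial "strongly not taken" state are all choices made for the example.

```python
from typing import Iterable, List


class TwoBitPredictor:
    """Counters 0-1 predict not taken, 2-3 predict taken; they saturate at 0 and 3."""

    def __init__(self, table_size: int = 16) -> None:
        self.table_size = table_size
        self.counters: List[int] = [0] * table_size  # assumed initial state

    def _index(self, branch_addr: int) -> int:
        # MIPS instructions are word aligned, so drop the low two address bits.
        return (branch_addr >> 2) % self.table_size

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(pred: TwoBitPredictor, addr: int,
                         outcomes: Iterable[bool]) -> int:
    """Run a sequence of actual outcomes for one branch and count wrong guesses."""
    wrong = 0
    for taken in outcomes:
        if pred.predict(addr) != taken:
            wrong += 1
        pred.update(addr, taken)
    return wrong
```

For a loop branch that is taken nine times and then falls through, run twice in a row, this model mispredicts twice while warming up and then only once per loop exit. A one-bit scheme would also mispredict on re-entering the loop, which is the case the second bit is meant to absorb.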