Instruction-Level Parallelism (ILP)

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

ILP: IntroductionCSCE430/830 Instruction-level parallelism: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
1 IF IDEX MEM L.D F4,0(R2) MUL.D F0, F4, F6 ADD.D F2, F0, F8 L.D F2, 0(R2) WB IF IDM1 MEM WBM2M3M4M5M6M7 stall.
Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
Chapter 12 Pipelining Strategies Performance Hazards.
EECC551 - Shaaban #1 Spring 2006 lec# Pipelining and Instruction-Level Parallelism. Definition of basic instruction block Increasing Instruction-Level.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Chapter 2 Instruction-Level Parallelism and Its Exploitation
DLX Instruction Format
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
EECC551 - Shaaban #1 Spring 2004 lec# Definition of basic instruction blocks Increasing Instruction-Level Parallelism & Size of Basic Blocks.
1 Lecture 4: Advanced Pipelines Control hazards, multi-cycle in-order pipelines, static ILP (Appendix A.4-A.10, Sections )
Appendix A Pipelining: Basic and Intermediate Concepts
Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
CSE378 Pipelining1 Pipelining Basic concept of assembly line –Split a job A into n sequential subjobs (A 1,A 2,…,A n ) with each A i taking approximately.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
Pipelined Datapath and Control
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
CMPE 421 Parallel Computer Architecture
EECE 476: Computer Architecture Slide Set #5: Implementing Pipelining Tor Aamodt Slide background: Die photo of the MIPS R2000 (first commercial MIPS microprocessor)
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
Branch Hazards and Static Branch Prediction Techniques
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
Pipelining Example Laundry Example: Three Stages
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng
Out-of-order execution Lihu Rappoport 11/ MAMAS – Computer Architecture Out-Of-Order Execution Dr. Lihu Rappoport.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
Single Clock Datapath With Control
Pipeline Implementation (4.6)
Appendix C Pipeline implementation
CDA 3101 Spring 2016 Introduction to Computer Organization
\course\cpeg323-08F\Topic6b-323
ECE232: Hardware Organization and Design
Pipelining: Advanced ILP
Chapter 4 The Processor Part 3
Morgan Kaufmann Publishers The Processor
Lecture 6: Advanced Pipelines
Pipelining review.
Pipelining in more detail
CSC 4250 Computer Architectures
\course\cpeg323-05F\Topic6b-323
Pipelining Basic concept of assembly line
How to improve (decrease) CPI
Pipeline control unit (highly abstracted)
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
Instruction Execution Cycle
Pipeline control unit (highly abstracted)
pipelining: data hazards Prof. Eric Rotenberg
Pipeline Control unit (highly abstracted)
Pipelining Basic concept of assembly line
Control unit extension for data hazards
Pipelining Basic concept of assembly line
Introduction to Computer Organization and Architecture
Control unit extension for data hazards
Pipelining Hazards.
Presentation transcript:

Instruction-Level Parallelism (ILP) Fine-grained parallelism Obtained by: instruction overlap in a pipeline executing instructions in parallel (later, with multiple instruction issue) In contrast to: loop-level parallelism (medium-grained) process-level or task-level or thread-level parallelism (coarse- grained) Spring 2003 CSE P548

Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are independent of each other, for example, two instructions are independent if their operands are different an example of independent instructions Each thread (program) has very little ILP want to increase it important for executing instructions in parallel and hiding latencies ld R1, 0(R2) or R7, R3, R8 Spring 2003 CSE P548

Dependences data dependence: arises from the flow of values through programs consumer instruction gets a value from a producer instruction determines the order in which instructions can be executed name dependence: instructions use the same register but no flow of data between them antidependence output dependence ld R1, 32(R3) add R3, R1, R8 ld R1, 32(R3) add R3, R1, R8 ld R1, 16(R3) Spring 2003 CSE P548

Dependences control dependence arises from the flow of control instructions after a branch depend on the value of the branch’s condition variable Dependences inhibit ILP   beqz R2, target lw r1, 0(r3) target: add r1, ... Spring 2003 CSE P548

Pipelining Implementation technique (but it is visible to the architecture) overlaps execution of different instructions execute all steps in the execution cycle simultaneously, but on different instructions Exploits ILP by executing several instructions “in parallel” Goal is to increase instruction throughput Spring 2003 CSE P548

Pipelining Spring 2003 CSE P548

Pipelining Not that simple! pipeline hazards (structural, data, control) place a soft “limit” on the number of stages increase instruction latency (a little) write & read pipeline registers for data that is computed in a stage time for clock & control lines to reach all stages all stages are the same length which is determined by the longest stage stage length determines clock cycle time IBM Stretch (1961): the first general-purpose pipelined computer Spring 2003 CSE P548

Hazards Structural hazards Data hazards Control hazards What happens on a hazard instruction that caused the hazard & previous instructions complete all subsequent instructions stall until the hazard is removed (in-order execution) instructions that depend on that instruction stall (out-of-order execution) Spring 2003 CSE P548

Structural Hazards Cause: instructions in different stages want to use the same resource in the same cycle e.g., 4 FP instructions ready to execute & only 2 FP units Solutions: more hardware (eliminate the hazard) stall (so still execute correct programs) less hardware, lower cost only for big hardware components Spring 2003 CSE P548

Spring 2003 CSE P548

Data Hazards Cause: an instruction early in the pipeline needs the result produced by an instruction farther down the pipeline before it is written to a register would not have occurred if the implementation was not pipelined Types RAW (data: flow), WAR (name: antidependence), WAW (name: output) HW solutions forwarding hardware (eliminate the hazard) stall via pipelined interlocks Compiler solution code scheduling (for loads) Spring 2003 CSE P548

Dependences vs. Hazards Spring 2003 CSE P548

Forwarding Forwarding (also called bypassing): output of one stage (the result in that stage’s pipeline register) is bused (bypassed) to the input of a previous stage why forwarding is possible results are computed 1 or more stages before they are written to a register at the end of the EX stage for computational instructions at the end of MEM for a load results are used 1 or more stages after registers are read if you forward a result to an ALU input as soon as it has been computed, you can eliminate the hazard or reduce stalling Spring 2003 CSE P548

Forwarding Example Spring 2003 CSE P548

Forwarding Implementation Forwarding unit checks to see if values must be forwarded: between instructions in ID and EX compare the R-type destination register number in EX/MEM pipeline register to each source register number in ID/EX between instructions in ID and MEM compare the R-type destination register number in MEM/WB to each source register number in ID/EX If a match, then forward the appropriate result values to an ALU source bus a value from EX/MEM or MEM/WB to an ALU source Spring 2003 CSE P548

Spring 2003 CSE P548

Forwarding Hardware Hardware to implement forwarding: destination register number in pipeline registers (but need it anyway because we need to know which register to write when storing an ALU or load result) source register numbers (probably only one, e.g., rs on MIPS R2/3000) is extra) a comparator for each source-destination register pair buses to ship data and register numbers - the BIG cost larger ALU MUXes for 2 bypass values Spring 2003 CSE P548

Loads Loads data hazard caused by a load instruction & an immediate use of the loaded value forwarding won’t eliminate the hazard why? data not back from memory until the end of the MEM stage 2 solutions used together stall via pipelined interlocks schedule independent instructions into the load delay slot (a pipeline hazard that is exposed to the compiler) so that there will be no stall Spring 2003 CSE P548

Loads Spring 2003 CSE P548

Implementing Pipelined Interlocks Detecting a stall situation Hazard detection unit stalls the use after a load does the destination register number of the load = either source register number in the next instruction? compare the load write register number in ID/EX to each read register number in IF/ID is the instruction in EX a load?  if yes, stall the pipe 1 cycle Spring 2003 CSE P548

Implementing Pipelined Interlocks How stalling is implemented: nullify the instruction in the ID stage, the one that uses the loaded value change EX, MEM, WB control signals in ID/EX pipeline register to 0 the instruction in the ID stage will have no side effects as it passes down the pipeline repeat the instructions in ID & IF stages disable writing the PC --- the same instruction will be fetched again disable writing the IF/ID pipeline register --- the load use instruction will be decoded & its registers read again Spring 2003 CSE P548

Loads hazard detection decode again fetch again Spring 2003 CSE P548

Implementing Pipelined Interlocks Hardware to implement stalling: rt register number in ID/EX pipeline register (but need it anyway because we need to know what register to write when storing load data) both source register numbers in IF/ID pipeline register (already there) a comparator for each source-destination register pair buses to ship register numbers write enable/disable for PC write enable/disable for the IF/ID pipeline register a MUX to the ID/EX pipeline register (+ 0s) Trivial amount of hardware & needed for cache misses anyway Spring 2003 CSE P548

Control Hazards Cause: condition & target determined after next fetch Early HW solutions stall assume an outcome & flush pipeline if wrong move branch resolution hardware forward in the pipeline Compiler solutions code scheduling static branch prediction Today’s HW solutions dynamic branch prediction Spring 2003 CSE P548