Morgan Kaufmann Publishers The Processor

Slides:

Advertisements

Similar presentations

Morgan Kaufmann Publishers The Processor

Advertisements

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

Instruction-Level Parallelism (ILP)

MIPS Pipelined Datapath

ECE 445 – Computer Organization

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

Chapter Six Enhancing Performance with Pipelining

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.

1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

CSE378 Pipelining1 Pipelining Basic concept of assembly line –Split a job A into n sequential subjobs (A 1,A 2,…,A n ) with each A i taking approximately.

Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.

Pipelined Datapath and Control

University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 8: MIPS Pipelined.

Computer Organization CS224 Fall 2012 Lesson 28. Pipelining Analogy  Pipelined laundry: overlapping execution l Parallelism improves performance §4.5.

Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.

Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.

Pipeline Hazards. CS5513 Fall Pipeline Hazards Situations that prevent the next instructions in the instruction stream from executing during its.

CMPE 421 Parallel Computer Architecture

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Pipelining Example Laundry Example: Three Stages

LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Pipeline Timing Issues

Computer Organization

Stalling delays the entire pipeline

Morgan Kaufmann Publishers

Morgan Kaufmann Publishers The Processor

Morgan Kaufmann Publishers The Processor

Single Clock Datapath With Control

Pipeline Implementation (4.6)

Appendix C Pipeline implementation

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 4

\course\cpeg323-08F\Topic6b-323

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Morgan Kaufmann Publishers The Processor

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 2

Morgan Kaufmann Publishers The Processor

Pipelining review.

Single-cycle datapath, slightly rearranged

Current Design.

Morgan Kaufmann Publishers Enhancing Performance with Pipelining

Pipelining in more detail

\course\cpeg323-05F\Topic6b-323

Pipelining Basic concept of assembly line

Pipeline control unit (highly abstracted)

The Processor Lecture 3.6: Control Hazards

Control unit extension for data hazards

The Processor Lecture 3.5: Data Hazards

Instruction Execution Cycle

Pipeline control unit (highly abstracted)

CS203 – Advanced Computer Architecture

CSC3050 – Computer Architecture

Pipeline Control unit (highly abstracted)

Pipelining Basic concept of assembly line

Control unit extension for data hazards

Pipelining Basic concept of assembly line

Morgan Kaufmann Publishers The Processor

Control unit extension for data hazards

Guest Lecturer: Justin Hsia

MIPS Pipelined Datapath

Pipelining Hazards.

Presentation transcript:

Morgan Kaufmann Publishers The Processor 10 November, 2018 Chapter 4 The Processor Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Pipeline Hazards Situations that prevent starting the next instruction in the next cycle Structural hazard A required resource is busy hardware cannot support the combination of instructions that we want to execute in the same clock cycle. Data hazard Need to wait for previous instruction to complete its data read/write Control hazard Deciding on control action depends on previous instruction Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Structural Hazards Conflict for use of a resource In LEGv8 pipeline with a single memory Load/store requires data access Instruction fetch would have to stall for that cycle Would cause a pipeline “bubble” Hence, pipelined datapaths require separate instruction/data memories Or separate instruction/data caches Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Data Hazards An instruction depends on completion of data access by a previous instruction ADD X19, X0, X1 SUB X2, X19, X3 The add instruction doesn’t write its result until the fifth stage, i.e. have to waste three clock cycles in the pipeline. Chapter 4 — The Processor

Forwarding (aka Bypassing) Morgan Kaufmann Publishers Forwarding (aka Bypassing) 10 November, 2018 Use result when it is computed Don’t wait for it to be stored in a register Requires extra connections in the datapath Forwarding/bypassing: adding extra hardware to retrieve the missing item early from the internal resources Chapter 4 — The Processor

Morgan Kaufmann Publishers Load-Use Data Hazard 10 November, 2018 Can’t always avoid stalls by forwarding If value not computed when needed Can’t forward backward in time! Called pipeline stall or bubble Chapter 4 — The Processor

Code Scheduling to Avoid Stalls Morgan Kaufmann Publishers 10 November, 2018 Code Scheduling to Avoid Stalls Reorder code to avoid use of load result in the next instruction C code for A = B + E; C = B + F; LDUR X1, [X0,#0] LDUR X2, [X0,#8] ADD X3, X1, X2 STUR X3, [X0,#24] LDUR X4, [X0,#16] ADD X5, X1, X4 STUR X5, [X0,#32] LDUR X1, [X0,#0] LDUR X2, [X0,#8] LDUR X4, [X0,#16] ADD X3, X1, X2 STUR X3, [X0,#24] ADD X5, X1, X4 STUR X5, [X0,#32] stall stall 13 cycles 11 cycles Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can’t always fetch correct instruction Still working on ID stage of branch In LEGv8 pipeline Need to compare registers and compute target early in the pipeline Add hardware to do it in ID stage extra hardware can test a register, calculate the branch address, and update the PC during the second stage of the pipeline Chapter 4 — The Processor

Morgan Kaufmann Publishers Stall on Branch 10 November, 2018 Wait until branch outcome determined before fetching next instruction Pipeline showing stalling on every conditional branch as solution to control hazards Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Branch Prediction Longer pipelines can’t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In LEGv8 pipeline Can predict branches not taken Fetch instruction after branch, with no delay Chapter 4 — The Processor

Branch Prediction

More-Realistic Branch Prediction Morgan Kaufmann Publishers 10 November, 2018 More-Realistic Branch Prediction Static branch prediction Conditional branches predicted as taken and some as untaken Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch as taken or untaken and then use the recent past behavior to predict the future Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history Chapter 4 — The Processor

Morgan Kaufmann Publishers Pipeline Summary 10 November, 2018 The BIG Picture Pipelining improves performance by increasing instruction throughput Executes multiple instructions in parallel Each instruction has the same latency Subject to hazards Structure, data, control Instruction set design affects complexity of pipeline implementation Chapter 4 — The Processor

LEGv8 Pipelined Datapath Morgan Kaufmann Publishers LEGv8 Pipelined Datapath 10 November, 2018 §4.6 Pipelined Datapath and Control PC update leads to control hazards MEM Right-to-left flow leads to hazards WB WB leads to data hazards Chapter 4 — The Processor

Morgan Kaufmann Publishers Pipeline registers 10 November, 2018 Need registers between stages To hold information produced in previous cycle Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Pipeline Operation Cycle-by-cycle flow of instructions through the pipelined datapath “Single-clock-cycle” pipeline diagram Shows pipeline usage in a single cycle Highlight resources used c.f. “multi-clock-cycle” diagram Graph of operation over time We’ll look at “single-clock-cycle” diagrams for load & store Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 IF for Load, Store, … Instruction is read from memory using the address in the PC and then placed in the IF/ID pipeline register. The PC address is incremented by 4 and then written back into the PC to be ready for the next clock cycle. This incremented address is also saved in the IF/ID pipeline register Chapter 4 — The Processor

Morgan Kaufmann Publishers ID for Load, Store, … 10 November, 2018 The instruction portion of the IF/ID pipeline register supplying the immediate field is sign-extended to 64 bits, and the register numbers to read the two registers. All three values are stored in the ID/EX pipeline register, along with the incremented PC address. Chapter 4 — The Processor

Morgan Kaufmann Publishers EX for Load 10 November, 2018 The load instruction reads the contents of a register and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU. That sum is placed in the EX/MEM pipeline register. Chapter 4 — The Processor

Morgan Kaufmann Publishers MEM for Load 10 November, 2018 The load instruction reads the data memory using the address from the EX/MEM pipeline register and loads the data into the MEM/WB pipeline register. Chapter 4 — The Processor

Morgan Kaufmann Publishers WB for Load 10 November, 2018 Read the data from the MEM/WB pipeline register and writ it into the register file in the middle of the figure. Wrong register number Chapter 4 — The Processor

Morgan Kaufmann Publishers Pipeline registers 10 November, 2018 Need registers between stages To hold information produced in previous cycle Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Pipeline Operation Cycle-by-cycle flow of instructions through the pipelined datapath “Single-clock-cycle” pipeline diagram Shows pipeline usage in a single cycle Highlight resources used c.f. “multi-clock-cycle” diagram Graph of operation over time We’ll look at “single-clock-cycle” diagrams for load & store Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 IF for Load, Store, … Instruction is read from memory using the address in the PC and then placed in the IF/ID pipeline register. The PC address is incremented by 4 and then written back into the PC to be ready for the next clock cycle. This incremented address is also saved in the IF/ID pipeline register Chapter 4 — The Processor

Morgan Kaufmann Publishers ID for Load, Store, … 10 November, 2018 The instruction portion of the IF/ID pipeline register supplying the immediate field is sign-extended to 64 bits, and the register numbers to read the two registers. All three values are stored in the ID/EX pipeline register, along with the incremented PC address. Chapter 4 — The Processor

Morgan Kaufmann Publishers EX for Load 10 November, 2018 The load instruction reads the contents of a register and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU. That sum is placed in the EX/MEM pipeline register. Chapter 4 — The Processor

Morgan Kaufmann Publishers MEM for Load 10 November, 2018 The load instruction reads the data memory using the address from the EX/MEM pipeline register and loads the data into the MEM/WB pipeline register. Chapter 4 — The Processor

Morgan Kaufmann Publishers WB for Load 10 November, 2018 Read the data from the MEM/WB pipeline register and write it into the register file in the middle of the figure. Which instruction supplies the write register number? The instruction in the IF/ID pipeline register supplies the write register number, yet this instruction occurs considerably after the load instruction! Wrong register number Chapter 4 — The Processor

Corrected Datapath for Load Morgan Kaufmann Publishers Corrected Datapath for Load 10 November, 2018 Preserve the destination register number in the load instruction Load must pass the register number from the ID/EX through EX/MEM to the MEM/WB pipeline register for use in the WB stage. In other words, to share the pipelined datapath, we need to preserve the instruction read during the IF stage, so each pipeline register contains a portion of the instruction needed for that stage and later stages. Chapter 4 — The Processor

Corrected Datapath for Load Hardware used in all five stages

Morgan Kaufmann Publishers EX for Store 10 November, 2018 The first two stages – Instruction Fetch and Instruction Decode/ register read are the same as Load instruction In the third stage, the effective address is placed in the EX/MEM pipeline register. Chapter 4 — The Processor

Morgan Kaufmann Publishers MEM for Store 10 November, 2018 The data is written to memory in the fourth stage. The register containing the data to be stored was read in an earlier stage and stored in ID/EX. The only way to make the data available during the MEM stage is to place the data into the EX/MEM pipeline register in the EX stage Chapter 4 — The Processor

Morgan Kaufmann Publishers WB for Store 10 November, 2018 Nothing happens in the write-back stage Chapter 4 — The Processor

Multi-Cycle Pipeline Diagram Morgan Kaufmann Publishers Multi-Cycle Pipeline Diagram 10 November, 2018 Form showing resource usage Chapter 4 — The Processor

Multi-Cycle Pipeline Diagram Morgan Kaufmann Publishers Multi-Cycle Pipeline Diagram 10 November, 2018 Traditional form Chapter 4 — The Processor

Single-Cycle Pipeline Diagram Morgan Kaufmann Publishers Single-Cycle Pipeline Diagram 10 November, 2018 State of pipeline in a given cycle Chapter 4 — The Processor