1
Morgan Kaufmann Publishers The Processor
10 November, 2018 Chapter 4 The Processor Chapter 4 — The Processor
2
Pipeline Hazards
Situations that prevent starting the next instruction in the next cycle
Structural hazard
  A required resource is busy: the hardware cannot support the combination of instructions that we want to execute in the same clock cycle
Data hazard
  Need to wait for a previous instruction to complete its data read/write
Control hazard
  Deciding on a control action depends on a previous instruction
3
Structural Hazards
Conflict for use of a resource
In a LEGv8 pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
  Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches
4
Data Hazards
An instruction depends on completion of data access by a previous instruction
  ADD X19, X0, X1
  SUB X2, X19, X3
The add instruction doesn’t write its result until the fifth stage, so the sub instruction, which reads X19, would have to waste three clock cycles in the pipeline.
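The dependence between the two instructions above can be found mechanically. The following is a minimal sketch, assuming every instruction is a three-register R-format instruction written as `OP Rd, Rn, Rm`; the helper names `parse` and `raw_dependences` are hypothetical and only report which later instruction reads a register that an earlier one writes, not how many cycles are lost.

```python
def parse(instr):
    """Split 'ADD X19, X0, X1' into (opcode, dest, sources).

    Assumption: three-register R-format only (destination first).
    """
    opcode, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[0], regs[1:]


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) triples where a
    later instruction reads a register written by an earlier one."""
    deps = []
    for i, producer in enumerate(program):
        _, dest, _ = parse(producer)
        for j in range(i + 1, len(program)):
            _, later_dest, sources = parse(program[j])
            if dest in sources:
                deps.append((i, j, dest))
            if later_dest == dest:
                break  # dest is overwritten; later readers depend on instruction j
    return deps


program = ["ADD X19, X0, X1", "SUB X2, X19, X3"]
print(raw_dependences(program))
```

Here the only dependence reported is on X19, between the add (index 0) and the sub (index 1).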
5
Forwarding (aka Bypassing)
Use result when it is computed
  Don’t wait for it to be stored in a register
  Requires extra connections in the datapath
Forwarding/bypassing: adding extra hardware to retrieve the missing item early from the internal resources
6
Load-Use Data Hazard
Can’t always avoid stalls by forwarding
  If value not computed when needed
  Can’t forward backward in time!
The resulting delay is called a pipeline stall or bubble
7
Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles):
    LDUR X1, [X0,#0]
    LDUR X2, [X0,#8]
    (stall)
    ADD  X3, X1, X2
    STUR X3, [X0,#24]
    LDUR X4, [X0,#16]
    (stall)
    ADD  X5, X1, X4
    STUR X5, [X0,#32]

Reordered (11 cycles):
    LDUR X1, [X0,#0]
    LDUR X2, [X0,#8]
    LDUR X4, [X0,#16]
    ADD  X3, X1, X2
    STUR X3, [X0,#24]
    ADD  X5, X1, X4
    STUR X5, [X0,#32]
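The 13-cycle and 11-cycle figures can be checked with a short calculation. This is a sketch under stated assumptions: a five-stage pipeline with forwarding, where the only stall is one bubble when an instruction uses the result of the load immediately before it. The tuple encoding `(opcode, dest, sources)` and the function name `count_cycles` are illustrative choices, not part of the slides.

```python
def count_cycles(program, stages=5):
    """Cycles to run `program`, assuming one bubble per load-use pair.

    Each instruction is (opcode, dest_register_or_None, source_registers).
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "LDUR" and prev_dest in curr_sources:
            stalls += 1  # load result is needed by the very next instruction
    # The first instruction takes `stages` cycles; each later one adds one
    # cycle, plus one extra cycle per stall.
    return len(program) + (stages - 1) + stalls


original = [
    ("LDUR", "X1", ("X0",)),
    ("LDUR", "X2", ("X0",)),
    ("ADD",  "X3", ("X1", "X2")),
    ("STUR", None, ("X3", "X0")),
    ("LDUR", "X4", ("X0",)),
    ("ADD",  "X5", ("X1", "X4")),
    ("STUR", None, ("X5", "X0")),
]

reordered = [
    ("LDUR", "X1", ("X0",)),
    ("LDUR", "X2", ("X0",)),
    ("LDUR", "X4", ("X0",)),
    ("ADD",  "X3", ("X1", "X2")),
    ("STUR", None, ("X3", "X0")),
    ("ADD",  "X5", ("X1", "X4")),
    ("STUR", None, ("X5", "X0")),
]

print(count_cycles(original))   # 13
print(count_cycles(reordered))  # 11
```

Moving the third load up separates each load from the add that consumes it, which is why both bubbles disappear.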
8
Control Hazards
Branch determines flow of control
  Fetching next instruction depends on branch outcome
  Pipeline can’t always fetch correct instruction
    Still working on ID stage of branch
In LEGv8 pipeline
  Need to compare registers and compute target early in the pipeline
  Add hardware to do it in ID stage
    Extra hardware can test a register, calculate the branch address, and update the PC during the second stage of the pipeline
9
Stall on Branch
Wait until branch outcome determined before fetching next instruction
Figure: pipeline showing stalling on every conditional branch as a solution to control hazards
10
Branch Prediction
Longer pipelines can’t readily determine branch outcome early
  Stall penalty becomes unacceptable
Predict outcome of branch
  Only stall if prediction is wrong
In LEGv8 pipeline
  Can predict branches not taken
  Fetch instruction after branch, with no delay
11
Branch Prediction
12
More-Realistic Branch Prediction
Static branch prediction
  Some conditional branches predicted as taken and some as untaken
  Based on typical branch behavior
  Example: loop and if-statement branches
    Predict backward branches taken
    Predict forward branches not taken
Dynamic branch prediction
  Hardware measures actual branch behavior
    e.g., record recent history of each branch as taken or untaken, and then use the recent past behavior to predict the future
  Assume future behavior will continue the trend
    When wrong, stall while re-fetching, and update history
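Both schemes can be sketched in a few lines. The static rule below is the backward-taken/forward-not-taken heuristic from the slide. The dynamic predictor is a deliberately simple one-bit scheme that remembers only the last outcome of each branch; the slide does not say how much history real hardware keeps, so the one-bit table, the default prediction of "not taken", and all names here are assumptions for illustration.

```python
def static_predict(branch_pc, target_pc):
    """Static rule: predict backward branches taken, forward not taken."""
    return target_pc < branch_pc


class OneBitPredictor:
    """Dynamic sketch: predict that each branch repeats its last outcome."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, pc):
        # Assumption: a branch never seen before is predicted not taken.
        return self.history.get(pc, False)

    def update(self, pc, taken):
        self.history[pc] = taken


def mispredictions(predictor, pc, outcomes):
    """Count wrong predictions for one branch over a sequence of outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1  # hardware would stall and re-fetch here
        predictor.update(pc, taken)
    return wrong


# A loop branch that is taken nine times and then falls through.
loop_outcomes = [True] * 9 + [False]
print(mispredictions(OneBitPredictor(), 0x40, loop_outcomes))
```

With this history the one-bit predictor is wrong on the first iteration and again on loop exit, which is the usual motivation for keeping more than one bit of history per branch.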
13
Pipeline Summary
The BIG Picture
Pipelining improves performance by increasing instruction throughput
  Executes multiple instructions in parallel
  Each instruction has the same latency
Subject to hazards
  Structure, data, control
Instruction set design affects complexity of pipeline implementation
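The throughput-versus-latency point can be made concrete with a small calculation. The numbers are hypothetical and not from the slides: five stages of 200 ps each, no hazards, and a nonpipelined design that spends all five stage times on every instruction.

```python
def total_time_ps(num_instructions, stages, stage_time_ps, pipelined):
    """Total execution time under idealized assumptions (no hazards,
    perfectly balanced stages)."""
    if pipelined:
        # The first instruction needs `stages` cycles; each later one
        # completes one cycle after its predecessor.
        cycles = num_instructions + stages - 1
        return cycles * stage_time_ps
    return num_instructions * stages * stage_time_ps


STAGES = 5            # hypothetical stage count
STAGE_TIME_PS = 200   # hypothetical stage time

latency_ps = STAGES * STAGE_TIME_PS  # the same per instruction in both designs
n = 1_000_000
speedup = (total_time_ps(n, STAGES, STAGE_TIME_PS, pipelined=False)
           / total_time_ps(n, STAGES, STAGE_TIME_PS, pipelined=True))
print(latency_ps, round(speedup, 3))
```

Under these assumptions the latency of a single instruction is unchanged, while the speedup for a long instruction stream approaches the number of stages.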
14
LEGv8 Pipelined Datapath
§4.6 Pipelined Datapath and Control
Right-to-left flow leads to hazards
  MEM: PC update leads to control hazards
  WB: register write-back leads to data hazards
15
Pipeline registers
Need registers between stages
  To hold information produced in previous cycle
16
Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
“Single-clock-cycle” pipeline diagram
  Shows pipeline usage in a single cycle
  Highlights resources used
cf. “multi-clock-cycle” diagram
  Graph of operation over time
We’ll look at “single-clock-cycle” diagrams for load & store
17
IF for Load, Store, …
The instruction is read from memory using the address in the PC and then placed in the IF/ID pipeline register.
The PC address is incremented by 4 and then written back into the PC to be ready for the next clock cycle.
This incremented address is also saved in the IF/ID pipeline register.
18
ID for Load, Store, …
The instruction portion of the IF/ID pipeline register supplies the immediate field, which is sign-extended to 64 bits, and the register numbers used to read the two registers.
All three values are stored in the ID/EX pipeline register, along with the incremented PC address.
19
EX for Load
The load instruction reads the contents of a register and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU.
That sum is placed in the EX/MEM pipeline register.
20
MEM for Load
The load instruction reads the data memory using the address from the EX/MEM pipeline register and loads the data into the MEM/WB pipeline register.
28
WB for Load
Read the data from the MEM/WB pipeline register and write it into the register file in the middle of the figure.
Which instruction supplies the write register number?
  The instruction in the IF/ID pipeline register supplies the write register number, yet this instruction occurs considerably after the load instruction!
  So the datapath as drawn uses the wrong register number.
29
Corrected Datapath for Load
Preserve the destination register number in the load instruction
  Load must pass the register number from the ID/EX through EX/MEM to the MEM/WB pipeline register for use in the WB stage.
In other words, to share the pipelined datapath, we need to preserve the instruction read during the IF stage, so each pipeline register contains a portion of the instruction needed for that stage and later stages.
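The corrected flow for a load can be sketched by modeling each pipeline register as a small dictionary. This is an illustrative model only, assuming a register file and data memory represented as Python dictionaries; the field names (`base_value`, `imm`, `rd`, and so on) are hypothetical, not the signal names used in the textbook figures. The point to notice is that `rd` is copied forward at every step, so WB uses the load's own destination register.

```python
def run_load(regs, memory, base_reg, offset, dest_reg):
    """Walk one load (LDUR dest_reg, [base_reg, #offset]) through
    ID, EX, MEM and WB, carrying the destination register number along."""
    # ID: read the base register, keep the immediate and destination number.
    id_ex = {"base_value": regs[base_reg], "imm": offset, "rd": dest_reg}

    # EX: the ALU adds base and immediate; rd is passed on unchanged.
    ex_mem = {"address": id_ex["base_value"] + id_ex["imm"], "rd": id_ex["rd"]}

    # MEM: read data memory at the computed address; rd is passed on again.
    mem_wb = {"data": memory[ex_mem["address"]], "rd": ex_mem["rd"]}

    # WB: write the loaded data to the register named in MEM/WB,
    # not to whatever instruction currently sits in IF/ID.
    regs[mem_wb["rd"]] = mem_wb["data"]
    return regs


regs = {"X0": 100, "X1": 0}
memory = {108: 42}
print(run_load(regs, memory, "X0", 8, "X1"))
```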
30
Corrected Datapath for Load
Hardware used in all five stages
31
EX for Store
The first two stages, instruction fetch and instruction decode/register read, are the same as for the load instruction.
In the third stage, the effective address is placed in the EX/MEM pipeline register.
32
MEM for Store
The data is written to memory in the fourth stage.
The register containing the data to be stored was read in an earlier stage and stored in ID/EX.
The only way to make the data available during the MEM stage is to place the data into the EX/MEM pipeline register in the EX stage.
33
WB for Store
Nothing happens in the write-back stage
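The store stages can be sketched the same way, with pipeline registers modeled as dictionaries. As before this is only an illustration: the register file and memory are plain Python dictionaries, and the field names are hypothetical. The value to be stored is read in ID and has to ride along through EX/MEM to reach the memory in the fourth stage.

```python
def run_store(regs, memory, base_reg, offset, data_reg):
    """Walk one store (STUR data_reg, [base_reg, #offset]) through
    ID, EX, MEM and WB."""
    # ID: read both the base register and the register holding the data.
    id_ex = {
        "base_value": regs[base_reg],
        "imm": offset,
        "store_data": regs[data_reg],
    }

    # EX: compute the effective address and copy the store data forward,
    # since MEM can only see what is in the EX/MEM register.
    ex_mem = {
        "address": id_ex["base_value"] + id_ex["imm"],
        "store_data": id_ex["store_data"],
    }

    # MEM: write the data to memory.
    memory[ex_mem["address"]] = ex_mem["store_data"]

    # WB: nothing happens for a store.
    return memory


print(run_store({"X0": 100, "X3": 7}, {}, "X0", 24, "X3"))
```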
34
Multi-Cycle Pipeline Diagram
Form showing resource usage
35
Multi-Cycle Pipeline Diagram
Traditional form
36
Single-Cycle Pipeline Diagram
State of pipeline in a given cycle
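Since the figures themselves are not reproduced here, the relationship between the two diagram styles can be sketched in code. The sketch assumes an ideal five-stage pipeline with no stalls, and the example instructions are made up for illustration. `pipeline_diagram` builds the traditional multi-cycle form (one row per instruction, one column per clock cycle), and `single_cycle` takes one column of it, which is what a single-cycle diagram shows.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Traditional multi-cycle form: instruction i enters IF in cycle i + 1
    and advances one stage per cycle (assumes no stalls)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((instr, cells))
    return rows


def single_cycle(rows, cycle):
    """State of the pipeline in one cycle (1-indexed): which instruction
    occupies which stage."""
    return {instr: cells[cycle - 1] for instr, cells in rows if cells[cycle - 1]}


def render(rows):
    """Format the multi-cycle diagram as plain text."""
    width = max(len(instr) for instr, _ in rows)
    lines = []
    for instr, cells in rows:
        line = instr.ljust(width) + " | " + " ".join(c.ljust(3) for c in cells)
        lines.append(line.rstrip())
    return "\n".join(lines)


# Hypothetical instruction sequence, chosen only for illustration.
rows = pipeline_diagram([
    "LDUR X10, [X1,#40]",
    "SUB  X11, X2, X3",
    "ADD  X12, X3, X4",
])
print(render(rows))
print(single_cycle(rows, 3))
```

In cycle 3 of this example the first instruction is in EX, the second in ID, and the third in IF.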