Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.

1 Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin

2 H.Y. Lin, CCUEE Computer Organization 2 Data Hazards Revisited … Data hazards occur when data is used before it is stored (Fig. 6.28)

3 Data Hazard Solution: Forwarding
Key idea: connect data internally before it's stored (Fig. 6.29)

4 Data Hazard Solution: Forwarding
Add hardware to feed back ALU and MEM results to both ALU inputs (Fig. 6.32)

5 Controlling Forwarding
Need to test when register numbers match in the rs, rt, and rd fields stored in the pipeline registers
"EX" hazard:
- EX/MEM: test whether the instruction writes the register file and examine its rd register
- ID/EX: test whether the instruction reads the rs or rt register and whether it matches the rd register in EX/MEM
"MEM" hazard:
- MEM/WB: test whether the instruction writes the register file and examine its rd (rt) register
- ID/EX: test whether the instruction reads the rs or rt register and whether it matches the rd (rt) register in MEM/WB

6 Forwarding Unit Detail – EX Hazard
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10

7 Forwarding Unit Detail – MEM Hazard
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd ≠ 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd ≠ 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

8 EX Hazard Complication
What if a register is changed more than once?
- add $1, $1, $2;
- add $1, $1, $3;
- add $1, $1, $4;
Answer: forward the most recent result (the one from the instruction in the MEM stage, held in EX/MEM), so the EX hazard test takes priority over the MEM hazard test
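
The EX and MEM hazard tests, together with the "most recent result wins" rule, can be sketched in Python as follows. This is a minimal model, not the hardware itself: the `PipelineRegs` class and its field names are hypothetical, and the select value 00 (operand taken from the register file) is an assumption that the slides do not spell out.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Hypothetical snapshot of the fields the forwarding unit reads."""
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd
    id_ex_rs: int           # ID/EX.RegisterRs
    id_ex_rt: int           # ID/EX.RegisterRt


def forward_select(regs: PipelineRegs, src: int) -> str:
    """Return the mux select for one ALU operand whose source register is src."""
    # EX hazard is checked first so the most recent result wins.
    if regs.ex_mem_reg_write and regs.ex_mem_rd != 0 and regs.ex_mem_rd == src:
        return "10"
    # MEM hazard applies only when no EX hazard matched.
    if regs.mem_wb_reg_write and regs.mem_wb_rd != 0 and regs.mem_wb_rd == src:
        return "01"
    # Assumed default: no forwarding, operand comes from the register file.
    return "00"


def forwarding_unit(regs: PipelineRegs) -> tuple:
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return forward_select(regs, regs.id_ex_rs), forward_select(regs, regs.id_ex_rt)


# Third add of: add $1,$1,$2; add $1,$1,$3; add $1,$1,$4
# Both earlier adds write $1, so both EX/MEM and MEM/WB match rs = $1.
third_add = PipelineRegs(True, 1, True, 1, id_ex_rs=1, id_ex_rt=4)
print(forwarding_unit(third_add))  # ('10', '00')
```

For the third add, both older instructions write $1, and the sketch picks the EX/MEM value (select 10) because it is the newer one.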

9 Data Hazards and Stalls
We still have to stall when a register is loaded from memory and used in the following instruction (Fig. 6.34)

10 Data Hazards and Stalls
Add a hazard detection unit to detect this and stall (Fig. 6.35)
Typo: Should read and

11 Pipelined Processor with Hazard Detection (Fig. 6.36)

12 Hazard Detection Unit – Control Detail
if (ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt)))
  stall

13 Hazard Detection Unit – What Happens
MUX zeros out control signals for instruction in ID
- “squashes” the instruction
- “no-op” (nop) propagates through following stages
IF/ID holds stalled instruction until next clock cycle
PC holds current value until next clock cycle (the same instruction is fetched again)
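
The load-use test and its three effects can be sketched in Python as below. The function names and the dictionary keys (`pc_write`, `if_id_write`, `zero_controls`) are hypothetical labels for the actions listed on the slide, not signal names taken from the figure.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int,
               if_id_rs: int, if_id_rt: int) -> bool:
    """Load-use hazard: the load in EX writes a register the instruction in ID reads."""
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


def stall_actions(stall: bool) -> dict:
    """Map the stall decision onto the three effects described above."""
    return {
        "pc_write": not stall,     # PC holds its value while stalling
        "if_id_write": not stall,  # IF/ID holds the stalled instruction
        "zero_controls": stall,    # MUX sends a nop down the pipeline
    }


# lw $2, 20($1) in EX, followed by and $4, $2, $5 in ID (rs = 2, rt = 5)
stall = must_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
print(stall, stall_actions(stall))
```

After the one-cycle bubble, the loaded value is available from MEM/WB and ordinary forwarding covers the rest.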

14 Branch Hazards
Just stalling for each branch is not practical
Common assumption: branch not taken
When the assumption fails: flush three instructions (Fig. 6.37)

15 Reducing Branch Delay
Key idea: move branch logic to ID stage of pipeline
- New adder calculates branch target (PC + 4 + (extend(IMM) << 2))
- New hardware tests rs == rt after register read
- Add flush signal to squash instruction in IF/ID register
Reduced penalty (1 cycle) when branch taken
Example: Figure 6.38, p. 420
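
The branch target arithmetic can be checked with a short Python sketch. It assumes a 16-bit immediate field and a 32-bit PC, as in MIPS; the helper names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm: int) -> int:
    """PC + 4 + (extend(IMM) << 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF


def branch_taken(rs_value: int, rt_value: int) -> bool:
    """Equality test done in ID right after the register read (beq)."""
    return rs_value == rt_value


print(hex(branch_target(0x40, 3)))       # 0x50: forward by 3 instructions
print(hex(branch_target(0x40, 0xFFFF)))  # 0x40: offset -1 branches to itself
```

The shift applies to the sign-extended immediate only, which is why the parentheses in the formula matter.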

16 Pipelining Outline
Introduction
Pipelined Processor Design
- Datapath
- Control
- Dealing with Hazards & Forwarding
- Branch Prediction <--
- Exceptions
- Performance
Advanced Pipelining
- Superscalar
- Dynamic Pipelining
- Examples

17 Branch Prediction
Key idea: instead of always assuming branch not taken, use a prediction based on previous history
- Branch history table: small memory
  Index using the lower bits of the instruction address
  Save “what happened” on last execution
  - branch taken OR
  - branch not taken
- Use history to make prediction
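
A one-bit branch history table of this kind might look like the Python sketch below. The table size, the initial "not taken" state, and dropping the two byte-offset bits before indexing are assumptions made for illustration; the class name is hypothetical.

```python
class BranchHistoryTable:
    """One bit per entry: did the branch at this index go taken last time?"""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        # Assumed initial state: every entry predicts "not taken".
        self.last_taken = [False] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Instructions are word aligned, so skip the two low bits,
        # then keep only the lower bits of the address.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.last_taken[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.last_taken[self._index(pc)] = taken


bht = BranchHistoryTable(index_bits=4)
print(bht.predict(0x400010))  # False: nothing recorded yet
bht.update(0x400010, True)
print(bht.predict(0x400010))  # True: repeats the last outcome
print(bht.predict(0x400050))  # True: a different branch sharing the same entry
```

Because only the lower address bits are used, two different branches can share an entry, as the last line shows; the table gives a prediction, not a guarantee.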

18 More about Branch Prediction
Consider nested loops:
for (i=1; i<M; i++) {
oloop: ...
    for (j=1; j<N; j++) {
iloop:  ...
        ...
    } bne $1,$2, iloop
} bne $3,$4, oloop
Prediction fails on first and last branch (Why?)
More history can improve performance
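
One way to see why is to replay the inner-loop branch through a one-bit predictor. The Python sketch below is a simplified model under stated assumptions: the branch is taken on every iteration except the last of each pass, the predictor starts at "not taken", and the function names are hypothetical.

```python
def inner_loop_outcomes(passes: int, iterations: int) -> list:
    """Outcomes of the inner-loop branch: taken until the loop exits, once per pass."""
    one_pass = [True] * (iterations - 1) + [False]
    return one_pass * passes


def mispredictions_1bit(outcomes: list, prediction: bool = False) -> int:
    """Predict whatever the branch did last time; count the wrong guesses."""
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


print(mispredictions_1bit(inner_loop_outcomes(passes=3, iterations=10)))  # 6
```

Each pass costs two mispredictions: the exit is mispredicted because the branch was taken last time, and the first iteration of the next pass is mispredicted because the table now remembers the exit.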

19 Branch Prediction with 2-Bit History
Key idea: must be wrong twice before changing prediction
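
A common way to get the "wrong twice" behavior is a two-bit saturating counter, sketched below in Python. This is an assumption about the mechanism, and the state diagram in the slide's figure may differ in detail. The trace models a loop branch taken nine times and then not taken, repeated for 100 passes.

```python
def mispredictions_2bit(outcomes: list, counter: int = 0) -> int:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Loop branch: taken 9 times, then not taken at loop exit, for 100 passes.
outcomes = ([True] * 9 + [False]) * 100
print(mispredictions_2bit(outcomes))  # 102
```

Once the counter is warmed up, a single loop exit only moves it from 3 to 2, so the first iteration of the next pass is still predicted taken and each pass costs one misprediction instead of two.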

20 Pipelining Outline
Introduction
Pipelined Processor Design
- Datapath
- Control
- Dealing with Hazards & Forwarding
- Branch Prediction
- Exceptions <--
- Performance

21 Pipelining and Exceptions
Exceptions require suspension of execution
Complicating factors
- Several instructions are in pipeline
- Exception may occur before instruction is complete
- Must flush pipeline to suspend execution, but may lose information about the exception
Exceptions make life difficult: take a computer architecture course to learn more.

22 Pipelining Outline
Introduction
Pipelined Processor Design
- Datapath
- Control
- Dealing with Hazards & Forwarding
- Branch Prediction
- Exceptions
- Performance <--

23 Performance of the Pipelined Implementation
Use “gcc” instruction mix to calculate CPI
Instruction   Frequency   Cycles
lw            25%         1 cycle (2 cycles when load-use hazard)
sw            10%         1 cycle
R-type        52%         1 cycle
branch        11%         1 cycle (2 when prediction wrong)
jump           2%         2 cycles
Assumptions:
- 50% of load instructions are followed by immediate use
- 25% of branch predictions are wrong
Calculating CPI
- CPI = (1.5 cycles * 0.25) + (1 cycle * 0.10) + (1 cycle * 0.52) + (1.25 cycles * 0.11) + (2 cycles * 0.02)
- CPI = 1.17 cycles per instruction
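
The CPI figure can be reproduced with a few lines of Python. The frequencies and penalties come from the slide; the variable names are hypothetical.

```python
# Assumptions from the slide.
LOAD_USE_RATE = 0.50       # loads followed by an immediate use
MISPREDICT_RATE = 0.25     # branch predictions that are wrong

# (frequency, average cycles) per instruction class
mix = {
    "lw":     (0.25, 1 + LOAD_USE_RATE * 1),    # 1.5 cycles on average
    "sw":     (0.10, 1),
    "R-type": (0.52, 1),
    "branch": (0.11, 1 + MISPREDICT_RATE * 1),  # 1.25 cycles on average
    "jump":   (0.02, 2),
}

cpi = sum(freq * cycles for freq, cycles in mix.values())
print(round(cpi, 2))  # 1.17
```

The 1.5 and 1.25 cycle figures are averages: one base cycle plus the one-cycle penalty weighted by how often it occurs.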

24 Performance of the Pipelined Implementation
Calculate the average execution time:
Pipelined      1.17 CPI * 200 ps/clock = 234 ps
Single-cycle   1 CPI * 600 ps/clock = 600 ps
Multicycle     4.12 CPI * 200 ps/clock = 824 ps
Speedup of pipelined implementation
- 2.56× faster than single cycle
- 3.52× faster than multicycle
“Your mileage may differ” as instruction mix changes
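
The comparison can be recomputed directly from the CPI and clock figures on the slide. In this Python sketch the dictionary layout and names are hypothetical; dividing the execution times gives ratios of about 2.56 and 3.52.

```python
# (CPI, clock period in ps) for each implementation, from the slide
designs = {
    "pipelined":    (1.17, 200),
    "single-cycle": (1.00, 600),
    "multicycle":   (4.12, 200),
}

# Average time per instruction in picoseconds
time_ps = {name: cpi * clock for name, (cpi, clock) in designs.items()}

for name in ("single-cycle", "multicycle"):
    speedup = time_ps[name] / time_ps["pipelined"]
    print(f"pipelined is {speedup:.2f}x faster than {name}")
```

Changing the instruction mix changes the pipelined and multicycle CPI values, and with them both ratios.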

25 References
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson’s CS 152 Slides, Fall 1997 © UCB
- Rob Rutenbar’s 18-347 Slides, Fall 1999, CMU
- John Nestor’s ECE 313 Slides, Fall 2004, LC
- T.S. Chang’s DEE 1050 Slides, Fall 2004, NCTU
- Other sources as noted

