Presentation is loading. Please wait.

Presentation is loading. Please wait.

EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.

Similar presentations


Presentation on theme: "EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards."— Presentation transcript:

1 EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards

2 EECS 322 April 10, 2000 Models Single-cycle model (non-overlapping) Each instruction executes in a single cycle Every instruction and clock-cycle must be stretched to the slowest instruction (p.438) Pipeline model (overlapping) Each instruction executes in several cycles The clock-cycle must be stretched to the slowest step Gains efficiency by overlapping the execution of multiple instructions, increasing hardware utilization. (p. 377) Multi-cycle model (non-overlapping) Each instruction executes in variable number of cycles The clock-cycle must be stretched to the slowest step Ability to share functional units within the execution of a single instruction

3 EECS 322 April 10, 2000 Overhead Single-cycle model 8 ns Clock (125 MHz), (non-overlapping) 1 ALU + 2 adders 0 Muxes 0 Datapath Register bits (Flip-Flops) Multi-cycle model 2 ns Clock (500 MHz), (non-overlapping) 1 ALU + Controller 5 Muxes 160 Datapath Register bits (Flip-Flops) Pipeline model 2 ns Clock (500 MHz), (overlapping) 2 ALU + Controller 4 Muxes 373 Datapath + 16 Controlpath Register bits (Flip-Flops) Chip AreaSpeed

4 EECS 322 April 10, 2000 Figure 6.25 PC 32 bits PC 32 bits IR 32 bits IR 32 bits PC3 2 A 32 B 32 Si 32 RT 5 RD 5 PC 32 PC 32 Z1Z1 Z1Z1 ALU Out 32 B 32 D5D5 D5D5 M D R 32 M D R 32 ALU Out 32 D5D5 D5D5 Datapath Registers + 213 FFs 160 FFs 2 W 3 M 4 EX 2 W 3 M 2 W + 16 FFs

5 EECS 322 April 10, 2000 Figure 6.29 Pipeline Control: Controlpath Register bits 9 control bits 5 control bits 2 control bits

6 EECS 322 April 10, 2000 Figure 5.20, Single Cycle Pipeline Control: Controlpath Register bits Instruction R-format lw sw beq Reg Dst 1 1 X X ALU Src 0 1 1 0 Mem Reg 0 1 X X Reg Wrt 1 1 0 0 Mem Red 0 1 0 0 Mem Wrt 0 0 1 0 Bra- nch 0 0 0 1 ALU op1 1 0 0 0 ALU op0 0 0 0 1 Figure 6.28 Instruction R-format lw sw beq Reg Dst 1 1 X X ALU Op1 1 0 0 0 ALU Op0 0 0 0 1 ALU Src 0 1 1 0 Bra- nch 0 0 0 1 Mem Red 0 1 0 0 Mem Wrt 0 0 1 0 Reg Wrt 1 1 0 0 Mem Reg 0 1 X X ID / EX control lines EX / MEM control lines MEM / WB cntrl lines

7 EECS 322 April 10, 2000 Pipeline Hazards Pipeline hazards Solution #1 always works: stall, delay & procrastinate! Structural Hazards (i.e. fetching same memory bank) Solution #2: partition architecture Control Hazards (i.e. branching) Solution #1: stall! but decreases throughput Solution #2: guess and back-track Solution #3: delayed decision: delay branch & fill slot Data Hazards (i.e. register dependencies) Worst case situation Solution #2: re-order instructions Solution #3: forwarding or bypassing: delayed load

8 EECS 322 April 10, 2000 Figure 6.30 Pipeline Datapath and Controlpath

9 EECS 322 April 10, 2000 Figure 6.30 load inst. PC=4 IR= lw $10,20($1) Clock 1 PC=4+20<<2 MDR=Mem[20+C$1] D=$10 WB=11 Aluout Clock 3 PC=0 Clock 0 PC=4 A=C$1 B=X S=20 T=$10 D=0 WB=11, M=010 EX=0001 Clock 2 PC=4+20<<2 ALU=20+C$1 D=$10 Clock 3 WB=11 M=010

10 EECS 322 April 10, 2000 Figure 6.30 load inst. PC=4 IR= lw $10,20($1) Clock 1 PC=4+20<<2 MDR=Mem[20+C$1] D=$10 WB=11 Aluout Clock 3 PC=0 Clock 0 PC=4 A=C$1 B=X S=20 T=$10 D=0 WB=11, M=010 EX=0001 Clock 2 PC=4+20<<2 ALU=20+C$1 D=$10 Clock 3 WB=11 M=010

11 EECS 322 April 10, 2000 Pipeline Datapath and Controlpath Clock 0 1 2 3 Contents of Register 1 = C$1 = 3; C$2=4; C$3=4; C$4=6; C$5=7; C$10=8; … Memory[23]=9; Formats:add $rd,$rs=A,$rt=B; lw $rt=B,@($rs=A) 4 5

12 EECS 322 April 10, 2000 Clock 1: Figure 6.31a PC=4 IR=lw $10,20($1) PC=0

13 EECS 322 April 10, 2000 PC=4 IR=lw $10,20($1) Figure 6.31b PC=4 A=C$1 B=X S=20 T=$10 D=0 PC=8 IR=sub $11, $2, $3 PC=4 C C

14 EECS 322 April 10, 2000 PC=8 IR=sub $11, $2, $3 PC=4 A=C$1 B=X S=20 T=$10 D=0 C C Figure 6.32a PC=8 PC=12 IR=and $12,$4,$5 PC=8 A=C$2 B=C$3 S=X T=$3 D=$11 C C PC=4+20<<2 ALU=20+C$1 D=$10 C C

15 EECS 322 April 10, 2000 PC=12 IR=and $12,$4,$5 PC=8 A=C$2 B=C$3 S=X T=$3 D=$11 C C PC=4+20<<2 ALU=20+C$1 D=$10 C C Clock 4: Figure 6.32b PC=4+20<<2 MDR=Mem[20+C$1] D=$10 C C ALU PC=X ALU=C$2-C$3 D=$11 C C PC=12 A=C$4 B=C$5 S=X T=$3 D=$12 C C PC=16 IR=or $13,$6,$7 PC=20

16 EECS 322 April 10, 2000 Figure 6.36 Data Dependencies: that can be resolved by forwarding Data Hazards Resolved by forwarding At same time: Not a hazard Forward in time: Not a hazard Data Dependencies

17 EECS 322 April 10, 2000 Data Hazards: arithmetic Forwards in time: Can be resolved At same time: Not a hazard Figure 6.37

18 EECS 322 April 10, 2000 Data Dependencies: no forwarding sub $2,$1,$3 and $12,$2,$5 Clock IF 1 1 6 6 EX 7 7 M 8 8 WB ID IF 2 2 MIPS = Clock = 500 Mhz = 167 MIPS CPI 3 WB 5 5 ID Write 1st Half Read 2nd Half 3 3 EX Stall ID M 4 4 Stall ID Suppose every instruction is dependant = 1 + 2 stalls = 3 clocks

19 EECS 322 April 10, 2000 Data Dependencies: no forwarding MIPS = Clock = 500 Mhz = 417 MIPS(10% dependency) CPI 1.2 A dependant instruction will take= 1 + 2 stalls = 3 clocks An independent instruction will take= 1 + 0 stalls = 1 clocks Suppose 10% of the time the instructions are dependant? Averge instruction time = 10%*3 + 90%*1 = 0.10*3 + 0.90*1 = 1.2 clocks MIPS = Clock = 500 Mhz = 167 MIPS(100% dependency) CPI 3 MIPS = Clock = 500 Mhz = 500 MIPS(0% dependency) CPI 1

20 EECS 322 April 10, 2000 Data Dependencies: with forwarding sub $2,$1,$3 and $12,$2,$5 Clock IF 1 1 6 6 WB M 5 5 Detected Data Hazard 1a ID/EX.$rs = EX/M.$rd ID 3 3 EX M 4 4 ID IF 2 2 MIPS = Clock = 500 Mhz = 500 MIPS CPI 1 Suppose every instruction is dependant = 1 + 0 stalls = 1 clock

21 EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions ID/EX.$rs ID/EX.$rt EX/MEM.$rdest = { 1a. 1b. Hazard TypeSourceDestination Data Hazard Condition occurs whenever a data source needs a previous unavailable result due to a data destination. Data Hazard Detection is always comparing a destination with a source. Example sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $5and $rd, $rs, $rt ID/EX.$rs ID/EX.$rt MEM/WB.$rdest = 2a. 2b. {

22 EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions 1a Data Hazard:EX/MEM.$rd = ID/EX.$rs sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $5and $rd, $rs, $rt 1b Data Hazard:EX/MEM.$rd = ID/EX.$rt sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $2and $rd, $rs, $rt 2a Data Hazard:MEM/WB.$rd = ID/EX.$rs sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $5sub $rd, $rs, $rt or $13, $2, $1and $rd, $rs, $rt 2b Data Hazard:MEM/WB.$rd = ID/EX.$rt sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $5sub $rd, $rs, $rt or $13, $6, $2and $rd, $rs, $rt

23 EECS 322 April 10, 2000 Data Dependencies: Worst case Data Hazard: sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $2and $rd, $rs, $rt or $13, $2, $2and $rd, $rs, $rt Data Hazard 1a: EX/MEM.$rd = ID/EX.$rs Data Hazard 1b: EX/MEM.$rd = ID/EX.$rt Data Hazard 2a: MEM/WB.$rd = ID/EX.$rs Data Hazard 2b: MEM/WB.$rd = ID/EX.$rt

24 EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions ID/EX.$rs ID/EX.$rt = EX/MEM.$rdest } 1a. 1b. ID/EX.$rs ID/EX.$rt = MEM/WB.$rdest } 2a. 2b. Hazard Type $rs $rt $rd ID/EXEX/MEM SourceDestination MEM/WB $rd Pipeline Registers

25 EECS 322 April 10, 2000 Figure 6.38

26 EECS 322 April 10, 2000 Data Hazards: Loads Figure 6.44 Backwards in time: Cannot be resolved Forwards in time: Can be resolved At same time: Not a hazard

27 EECS 322 April 10, 2000 Data Hazards: load stalling Figure 6.45 Stall

28 EECS 322 April 10, 2000 Data Hazards: Hazard detection unit (page 490) IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition Stall Example lw $2, 20($1)lw $rt, addr($rs) and $4, $2, $5and $rd, $rs, $rt No Stall Example: (only need to look at next instruction) lw $2, 20($1)lw $rt, addr($rs) and $4, $1, $5and $rd, $rs, $rt or $8, $2, $6or $rd, $rs, $rt

29 EECS 322 April 10, 2000 No Stall Example: (only need to look at next instruction) lw $2, 20($1)lw $rt, addr($rs) and $4, $1, $5and $rd, $rs, $rt or $8, $2, $6or $rd, $rs, $rt Data Hazards: Hazard detection unit (page 490) Example load: assume half of the instructions are immediately followed by an instruction that uses it. load instruction time: 50%*(1 clock) + 50%*(2 clocks)=1.5 What is the average number of clocks for the load?

30 EECS 322 April 10, 2000 Hazard Detection Unit: when to stall Figure 6.46

31 EECS 322 April 10, 2000 IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition ID/EX.$rs ID/EX.$rt = EX/MEM.$rd } ID/EX.$rs ID/EX.$rt = MEM/WB.$rd } SourceDestination Forwarding Condition Data Dependency Units

32 EECS 322 April 10, 2000 $rs $rt $rd ID/EX Pipeline Registers Data Dependency Units $rs $rt $rd IF/ID IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition $rd EX/MEMMEM/WB $rd Forwarding ComparisonsStalling Comparisons

33 EECS 322 April 10, 2000 Branch Hazards: Soln #1, Stall until Decision made (fig. 6.4) Decision made in ID stage: do load @3C:add$4, $5, $6 @40: beq $1, $3, 7 @44: and$12, $2, $5 @48: or$13, $6, $2 @4C: add$14, $2, $2 @50: lw$4, 50($7) Soln #1: Stall until Decision is made Stall

34 EECS 322 April 10, 2000 Branch Hazards: Soln #2, Predict until Decision made beq $1,$3,7 and $12, $2, $5 Clock IF 1 1 6 6 EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Predict false branch Decision made in ID stage: discard & branch discard “and $12,$2,$5” instruction

35 EECS 322 April 10, 2000 beq $1,$3,7 add $4,$6,$6 Clock IF 1 1 6 6 EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Move instruction before branch Decision made in ID stage: branch Do not need to discard instruction Branch Hazards: Soln #3, Delayed Decision

36 EECS 322 April 10, 2000 Branch Hazards: Soln #3, Delayed Decision beq $1,$3,7 and $12, $2, $5 Clock IF 1 1 6 6 EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Decision made in ID stage: do branch

37 EECS 322 April 10, 2000 Branch Hazards: Decision made in the ID stage (figure 6.4) beq $1,$3,7 nop Clock IF 1 1 6 6 EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID No decision yet: insert a nop Decision: do load

38 EECS 322 April 10, 2000 Branch Hazards: Soln #2, Predict until Decision made Figure 6.50 Branch Decision made in MEM stage: Discard values when wrong prediction Same effect as 3 stalls Predict false branch

39 EECS 322 April 10, 2000 Figure 6.51 Early branch comparison Flush: if wrong prediciton, add nops

40 EECS 322 April 10, 2000 Performance load: assume half of the instructions are immediately followed by an instruction that uses it (i.e. data dependency) load instruction time = 50%*(1 clock) + 50%*(2 clocks)=1.5 Branch: the branch delay of misprediction is 1 clock cycle that 25% of the branches are mispredicted. branch time = 75%*(1 clocks) + 25%*(2 clocks) = 1.25 Jump: assume that jumps always pay 1 full clock cycle delay (stall). Jump instruction time = 2

41 EECS 322 April 10, 2000 InstructionPipeline Cycles Instruction Mix loads arithmetic branches stores 1.5 1 1 1.25 23% 13% 43% 19% Clock speed 500 Mhz 2 ns CPI1.18 MIPS424 MIPS= Clock/CPI jumps22% Single- Cycle 1 1 1 1 125 Mhz 8 ns 1 125 MIPS 1 Multi-Cycle Clocks 5 4 3 4 500 Mhz 2 ns 4.02 125 MIPS 3 =  Cycles*Mix Performance, page 504

42 EECS 322 April 10, 2000 1. Convert the RISCEE 1 Architecture into a pipeline Architecture (like Figure 6.30) (showing the number data and control bits). 2. Build the control line table (like Figure 6.28) for the RISCEE3 pipeline architecture for RISCEE1 instructions: (addi, subi, load, store, beq, jmp, jal). 3. Homework 6.23, p534 4. Homework 6.24, p534 Homework for Wednesday April 5, 2000


Download ppt "EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards."

Similar presentations


Ads by Google