Pipelining - II Adapted from CS 152C (UC Berkeley) lecture notes of Spring 2002.


1 Pipelining - II Adapted from CS 152C (UC Berkeley) lecture notes of Spring 2002

2 Revisiting Pipelining Lessons
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
Stall for dependences
[Figure: tasks A, B, C, D in task order against a time axis from 6 PM to 9 PM, with stage lengths 30, 40 and 20]
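The lessons on this slide can be checked with a little arithmetic. The sketch below is an illustration, not part of the original lecture: it assumes a lock-step pipeline whose clock period equals the slowest stage, and the function names and stage times are made up for the example.

```python
def unpipelined_time(stage_times, n_tasks):
    """Total time when each task finishes every stage before the next task starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Total time when stages overlap and every stage takes one clock period.

    Assumption: the clock period is the slowest stage time (lock-step pipeline).
    """
    cycle = max(stage_times)
    # The first task needs len(stage_times) cycles to fill the pipeline;
    # every later task completes one cycle after its predecessor.
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(stage_times, n_tasks):
    return unpipelined_time(stage_times, n_tasks) / pipelined_time(stage_times, n_tasks)


balanced = [10, 10, 10, 10, 10]      # hypothetical stage times
unbalanced = [10, 10, 20, 10, 10]    # one slow stage

print(speedup(balanced, 1))                    # 1.0: a single task gains nothing
print(round(speedup(balanced, 1000), 2))       # 4.98: approaches the 5 stages
print(round(speedup(unbalanced, 1000), 2))     # 2.99: limited by the slow stage
```

With few tasks the fill and drain time dominates, which is why the speedup only approaches the number of stages for long instruction streams.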

3 Revisiting Pipelining Hazards
Structural Hazards
–Hardware design
Control Hazard
–Decision based on results
Data Hazard
–Data Dependency

4 Control Signals for existing Datapath
The right-to-left control can lead to hazards

5 Place registers between each step

6 Example
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15

7 Start: Fetch 10
[Datapath diagram: the instruction at address 10 is being fetched (IF); PC = 10]

8 Fetch 14, Decode 10
[Datapath diagram: IF fetches address 14; ID holds lw r1, r2(35); PC = 14]

9 Fetch 20, Decode 14, Exec 10
[Datapath diagram: IF fetches address 20; ID holds addI r2, r2, 3; EX holds lw r1 with operands r2 and 35; PC = 20]

10 Fetch 24, Decode 20, Exec 14, Mem 10
[Datapath diagram: IF fetches address 24; ID holds sub r3, r4, r5; EX holds addI r2, r2, 3; M holds lw r1 with address r2+35; PC = 24]

11 Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Datapath diagram: IF fetches address 30; ID holds beq r6, r7, 100; EX holds sub r3 with operands r4 and r5; M holds addI r2 with result r2+3; WB holds lw r1 with value M[r2+35]; PC = 30]

12 Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Datapath diagram: IF fetches address 100; ID holds ori r8, r9, 17; EX holds beq with operands r6, r7 and target 100; M holds sub r3 with result r4-r5; WB holds addI r2 with result r2+3; r1 = M[r2+35] has been written; PC = 100]

13 Pipelining Load Instruction
The five independent functional units in the pipeline datapath are:
–Instruction Memory for the Ifetch stage
–Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
–ALU for the Exec stage
–Data Memory for the Mem stage
–Register File’s Write port (bus W) for the Wr stage
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw    Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw             Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                      Ifetch   Reg/Dec  Exec     Mem      Wr
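The cycle-by-cycle table for back-to-back loads can be generated mechanically. The following is a minimal sketch, assuming one instruction issues per cycle and nothing stalls; `pipeline_diagram` is a hypothetical helper name, not something from the lecture.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction.

    row[c] is the stage the instruction occupies in cycle c + 1,
    or "" if it is not in the pipeline during that cycle.
    Assumes one instruction issues per cycle and no stalls occur.
    """
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


for number, row in enumerate(pipeline_diagram(3), start=1):
    print(f"lw #{number}: " + " ".join(f"{cell:8}" for cell in row))
```

Because every instruction uses the same stage sequence, no column of the result ever contains the same stage twice, which is the property that lets the five functional units work in parallel.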

14 Pipelining the R Instruction
Ifetch: Instruction Fetch
–Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec:
–ALU operates on the two register operands
–Update PC
Wr: Write the ALU output back to the register file
          Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type    Ifetch   Reg/Dec  Exec     Wr

15 Pipelining Both L and R type
We have a pipeline conflict, or structural hazard:
–Two instructions try to write to the register file at the same time!
–Only one write port
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Wr
R-type             Ifetch   Reg/Dec  Exec     Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Wr
R-type                                        Ifetch   Reg/Dec  Exec     Wr
Oops! We have a problem! (The Load and the R-type that follows it both reach Wr in the same cycle.)

16 Important Observations
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions:
–Load uses Register File’s Write Port during its 5th stage
–R-type uses Register File’s Write Port during its 4th stage
Stage     1        2        3        4        5
Load      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type    Ifetch   Reg/Dec  Exec     Wr

17 Solution
Delay R-type’s register write by one cycle:
–Now R-type instructions also use Reg File’s write port at Stage 5
–For R-type, the Mem stage is a NOOP stage: nothing is being done.
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type             Ifetch   Reg/Dec  Exec     Mem      Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                        Ifetch   Reg/Dec  Exec     Mem      Wr
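The write-port conflict and its fix can be reproduced with a small check. This is a sketch under the assumption that one instruction issues per cycle; the names `LOAD`, `RTYPE_4`, `RTYPE_5` and `write_port_conflicts` are invented for the illustration.

```python
from collections import defaultdict

LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE_4 = ["Ifetch", "Reg/Dec", "Exec", "Wr"]          # writes in its 4th stage
RTYPE_5 = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]   # Mem is a no-op stage


def write_port_conflicts(program):
    """Find cycles in which more than one instruction is in its Wr stage.

    program: list of stage lists; instruction k (1-based) issues in cycle k.
    Returns {cycle: [instruction numbers]} for every conflicting cycle.
    """
    users = defaultdict(list)
    for issue_cycle, stages in enumerate(program, start=1):
        wr_cycle = issue_cycle + stages.index("Wr")
        users[wr_cycle].append(issue_cycle)
    return {cycle: who for cycle, who in users.items() if len(who) > 1}


# Instruction mix from the slides: R-type, R-type, Load, R-type, R-type.
print(write_port_conflicts([RTYPE_4, RTYPE_4, LOAD, RTYPE_4, RTYPE_4]))  # {7: [3, 4]}
print(write_port_conflicts([RTYPE_5, RTYPE_5, LOAD, RTYPE_5, RTYPE_5]))  # {}
```

With the 4-stage R-type, the Load (instruction 3) and the following R-type (instruction 4) both need the single write port in cycle 7. Padding R-type to five stages moves every write to stage 5, so the writes land in consecutive cycles.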

18 Datapath (Without Pipeline)
IR <– Mem[PC]; PC <– PC+4;
A <– R[rs]; B <– R[rt]
S <– A + B; R[rd] <– S;
S <– A + SX; M <– Mem[S]; R[rd] <– M;
S <– A or ZX; R[rt] <– S;
S <– A + SX; Mem[S] <– B
If Cond PC <– PC+SX;
[Datapath diagram: PC, Next PC, Inst. Mem, IR, Reg File, Exec, Mem Access, Data Mem, Reg. File, with registers A, B, S, M, D and the Equal signal]

19 Datapath (With Pipeline)
IR <– Mem[PC]; PC <– PC+4;
A <– R[rs]; B <– R[rt]
S <– A + B; M <– S; R[rd] <– M;
S <– A + SX; M <– Mem[S]; R[rd] <– M;
S <– A or ZX; M <– S; R[rt] <– M;
S <– A + SX; Mem[S] <– B
If Cond PC <– PC+SX;
[Datapath diagram: PC, Next PC, Inst. Mem, IR, Reg File, Exec, Mem Access, Data Mem, Reg. File, with registers A, B, S, M, D and the Equal signal]

20 Mem Structural Hazard and Solution
[Pipeline diagram: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]

21 Control Hazard - #1 Stall
Stall: wait until decision is clear
Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow
[Pipeline diagram: instruction order (Add, Beq, Load) against time in clock cycles; the cycles lost before Load starts are labelled “Lost potential”]

22 Control Hazard – #2 Predict
Predict: guess one direction, then back up if wrong
Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right ≈ 50% of time)
More dynamic scheme: history of 1 branch
[Pipeline diagram: instruction order (Add, Beq, Load) against time in clock cycles]
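The “history of 1 branch” scheme can be read as a one-bit predictor: guess that a branch goes the same way it went last time. The sketch below is one possible interpretation, with a single shared history bit and a hypothetical function name; real predictors usually keep one entry per branch address.

```python
def one_bit_predictor(outcomes, initial_prediction=False):
    """Count mispredictions when each branch is predicted to repeat its last outcome.

    outcomes: iterable of booleans, True meaning the branch was taken.
    Assumption: a single history bit is shared by all branches.
    """
    prediction = initial_prediction
    mispredictions = 0
    for taken in outcomes:
        if prediction != taken:
            mispredictions += 1
        prediction = taken  # remember only the most recent outcome
    return mispredictions


# A loop-closing branch: taken 9 times, then falls through once.
loop = [True] * 9 + [False]
print(one_bit_predictor(loop))            # 2: the first and the last iteration
print(one_bit_predictor(loop * 2))        # 4: two misses every time the loop runs
print(one_bit_predictor([True, False] * 5))  # 10: alternating branches always miss
```

For loop-like branches this does far better than a 50% static guess, while a strictly alternating branch defeats it completely.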

23 Control Hazard - #3 Delayed Branch
Delayed Branch: redefine branch behavior (the branch takes place after the next instruction)
Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (≈ 50% of time)
[Pipeline diagram: instruction order (Add, Beq, Misc, Load) against time in clock cycles]

24 Data Hazards (RAW)
Dependencies backwards in time are hazards
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Pipeline diagram: instruction order against time in clock cycles, with stages IF, ID/RF, EX, MEM, WB]

25 Data Hazards [contd…]
“Forward” result from one stage to another
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Pipeline diagram: instruction order against time in clock cycles, with stages IF, ID/RF, EX, MEM, WB]

26 Data Hazards [contd…]
Dependencies backwards in time are hazards
Can’t solve with forwarding: must delay/stall the instruction dependent on loads
lw r1,0(r2)
sub r4,r1,r3
[Pipeline diagram: time in clock cycles, with stages IF, ID/RF, EX, MEM, WB; sub is held for a “Stall”]
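The difference between the forwarding case and the load case can be written as a small rule. This is a sketch assuming the classic five-stage pipeline with full forwarding, where an ALU result can be forwarded to the next instruction's EX stage but load data is only available after MEM; the `Instr` type and the function name are made up for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                    # mnemonic, e.g. "lw" or "add"
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read


def stalls_with_forwarding(producer, consumer):
    """Bubbles needed between two back-to-back instructions.

    Assumption: full forwarding, so only a load followed immediately
    by a use of the loaded register has to stall (by one cycle).
    """
    if producer.dest is None or producer.dest not in consumer.srcs:
        return 0  # no RAW dependence at all
    return 1 if producer.op == "lw" else 0


add = Instr("add", "r1", ("r2", "r3"))
lw = Instr("lw", "r1", ("r2",))
sub = Instr("sub", "r4", ("r1", "r3"))

print(stalls_with_forwarding(add, sub))  # 0: forwarding covers the ALU result
print(stalls_with_forwarding(lw, sub))   # 1: the load-use case must stall
```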

27 Hazard Detection
[Diagram: overlapping instruction timelines (stages such as I-Fetch, DCD, OpFetch, Exec, Mem, WB, Store) illustrating a Structural Hazard, a Control Hazard (Jump), a RAW (read after write) Data Hazard, a WAW (write after write) Data Hazard, and a WAR (write after read) Data Hazard]

28 Three Generic Data Hazards
Read After Write (RAW)
Instr J tries to read operand before Instr I writes it
I: add r1,r2,r3
J: sub r4,r1,r3
Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

29 Three Generic Data Hazards
Write After Read (WAR)
Instr J writes operand before Instr I reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5

30 Three Generic Data Hazards
Write After Write (WAW)
Instr J writes operand before Instr I writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
Will see WAR and WAW in later, more complicated pipes

31 Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)
A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)
Window on execution: only pending instructions can cause hazards
[Diagram: instruction movement, with New Inst entering the pipeline behind Inst I and Inst J]
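The three conditions above are plain set intersections, so they translate directly into code. The sketch below follows the slide's convention that i is the instruction about to issue and j is an earlier instruction still in the pipeline; the function name and argument layout are assumptions made for the illustration.

```python
def hazards(rregs_i, wregs_i, rregs_j, wregs_j):
    """Registers involved in each hazard between new instruction i and pending instruction j."""
    return {
        "RAW": set(rregs_i) & set(wregs_j),  # i reads what j has yet to write
        "WAW": set(wregs_i) & set(wregs_j),  # i and j write the same register
        "WAR": set(wregs_i) & set(rregs_j),  # i writes what j has yet to read
    }


# Pending j: add r1,r2,r3   New i: sub r4,r1,r3
print(hazards(rregs_i={"r1", "r3"}, wregs_i={"r4"},
              rregs_j={"r2", "r3"}, wregs_j={"r1"}))
# RAW on r1; WAW and WAR are empty

# Pending j: sub r4,r1,r3   New i: add r1,r2,r3
print(hazards(rregs_i={"r2", "r3"}, wregs_i={"r1"},
              rregs_j={"r1", "r3"}, wregs_j={"r4"}))
# WAR on r1; RAW and WAW are empty
```

Whether a detected WAR or WAW condition actually causes a problem depends on the pipeline; as the earlier slides note, neither can occur in the simple five-stage MIPS pipeline.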

32 Computing CPI
Start with Base CPI
Add stalls
Suppose:
–CPI base = 1
–Freq branch = 20%, freq load = 30%
–Suppose branches always cause 1 cycle stall
–Loads cause a 2 cycle stall
Then: CPI = 1 + (1 × 0.20) + (2 × 0.30) = 1.8
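The same calculation as a short function, in case other instruction mixes need to be tried. This is only a sketch; `effective_cpi` and its argument layout are hypothetical.

```python
def effective_cpi(base_cpi, stall_sources):
    """Base CPI plus the average stall cycles per instruction.

    stall_sources: list of (frequency, stall_cycles) pairs, where frequency
    is the fraction of instructions that incur that stall.
    """
    return base_cpi + sum(freq * cycles for freq, cycles in stall_sources)


# Numbers from the slide: 20% branches with a 1-cycle stall,
# 30% loads with a 2-cycle stall.
cpi = effective_cpi(1.0, [(0.20, 1), (0.30, 2)])
print(round(cpi, 2))  # 1.8
```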

33 Summary
Control signals need to be propagated
Insert registers between every stage to “remember” and “propagate” values
Solutions to control hazards are Stall, Predict and Delayed Branch
The solution to data hazards is “Forwarding”, with a stall for instructions that depend on a load
Effective CPI = CPI ideal + CPI stall

