
1 Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

2 Basic Pipelining
Five-stage “RISC” load-store architecture:
1. Instruction fetch (IF): get instruction from memory
2. Instruction decode (ID): translate opcode into control signals and read registers
3. Execute (EX): perform ALU operation
4. Memory (MEM): access memory if load/store
5. Writeback (WB): update register file
Following slides thanks to Sally McKee
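To make the stage list concrete, here is a minimal Python sketch (my own illustration, not the lecture's datapath) of what the five stages do for a simple register-register instruction; the state layout and the instruction tuple format are assumptions made up for this example.

# Hypothetical single-instruction walk through the five stages; "imem",
# "regs", and "dmem" are made-up names for this sketch.
def run_one(state, pc):
    op, dest, src_a, src_b = state["imem"][pc]     # IF: fetch, then ID: decode
    a = state["regs"][src_a]                       # ID: read source registers
    b = state["regs"][src_b]
    result = a + b if op == "add" else ~(a & b)    # EX: ALU operation (add or nand)
    # MEM: a load/store would access state["dmem"] here; ALU ops skip it
    state["regs"][dest] = result                   # WB: update the register file
    return pc + 1                                  # address of the next instruction

state = {"imem": [("add", 3, 1, 2)], "regs": [0, 5, 7, 0, 0, 0, 0, 0], "dmem": {}}
run_one(state, 0)
print(state["regs"][3])                            # 12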

3 Sample Code (Simple)
Assume an eight-register machine. Run the following code on a pipelined datapath:
add 3 1 2   ; reg 3 = reg 1 + reg 2
nand 6 4 5  ; reg 6 = ~(reg 4 & reg 5)
lw 4 20(2)  ; reg 4 = Mem[reg 2 + 20]
add 5 2 5   ; reg 5 = reg 2 + reg 5
sw 7 12(3)  ; Mem[reg 3 + 12] = reg 7

4 [Pipelined datapath diagram: PC, instruction memory, register file (R0–R7), ALU, and data memory, with the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers carrying op, dest, offset, valA, valB, PC+1, target, ALU result, and memory data between stages.]

5 Time Graphs
[Pipeline timing diagram over cycles 1–9: add, nand, lw, add, and sw each enter fetch one cycle apart and proceed through decode, execute, memory, and writeback, so five instructions are in flight at once by cycle 5.]
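The time graph on this slide can be reproduced with a short sketch, assuming an ideal pipeline with no stalls; the printing format is my own.

STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def print_time_graph(names, cycles=9):
    """Print which stage each instruction occupies in each cycle (no stalls)."""
    print("cycle:".ljust(8) + "".join(str(c).ljust(10) for c in range(1, cycles + 1)))
    for i, name in enumerate(names):
        row = [""] * cycles
        for s, stage in enumerate(STAGES):
            if i + s < cycles:
                row[i + s] = stage     # instruction i reaches stage s in cycle i+s+1
        print(name.ljust(8) + "".join(cell.ljust(10) for cell in row))

print_time_graph(["add", "nand", "lw", "add", "sw"])   # the five instructions from slide 3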

6 Pipelining Recap
Powerful technique for masking latencies:
– Logically, instructions execute one at a time
– Physically, instructions execute in parallel (instruction-level parallelism)
Decouples the processor model from the implementation:
– Interface vs. implementation
BUT dependencies between instructions complicate the implementation.

7 What Can Go Wrong?
Data hazards:
– register reads occur in stage 2
– register writes occur in stage 5
– could read the wrong value if the register is about to be written
Control hazards:
– a branch instruction may change the PC in stage 4
– what do we fetch before that?

8 Handling Data Hazards
Detect and stall:
– If hazards exist, stall the processor until they go away
– Safe, but not great for performance
Detect and forward:
– If hazards exist, fix up the pipeline to get the correct value (if possible)
– Most common solution for high performance

9 Handling Data Hazards III
Detect:
– dest of earlier instructions == source of the current one
Forward:
– New bypass datapaths route computed data to where it is needed
– New MUX and control to pick the right data
Beware: stalling may still be required even in the presence of forwarding.

10 Different Types of Hazards
Use of a register value in subsequent instructions:
add 3 1 2
or 6 3 1
nand 5 3 4
sub 6 3 1

11–13 Data Hazards: ALU ops
add 3 1 2
nand 5 3 4
[Pipeline timing diagram, built up over three slides: the nand reaches decode and reads R3 before the add has written R3 back.]
If you are not careful, you read the wrong value of R3.

14 Handling Data Hazards: Forwarding
– No point forwarding to decode
– Forward to the EX stage
– From the outputs of the ALU and MEM stages
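A sketch of the bypass choice for one EX-stage operand, assuming pipeline registers shaped roughly like the EX/Mem and Mem/WB registers on the datapath slide; the dictionary field names are my own.

def forward_operand(src_reg, decoded_value, ex_mem, mem_wb):
    """Pick the freshest value for an EX-stage source register.
    ex_mem and mem_wb stand for the pipeline registers after the ALU and
    after memory, e.g. {"writes_reg": True, "dest": 3, "value": 12}."""
    if ex_mem["writes_reg"] and ex_mem["dest"] == src_reg:
        return ex_mem["value"]        # ALU result of the instruction just ahead
    if mem_wb["writes_reg"] and mem_wb["dest"] == src_reg:
        return mem_wb["value"]        # result (or loaded data) from two ahead
    return decoded_value              # no hazard: use the value read in decode

# nand 5 3 4 right behind add 3 1 2: take R3 from the EX/Mem register.
print(forward_operand(3, 0,
                      ex_mem={"writes_reg": True, "dest": 3, "value": 12},
                      mem_wb={"writes_reg": False, "dest": 0, "value": 0}))   # 12

Checking EX/Mem before Mem/WB also covers the "tricky case" on slide 22, where the most recent writer of the register must win.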

15 Data Hazards: ALU ops
add 3 1 2
nand 5 3 4
[Pipeline timing diagram: add followed immediately by nand, with the add's ALU result forwarded to the nand's execute stage.]

16 Data Hazards: ALU ops
add 3 1 2
and 8 0 2
nand 5 3 4
[Pipeline timing diagram: with the independent and between them, the nand is two cycles behind the add, so its R3 value is forwarded from one pipeline register further along.]

17 Forwarding Illustration
[Pipeline diagram in instruction order, one IM–Reg–ALU–DM–Reg row per instruction, showing the add that writes $3 forwarding its result to the later nand that reads $3 and writes $5.]

18 Data Hazards: ALU ops
add 3 1 2
and 8 0 2
and 8 2 0
nand 5 3 4
[Pipeline timing diagram: the nand is now three instructions behind the add, so its decode overlaps the add's writeback.]
Write in the first half of the cycle, read in the second half of the cycle, so the nand's decode read already sees the updated R3.
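One way to model the split-cycle register file from this slide, as a sketch (the class and method names are mine):

class RegisterFile:
    """Writes land in the first half of a cycle and reads happen in the second
    half, so a decode in cycle N sees a writeback performed in cycle N."""
    def __init__(self, n=8):
        self.regs = [0] * n

    def tick(self, write=None, reads=()):
        if write is not None:                      # first half: writeback
            dest, value = write
            self.regs[dest] = value
        return [self.regs[r] for r in reads]       # second half: decode reads

rf = RegisterFile()
print(rf.tick(write=(3, 12), reads=(3, 4)))        # [12, 0]: R3 already updated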

19–21 Data Hazards: lw
lw 3 12(zero)
and 8 0 2
nand 5 3 4
[Pipeline timing diagram, built up over three slides: the lw's data is not available until after its memory stage, so the dependent nand must get R3 forwarded from a later stage than an ALU result would come from.]

22 A tricky case
add 1 1 2
add 1 1 3
add 1 1 4
[Pipeline diagram in instruction order, one IM–Reg–ALU–DM–Reg row per instruction: every add both reads and writes R1, so forwarding must pick the most recent value of R1 for each one.]

23 Another tricky case
add 0 4 1
add 3 0 4

24 Handling Data Hazards: Summary
Forward:
– New bypass datapaths route computed data to where it is needed
Beware: stalling may still be required even in the presence of forwarding.
For lw:
– MIPS 2000/3000: a load delay slot
– MIPS 4000 onwards: the hardware stalls, but in practice the compiler is still relied on to fill the slot as if it were a delay slot
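A sketch of the load-use check behind this summary: even with forwarding, an instruction that needs a register the lw immediately ahead of it is still loading must wait one cycle (or, on the older MIPS parts, sit in the load delay slot). The instruction encoding here is an assumption.

def load_use_stall(decoding, executing):
    """True if the instruction in decode reads the register that the lw in
    execute will not produce until after its memory stage; forwarding alone
    cannot fix this, so the pipeline stalls (or the compiler fills the delay
    slot with something independent)."""
    return executing["op"] == "lw" and executing["dest"] in decoding["sources"]

print(load_use_stall({"op": "sw", "sources": {6, 2}},
                     {"op": "lw", "dest": 6}))     # True: the sw must wait a cycle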

25 Detection
– Compare regA with previous instructions' destination registers
– Compare regB with previous instructions' destination registers
– regA and regB are the source registers going to the ALU
– "Previous" destination registers: those of the instructions currently in the ALU (EX), Memory, and Writeback stages
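Spelled out as a sketch, the comparisons are just equality checks against the destination registers sitting in the later stages (stage and field names assumed):

def detect_hazards(reg_a, reg_b, in_ex=None, in_mem=None, in_wb=None):
    """Compare the decode-stage sources against the destination registers of
    the instructions currently in EX, MEM, and WB; each stage argument is
    None or a dict like {"dest": 3}. Returns which source matches which stage."""
    matches = []
    for stage, older in (("EX", in_ex), ("MEM", in_mem), ("WB", in_wb)):
        if older is None:
            continue
        for src_name, src in (("regA", reg_a), ("regB", reg_b)):
            if older["dest"] == src:
                matches.append((src_name, stage))
    return matches

# nand 5 3 4 in decode while add 3 1 2 is in EX: regA (R3) matches EX's dest.
print(detect_hazards(reg_a=3, reg_b=4, in_ex={"dest": 3}))    # [('regA', 'EX')]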

26 [The pipelined datapath diagram from slide 4, repeated as the setup for the hazard-detection walkthrough on the following slides.]

27–28 Sample Code
Which data hazards do you see?
add 3 1 2
nand 5 3 4
add 7 6 3
lw 6 24(3)
sw 6 12(2)
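One way to answer the question mechanically, as a sketch: list every read-after-write pair in the sample code that is close enough to matter. The tuple encoding of the instructions is my own, and the "within two instructions" window assumes the write-first-half/read-second-half register file from slide 18.

# Sample code as (op, dest, sources); sw writes memory, not a register.
code = [
    ("add",  3, (1, 2)),
    ("nand", 5, (3, 4)),
    ("add",  7, (6, 3)),
    ("lw",   6, (3,)),       # lw 6 24(3): the base register R3 is its only register read
    ("sw",   None, (6, 2)),  # sw 6 12(2): reads R6 (store data) and R2 (base)
]

for i, (writer, dest, _) in enumerate(code):
    if dest is None:
        continue
    for j in range(i + 1, min(i + 3, len(code))):   # readers 1 or 2 instructions later
        reader, _, sources = code[j]
        if dest in sources:
            print(f"{writer} (#{i}) -> {reader} (#{j}) on R{dest}")

This prints the add→nand and add→add hazards on R3 and the lw→sw hazard on R6; a reader three instructions behind a writer already sees the writeback because of the split-cycle register file.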

29–40 Hazard walkthrough (datapath snapshots)
[Twelve datapath snapshots step the sample code through the pipeline, two per cycle (first half of cycle / end of cycle) from cycle 3 through cycle 8: the hazards on R3 are detected while the dependent instructions sit in decode and are resolved by forwarding; the lw-to-sw hazard on R6 cannot be forwarded in time, so the pipeline stalls and a nop bubble is injected; once the loaded value (99 in the example) is available, it is forwarded and the sw completes.]

41 Control Hazards
beqz 3 goToZero
nand 5 1 4
add 6 7 8
[Pipeline timing diagram: the nand and add are fetched before the beqz has resolved whether the branch is taken.]

42 Control Hazards for a 4-stage pipeline
beqz 3 goToZero
nand 5 1 4
add 6 7 8
[Pipeline timing diagram with a combined fetch/decode (F/D) stage: the nand enters the pipeline before the beqz resolves.]

43 Control Hazards
Stall:
– Inject NOPs into the pipeline while the next instruction is not known
– Pros: simple, clean; Cons: slow
Delay slots:
– Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch
– Pros: the compiler may be able to fill the slots with useful instructions; Cons: breaks the abstraction boundary
Speculative execution:
– Keep inserting instructions into the pipeline
– Replace them with NOPs if the branch comes out opposite of what the processor expected
– Pros: clean model, fast; Cons: complex
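Two of the three options sketched in a few lines of Python; the resolve stage and the data structures are assumptions for illustration, not the lecture's control logic.

NOP = ("nop",)

def stall_after_branch(instrs, i, resolve_stage=4):
    """Stall option: refuse to fetch after the branch at index i, injecting one
    NOP bubble for each stage the branch must travel before it resolves."""
    bubbles = resolve_stage - 1
    return instrs[:i + 1] + [NOP] * bubbles + instrs[i + 1:]

def squash_on_mispredict(fetched_past_branch, taken, predicted_taken=False):
    """Speculation option: keep the speculatively fetched instructions if the
    prediction was right, otherwise replace them with NOPs."""
    return (list(fetched_past_branch) if taken == predicted_taken
            else [NOP] * len(fetched_past_branch))

prog = [("beqz", 3, "goToZero"), ("nand", 5, 1, 4), ("add", 6, 7, 8)]
print(stall_after_branch(prog, 0))                  # three bubbles behind the beqz
print(squash_on_mispredict(prog[1:], taken=True))   # branch taken: squash to NOPs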

44 Control Hazards for a 4-stage pipeline
beqz 3 goToZero
nand 5 1 4
add 6 7 8
goToZero: lui 2 1
[Pipeline timing diagram: the nand issues right behind the beqz and executes regardless of the branch outcome, before the lui at goToZero is fetched. Delay slot!]

45 Hazard summary
Data hazards:
– Forward
– Stall for lw/sw
Control hazards:
– Delay slot

46 Pipelining for Other Circuits
Pipelining can be applied to any combinational logic circuit.
– What is the circuit's latency at 2 ns per gate?
– What is its throughput?
– What is the stage breakdown? Where would you place latches?
– What is the throughput after pipelining?
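The answers depend on the particular circuit, which is not reproduced in this transcript, but the arithmetic the questions are after looks like this sketch with made-up numbers:

GATE_DELAY_NS = 2.0                     # from the slide: 2 ns per gate

# Hypothetical circuit: longest path of 8 gate levels, cut into 4 balanced
# stages of 2 gates each, with 1 ns of latch overhead per stage.
depth_gates, stages, gates_per_stage, latch_ns = 8, 4, 2, 1.0

latency_ns = depth_gates * GATE_DELAY_NS                # 16 ns unpipelined latency
throughput_before = 1e3 / latency_ns                    # 62.5 results per microsecond
stage_ns = gates_per_stage * GATE_DELAY_NS + latch_ns   # 5 ns clock period
throughput_after = 1e3 / stage_ns                       # 200 results per microsecond
latency_after_ns = stages * stage_ns                    # 20 ns: latency actually grows

print(latency_ns, throughput_before, stage_ns, throughput_after, latency_after_ns)

The point the questions drive at: pipelining improves throughput, which is set by the slowest stage plus latch overhead, while slightly worsening the latency of any single result.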

