Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

1 Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

2 © Kavita Bala, Computer Science, Cornell University Basic Pipelining
Five-stage “RISC” load-store architecture
1. Instruction Fetch (IF): get instruction from memory
2. Instruction Decode (ID): translate opcode into control signals and read registers
3. Execute (EX): perform ALU operation
4. Memory (MEM): access memory if load/store
5. Writeback (WB): update register file
Following slides thanks to Sally McKee

3 Pipelined Implementation
Break the execution of the instruction into cycles (five, in this case)
Design a separate stage for the execution performed during each cycle
Build pipeline registers (latches) to communicate between the stages
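
The role of the pipeline registers can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `PipelineRegister`, its `write`/`read`/`tick` methods, and the `run` helper are hypothetical and not from the slides. Each register exposes its old value for the whole cycle and takes on the newly written value only at the clock edge (`tick`), so an instruction advances exactly one stage per cycle.

```python
class PipelineRegister:
    """Hypothetical edge-triggered latch: a write becomes visible only after tick()."""

    def __init__(self, reset_value="nop"):
        self.current = reset_value   # value the downstream stage reads this cycle
        self._next = reset_value     # value the upstream stage wrote this cycle

    def write(self, value):
        self._next = value

    def read(self):
        return self.current

    def tick(self):
        # Clock edge: the written value becomes the visible value.
        self.current = self._next


def run(instructions, n_cycles):
    """Push instruction names through IF/ID, ID/EX, EX/Mem, Mem/WB."""
    regs = [PipelineRegister() for _ in range(4)]
    history = []
    for cycle in range(n_cycles):
        fetched = instructions[cycle] if cycle < len(instructions) else "nop"
        upstream = fetched
        for reg in regs:
            seen_this_cycle = reg.read()   # old contents, still visible
            reg.write(upstream)            # not visible until tick()
            upstream = seen_this_cycle
        for reg in regs:
            reg.tick()
        history.append([reg.read() for reg in regs])
    return history


history = run(["add", "nand", "lw"], 3)
for cycle, contents in enumerate(history, start=1):
    print(f"after cycle {cycle}: {contents}")
```

Because `write` never disturbs `current`, the order in which the stages are evaluated within a cycle does not matter, which is the property the slides rely on when they require edge-triggered registers.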

4 Stage 1: Fetch
Design a datapath that can fetch an instruction from memory every cycle
  – Use the PC to index memory to read the instruction
  – Increment the PC (assume no branches for now)
Write everything needed to complete execution to the pipeline register (IF/ID)
  – The next stage will read this pipeline register
  – Note that the pipeline register must be edge triggered
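
A minimal sketch of the fetch step, assuming instruction memory is a Python list of instruction strings and that fetching past the end yields a `nop`. The function name `fetch` and the dictionary keys are illustrative choices, not part of the slides.

```python
NOP = "nop"


def fetch(pc, inst_mem):
    """One fetch cycle without branches: read Mem[PC], then PC <- PC + 1.

    Returns the next PC and the value to latch into the IF/ID register.
    """
    instruction = inst_mem[pc] if pc < len(inst_mem) else NOP
    if_id = {"instruction": instruction, "pc_plus_1": pc + 1}
    return pc + 1, if_id


inst_mem = ["add 3 1 2", "nand 6 4 5", "lw 4 20(2)"]
pc = 0
latched = []
for _ in range(4):
    pc, if_id = fetch(pc, inst_mem)
    latched.append(if_id)
```

PC+1 travels with the instruction in IF/ID because later stages need it, for example to compute a branch target.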

5 [Figure: Stage 1 fetch datapath. Components: PC, MUX, +1 incrementer, instruction memory/cache (with enable), and the IF/ID pipeline register, which holds the instruction bits and PC + 1 and feeds the rest of the pipelined datapath.]

6 Stage 2: Decode
Read the IF/ID pipeline register, decode the instruction, and read the register file (registers specified by the regA and regB fields of the instruction bits)
  – Decode can be easy: just pass on the opcode and let later stages figure out their own control signals for the instruction
Write everything needed to complete execution to the pipeline register (ID/EX)
  – Pass on the offset field and destination register specifiers (or simply pass on the whole instruction!)
  – Pass on PC+1 even though decode didn’t use it
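
The decode step can be sketched as a pure function from the IF/ID contents and the register file to the ID/EX contents. The field names (`op`, `dest`, `regA`, `regB`, `offset`) and the already-split instruction dictionary are assumptions for illustration; the slides work on raw instruction bits, whose exact layout is not reproduced here. The register values are the ones the sample run later in the deck appears to use.

```python
def decode(if_id, reg_file):
    """Read regA and regB from the register file and pass the rest along."""
    inst = if_id["instruction"]
    return {
        "op": inst["op"],                   # later stages derive control from this
        "dest": inst["dest"],
        "offset": inst["offset"],
        "valA": reg_file[inst["regA"]],
        "valB": reg_file[inst["regB"]],
        "pc_plus_1": if_id["pc_plus_1"],    # unused here, needed by execute
    }


# Register file indexed by register number, R0..R7 (assumed initial values).
reg_file = [0, 36, 9, 12, 18, 7, 41, 22]

# add 3 1 2 ; reg 3 = reg 1 + reg 2
if_id = {
    "instruction": {"op": "add", "dest": 3, "regA": 1, "regB": 2, "offset": 0},
    "pc_plus_1": 1,
}
id_ex = decode(if_id, reg_file)
```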

7 [Figure: Stage 2 decode datapath. The IF/ID pipeline register (instruction bits, PC + 1) arrives from the Stage 1 instruction fetch datapath; the register file is read at regA and regB and has destReg, data, and enable inputs; the ID/EX pipeline register holds PC + 1, the contents of regA, the contents of regB, and the control signals, and feeds the rest of the pipelined datapath.]

8 Stage 3: Execute
Design a datapath that performs the proper ALU operation for the instruction specified and the values present in the ID/EX pipeline register
  – The inputs are the contents of regA and either the contents of regB or the offset field of the instruction
  – Also calculate PC+1+offset, in case this is a branch
Write everything needed to complete execution to the pipeline register (EX/Mem)
  – ALU result, contents of regB, and PC+1+offset
  – Instruction bits for opcode and destReg specifiers
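
A sketch of the execute step under the same illustrative field names as a decoded instruction might use (`op`, `valA`, `valB`, `offset`, `pc_plus_1`, all assumptions). The second ALU input is either the contents of regB (for `add` and `nand`) or the offset (for `lw` and `sw`), and the branch target PC+1+offset is computed unconditionally.

```python
def execute(id_ex):
    """Compute the ALU result and the branch target from the ID/EX contents."""
    op = id_ex["op"]
    a, b, offset = id_ex["valA"], id_ex["valB"], id_ex["offset"]
    if op == "add":
        alu_result = a + b
    elif op == "nand":
        alu_result = ~(a & b)
    elif op in ("lw", "sw"):
        alu_result = a + offset          # effective address
    else:
        alu_result = 0                   # nop
    return {
        "op": op,
        "dest": id_ex["dest"],
        "alu_result": alu_result,
        "valB": b,                       # a store needs this in the memory stage
        "target": id_ex["pc_plus_1"] + offset,
    }


add_out = execute({"op": "add", "dest": 3, "valA": 36, "valB": 9,
                   "offset": 0, "pc_plus_1": 1})
nand_out = execute({"op": "nand", "dest": 6, "valA": 18, "valB": 7,
                    "offset": 0, "pc_plus_1": 2})
lw_out = execute({"op": "lw", "dest": 4, "valA": 9, "valB": 18,
                  "offset": 20, "pc_plus_1": 3})
```

Python integers are unbounded, so `~(18 & 7)` evaluates to -3 directly; a fixed-width datapath would produce the same value in two's complement.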

9 [Figure: Stage 3 execute datapath. The ID/EX pipeline register (PC + 1, contents of regA, contents of regB, control signals) arrives from the Stage 2 decode datapath; a MUX selects the second ALU input; an adder computes PC + 1 + offset; the EX/Mem pipeline register holds PC + 1 + offset, the ALU result, the contents of regB, and the control signals, and feeds the rest of the pipelined datapath.]

10 Stage 4: Memory Operation
Design a datapath that performs the proper memory operation for the instruction specified and the values present in the EX/Mem pipeline register
  – ALU result contains the address for ld and st instructions
  – Opcode bits control the memory R/W and enable signals
Write everything needed to complete execution to the pipeline register (Mem/WB)
  – ALU result and MemData
  – Instruction bits for opcode and destReg specifiers
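
A sketch of the memory step, assuming data memory is modeled as a Python dictionary from address to value and that unwritten addresses read as 0. The function name `memory_stage` and the field names are illustrative. The opcode decides whether memory is read (`lw`), written (`sw`), or left alone.

```python
def memory_stage(ex_mem, data_mem):
    """Perform the load or store, if any, and build the Mem/WB contents."""
    op = ex_mem["op"]
    mdata = 0
    if op == "lw":
        mdata = data_mem.get(ex_mem["alu_result"], 0)      # read at the ALU address
    elif op == "sw":
        data_mem[ex_mem["alu_result"]] = ex_mem["valB"]    # write contents of regB
    return {
        "op": op,
        "dest": ex_mem["dest"],
        "alu_result": ex_mem["alu_result"],
        "mdata": mdata,
    }


data_mem = {29: 99}   # assumed contents, chosen to match the sample run
lw_out = memory_stage({"op": "lw", "dest": 4, "alu_result": 29, "valB": 18}, data_mem)
sw_out = memory_stage({"op": "sw", "dest": None, "alu_result": 57, "valB": 22}, data_mem)
```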

11 [Figure: Stage 4 memory datapath. The EX/Mem pipeline register (PC + 1 + offset, ALU result, contents of regB, control signals) arrives from the Stage 3 execute datapath; the data memory has enable and R/W inputs; the Mem/WB pipeline register holds the ALU result, the memory read data, and the control signals. PC + 1 + offset goes back to the MUX before the PC in stage 1, together with the MUX control for the PC input.]

12 Stage 5: Write Back
Design a datapath that completes the execution of this instruction, writing to the register file if required
  – Write MemData to destReg for the ld instruction
  – Write the ALU result to destReg for arithmetic/logic instructions
  – Opcode bits also control the register write enable signal
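
A sketch of the write-back step with illustrative names (`write_back`, `mdata`, `alu_result` are assumptions). The opcode selects which value is written and whether the register file is written at all, mirroring the data MUX and the write enable signal.

```python
def write_back(mem_wb, reg_file):
    """Commit one instruction's result; return True if a register was written."""
    op = mem_wb["op"]
    if op == "lw":
        value = mem_wb["mdata"]          # loads write the memory data
    elif op in ("add", "nand"):
        value = mem_wb["alu_result"]     # arithmetic/logic writes the ALU result
    else:
        return False                     # sw and nop: write enable stays off
    reg_file[mem_wb["dest"]] = value
    return True


reg_file = [0, 36, 9, 12, 18, 7, 41, 22]   # R0..R7, assumed initial values
wrote_add = write_back({"op": "add", "dest": 3, "alu_result": 45, "mdata": 0}, reg_file)
wrote_lw = write_back({"op": "lw", "dest": 4, "alu_result": 29, "mdata": 99}, reg_file)
wrote_sw = write_back({"op": "sw", "dest": None, "alu_result": 57, "mdata": 0}, reg_file)
```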

13 [Figure: Stage 5 write-back datapath. The Mem/WB pipeline register (ALU result, memory read data, control signals) arrives from the Stage 4 memory datapath; one MUX selects the value that goes back to the data input of the register file; a second MUX selects between instruction bits 0-2 and bits 15-17 and goes back to the destination register specifier; the control signals drive the register write enable.]

14 Sample Code (Simple)
Assume an eight-register machine
Run the following code on a pipelined datapath
add  3 1 2     ; reg 3 = reg 1 + reg 2
nand 6 4 5     ; reg 6 = ~(reg 4 & reg 5)
lw   4 20(2)   ; reg 4 = Mem[reg 2 + 20]
add  5 2 5     ; reg 5 = reg 2 + reg 5
sw   7 12(3)   ; Mem[reg 3 + 12] = reg 7
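
As a reference point for the pipelined run that follows, here is a small one-instruction-at-a-time interpreter for the sample syntax. It is a sketch under assumptions: the `parse` and `run` helpers are hypothetical, and the initial register values and `Mem[29] = 99` are read off the datapath snapshots later in the deck rather than stated on this slide.

```python
def parse(line):
    """Split one line of the sample syntax into named fields."""
    text = line.split(";")[0]                                   # drop the comment
    op, *rest = text.replace("(", " ").replace(")", " ").split()
    nums = [int(n) for n in rest]
    if op in ("add", "nand"):                                   # op dest srcA srcB
        dest, a, b = nums
        return {"op": op, "dest": dest, "regA": a, "regB": b, "offset": 0}
    if op == "lw":                                              # lw dest offset(base)
        dest, offset, base = nums
        return {"op": op, "dest": dest, "regA": base, "regB": 0, "offset": offset}
    if op == "sw":                                              # sw src offset(base)
        src, offset, base = nums
        return {"op": op, "dest": None, "regA": base, "regB": src, "offset": offset}
    raise ValueError(f"unknown opcode: {op}")


def run(lines, regs, mem):
    """Execute the program sequentially, following the slide's comments."""
    for line in lines:
        i = parse(line)
        a, b = regs[i["regA"]], regs[i["regB"]]
        if i["op"] == "add":
            regs[i["dest"]] = a + b
        elif i["op"] == "nand":
            regs[i["dest"]] = ~(a & b)
        elif i["op"] == "lw":
            regs[i["dest"]] = mem.get(a + i["offset"], 0)
        elif i["op"] == "sw":
            mem[a + i["offset"]] = b
    return regs, mem


program = [
    "add 3 1 2   ; reg 3 = reg 1 + reg 2",
    "nand 6 4 5  ; reg 6 = ~(reg 4 & reg 5)",
    "lw 4 20 (2) ; reg 4 = Mem[reg2+20]",
    "add 5 2 5   ; reg 5 = reg 2 + reg 5",
    "sw 7 12(3)  ; Mem[reg3+12] = reg 7",
]
regs, mem = run(program, [0, 36, 9, 12, 18, 7, 41, 22], {29: 99})
print(regs, mem)
```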

15 [Figure: complete five-stage pipelined datapath. PC, instruction memory, register file (R0 to R7, read at regA and regB, written through data and dest), ALU, data memory, and MUXes, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers. IF/ID holds the instruction and PC+1; ID/EX holds op, dest, offset, valA, valB, and PC+1; EX/Mem holds op, dest, target, ALU result, and valB; Mem/WB holds op, dest, ALU result, and mdata. Instruction bit fields labeled in the figure: bits 0-2, bits 15-17, bits 21-23.]

16 [Figure: pipelined datapath snapshot] Initial State. Every pipeline register holds nop. Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22.

17 [Figure: pipelined datapath snapshot] Time: 1. Fetch: add 3 1 2. The remaining pipeline registers still hold nop.

18 [Figure: pipelined datapath snapshot] Time: 2. Fetch: nand 6 4 5. Decode: add 3 1 2 (reads 36 and 9).

19 [Figure: pipelined datapath snapshot] Time: 3. Fetch: lw 4 20(2). Decode: nand 6 4 5 (reads 18 and 7). Execute: add 3 1 2 (ALU result 45).

20 [Figure: pipelined datapath snapshot] Time: 4. Fetch: add 5 2 5. Decode: lw 4 20(2) (reads 9, offset 20). Execute: nand 6 4 5 (ALU result -3). Memory: add 3 1 2.

21 [Figure: pipelined datapath snapshot] Time: 5. Fetch: sw 7 12(3). Decode: add 5 2 5 (reads 9 and 7). Execute: lw 4 20(2) (ALU result 29). Memory: nand 6 4 5. Writeback: add 3 1 2 (R3 = 45).

22 [Figure: pipelined datapath snapshot] Time: 6. No more instructions to fetch. Decode: sw 7 12(3) (reads 45 and 22, offset 12). Execute: add 5 2 5 (ALU result 16). Memory: lw 4 20(2) (memory read data 99). Writeback: nand 6 4 5 (R6 = -3).

23 [Figure: pipelined datapath snapshot] Time: 7. No more instructions to fetch. Execute: sw 7 12(3) (ALU result 57). Memory: add 5 2 5. Writeback: lw 4 20(2) (R4 = 99).

24 [Figure: pipelined datapath snapshot] Time: 8. No more instructions to fetch. Memory: sw 7 12(3) (stores 22 at address 57). Writeback: add 5 2 5 (R5 = 16).

25 [Figure: pipelined datapath snapshot] Time: 9. No more instructions to fetch. Writeback: sw 7 12(3) (no register write). Final register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22.

26 Time Graphs
Time:  1      2       3        4        5          6          7          8          9
add    fetch  decode  execute  memory   writeback
nand          fetch   decode   execute  memory     writeback
lw                    fetch    decode   execute    memory     writeback
add                            fetch    decode     execute    memory     writeback
sw                                      fetch      decode     execute    memory     writeback
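
The time graph follows a simple rule: with no stalls, instruction i (counting from 0) is fetched in cycle i + 1 and then moves one stage per cycle. The sketch below generates the graph from that rule; the function name `time_graph` and the dictionary layout are illustrative assumptions.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def time_graph(instructions):
    """Map each cycle (1-based) to {instruction index: stage name}."""
    total_cycles = len(instructions) + len(STAGES) - 1
    graph = {}
    for cycle in range(1, total_cycles + 1):
        row = {}
        for i in range(len(instructions)):
            stage_index = cycle - 1 - i      # instruction i is fetched in cycle i + 1
            if 0 <= stage_index < len(STAGES):
                row[i] = STAGES[stage_index]
        graph[cycle] = row
    return graph


program = ["add", "nand", "lw", "add", "sw"]
graph = time_graph(program)
for cycle, row in graph.items():
    cells = ", ".join(f"{program[i]}: {stage}" for i, stage in row.items())
    print(f"Time {cycle}: {cells}")
```

For the five-instruction sample this gives 5 + 5 - 1 = 9 cycles, compared with 5 × 5 = 25 cycles if each instruction had to finish all five stages before the next one started.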

27 Pipelining Recap
Powerful technique for masking latencies
  – Logically, instructions execute one at a time
  – Physically, instructions execute in parallel
    → Instruction-level parallelism
Decouples the processor model from the implementation
  – Interface vs. implementation
BUT dependencies between instructions complicate the implementation
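
The "logically one at a time, physically in parallel" point can be checked with a toy simulator. This is a sketch under stated assumptions, not the datapath from the slides: instructions are pre-decoded dictionaries, the helper names (`inst`, `alu`, `run_sequential`, `run_pipelined`) are hypothetical, the register file is written before it is read within a cycle, and there is no hazard detection or forwarding. The initial state is the one the sample run appears to use.

```python
NOP = {"op": "nop", "dest": None, "regA": 0, "regB": 0, "offset": 0,
       "valA": 0, "valB": 0, "alu": 0, "mdata": 0}


def inst(op, dest, reg_a, reg_b, offset=0):
    return dict(NOP, op=op, dest=dest, regA=reg_a, regB=reg_b, offset=offset)


def alu(op, a, b, offset):
    if op == "add":
        return a + b
    if op == "nand":
        return ~(a & b)
    if op in ("lw", "sw"):
        return a + offset
    return 0


def run_sequential(program, regs, mem):
    regs, mem = list(regs), dict(mem)
    for i in program:
        a, b = regs[i["regA"]], regs[i["regB"]]
        result = alu(i["op"], a, b, i["offset"])
        if i["op"] == "lw":
            regs[i["dest"]] = mem.get(result, 0)
        elif i["op"] == "sw":
            mem[result] = b
        elif i["op"] in ("add", "nand"):
            regs[i["dest"]] = result
    return regs, mem


def run_pipelined(program, regs, mem):
    regs, mem = list(regs), dict(mem)
    if_id, id_ex, ex_mem, mem_wb = NOP, NOP, NOP, NOP
    pc, cycles = 0, 0
    while pc < len(program) or any(
            latch["op"] != "nop" for latch in (if_id, id_ex, ex_mem, mem_wb)):
        # WB: commit the oldest instruction.
        if mem_wb["op"] == "lw":
            regs[mem_wb["dest"]] = mem_wb["mdata"]
        elif mem_wb["op"] in ("add", "nand"):
            regs[mem_wb["dest"]] = mem_wb["alu"]
        # MEM: load or store using the address computed last cycle.
        new_mem_wb = dict(ex_mem)
        if ex_mem["op"] == "lw":
            new_mem_wb["mdata"] = mem.get(ex_mem["alu"], 0)
        elif ex_mem["op"] == "sw":
            mem[ex_mem["alu"]] = ex_mem["valB"]
        # EX: ALU operation on the values read during decode.
        new_ex_mem = dict(id_ex, alu=alu(id_ex["op"], id_ex["valA"],
                                         id_ex["valB"], id_ex["offset"]))
        # ID: read the register file.
        new_id_ex = dict(if_id, valA=regs[if_id["regA"]], valB=regs[if_id["regB"]])
        # IF: fetch the next instruction, or nop once the program is exhausted.
        new_if_id = program[pc] if pc < len(program) else NOP
        pc = min(pc + 1, len(program))
        # Clock edge: all pipeline registers update together.
        if_id, id_ex, ex_mem, mem_wb = new_if_id, new_id_ex, new_ex_mem, new_mem_wb
        cycles += 1
    return regs, mem, cycles


PROGRAM = [
    inst("add", 3, 1, 2),         # reg 3 = reg 1 + reg 2
    inst("nand", 6, 4, 5),        # reg 6 = ~(reg 4 & reg 5)
    inst("lw", 4, 2, 0, 20),      # reg 4 = Mem[reg 2 + 20]
    inst("add", 5, 2, 5),         # reg 5 = reg 2 + reg 5
    inst("sw", None, 3, 7, 12),   # Mem[reg 3 + 12] = reg 7
]
INITIAL_REGS = [0, 36, 9, 12, 18, 7, 41, 22]
INITIAL_MEM = {29: 99}

seq_regs, seq_mem = run_sequential(PROGRAM, INITIAL_REGS, INITIAL_MEM)
pipe_regs, pipe_mem, cycles = run_pipelined(PROGRAM, INITIAL_REGS, INITIAL_MEM)
print(seq_regs == pipe_regs, seq_mem == pipe_mem, cycles)
```

The two runs agree here only because no instruction in the sample reads a register before an earlier instruction has written it back. A pair such as `add 3 1 2` followed immediately by an instruction that reads reg 3 would see the stale value in this simulator, which is exactly the dependency problem the last bullet warns about.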

