Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

© Kavita Bala, Computer Science, Cornell University

Basic Pipelining
Five-stage “RISC” load-store architecture
1. Instruction fetch (IF): get instruction from memory
2. Instruction decode (ID): translate opcode into control signals and read registers
3. Execute (EX): perform ALU operation
4. Memory (MEM): access memory if load/store
5. Writeback (WB): update register file
Following slides thanks to Sally McKee

Pipelined Implementation
Break the execution of each instruction into cycles (five, in this case)
Design a separate stage for the work performed during each cycle
Build pipeline registers (latches) to communicate between the stages
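As a rough sketch of how the pipeline registers pass work from stage to stage, the Python below models the four registers as slots that all update on the same clock edge. The `tick` function, the dictionary layout, and the use of bare instruction names as register contents are assumptions made for illustration; they are not taken from the slides.

```python
# Hypothetical model: each pipeline register holds the name of the
# instruction that the stage before it just finished ("nop" when empty).
REGISTERS = ["IF/ID", "ID/EX", "EX/Mem", "Mem/WB"]


def tick(regs, fetched):
    """Return the register contents after one clock edge.

    Every register captures the old value of the register before it at the
    same instant, so the new dict is built from the old one rather than
    updated in place.
    """
    new = {"IF/ID": fetched}
    for prev, cur in zip(REGISTERS, REGISTERS[1:]):
        new[cur] = regs[prev]
    return new


regs = {name: "nop" for name in REGISTERS}
for inst in ["add", "nand", "lw"]:
    regs = tick(regs, inst)
print(regs)
```

Building a fresh dictionary on every tick mirrors the edge-triggered behavior: no stage sees a value that was written during the same cycle.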

Stage 1: Fetch
Design a datapath that can fetch an instruction from memory every cycle
  – Use the PC to index memory and read the instruction
  – Increment the PC (assume no branches for now)
Write everything needed to complete execution to the pipeline register (IF/ID)
  – The next stage will read this pipeline register
  – Note that the pipeline register must be edge-triggered
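The fetch step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IFID` class, the `fetch` function, and the choice to represent instructions as strings in a list are hypothetical, and branches are ignored as the slide suggests.

```python
from dataclasses import dataclass


@dataclass
class IFID:
    """Hypothetical IF/ID pipeline register."""
    instruction: str = "nop"
    pc_plus_1: int = 0


def fetch(pc, inst_mem):
    """Read the instruction at pc and compute PC + 1 (no branches yet)."""
    # Past the end of the program, fetch a nop so the pipeline can drain.
    inst = inst_mem[pc] if pc < len(inst_mem) else "nop"
    return IFID(instruction=inst, pc_plus_1=pc + 1), pc + 1


program = ["add", "nand", "lw 4 20(2)"]
pc = 0
if_id, pc = fetch(pc, program)
print(if_id, pc)
```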

[Figure: Stage 1 (fetch) datapath. Labeled components: PC with enable, instruction memory/cache, incrementer (+1), MUX in front of the PC, and the IF/ID pipeline register holding the instruction bits and PC + 1.]

Stage 2: Decode
Read the IF/ID pipeline register, decode the instruction, and read the register file (registers specified by the regA and regB fields of the instruction bits)
  – Decode can be easy: just pass on the opcode and let later stages work out their own control signals for the instruction
Write everything needed to complete execution to the pipeline register (ID/EX)
  – Pass on the offset field and destination register specifiers (or simply pass on the whole instruction!)
  – Pass on PC+1 even though decode didn’t use it
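A minimal Python sketch of the decode step follows. It assumes the instruction has already been split into named fields (`op`, `regA`, `regB`, `dest`, `offset`); that dictionary encoding, the `decode` function, and the register values are all hypothetical and chosen only to show what ends up in ID/EX.

```python
def decode(if_id, regfile):
    """Build the ID/EX contents from the IF/ID contents and register file."""
    inst = if_id["instruction"]
    return {
        "op": inst["op"],                  # later stages derive control from it
        "valA": regfile[inst["regA"]],     # contents of regA
        "valB": regfile[inst["regB"]],     # contents of regB
        "offset": inst.get("offset", 0),   # passed along for lw/sw/branches
        "dest": inst.get("dest"),          # destination register specifier
        "pc_plus_1": if_id["pc_plus_1"],   # unused here, needed by execute
    }


# Assumed register contents, for illustration only.
regfile = [0, 10, 20, 0, 0, 0, 0, 0]
if_id = {
    "instruction": {"op": "add", "regA": 1, "regB": 2, "dest": 3},
    "pc_plus_1": 1,
}
id_ex = decode(if_id, regfile)
print(id_ex)
```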

[Figure: Stage 2 (decode) datapath. The IF/ID pipeline register (instruction bits, PC + 1) feeds the register file (inputs regA, regB, destination register, data, enable); the ID/EX pipeline register holds the contents of regA, the contents of regB, PC + 1, and the control signals.]

Stage 3: Execute
Design a datapath that performs the proper ALU operation for the instruction, using the values present in the ID/EX pipeline register
  – The inputs are the contents of regA and either the contents of regB or the offset field of the instruction
  – Also calculate PC+1+offset, in case this is a branch
Write everything needed to complete execution to the pipeline register (EX/Mem)
  – ALU result, contents of regB, and PC+1+offset
  – Instruction bits for the opcode and destReg specifiers
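The execute step might look roughly like the Python below. The field names, the `execute` function, and the assumption that regA holds the base address for lw and sw are illustrative choices, not details confirmed by the slides. Python integers are unbounded, so `nand` yields a negative number here instead of a fixed-width bit pattern.

```python
def execute(id_ex):
    """Build the EX/Mem contents from the ID/EX contents."""
    op = id_ex["op"]
    a, b, offset = id_ex["valA"], id_ex["valB"], id_ex["offset"]
    if op == "add":
        alu_result = a + b
    elif op == "nand":
        alu_result = ~(a & b)
    elif op in ("lw", "sw"):
        alu_result = a + offset      # second ALU input is the offset field
    else:
        alu_result = 0               # nop: the result is never used
    return {
        "op": op,
        "alu_result": alu_result,
        "valB": b,                                 # data to store for sw
        "target": id_ex["pc_plus_1"] + offset,     # PC+1+offset for branches
        "dest": id_ex["dest"],
    }


# Assumed ID/EX contents for a load whose base register holds 9.
id_ex = {"op": "lw", "valA": 9, "valB": 0, "offset": 20, "dest": 4, "pc_plus_1": 3}
ex_mem = execute(id_ex)
print(ex_mem)
```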

[Figure: Stage 3 (execute) datapath. The ID/EX pipeline register (contents of regA, contents of regB, PC + 1, control signals) feeds a MUX and the ALU, plus the logic that adds the offset to PC + 1; the EX/Mem pipeline register holds the ALU result, the contents of regB, PC + 1 + offset, and the control signals.]

Stage 4: Memory Operation
Design a datapath that performs the proper memory operation for the instruction, using the values present in the EX/Mem pipeline register
  – ALU result contains the address for load (lw) and store (sw) instructions
  – Opcode bits control the memory R/W and enable signals
Write everything needed to complete execution to the pipeline register (Mem/WB)
  – ALU result and MemData
  – Instruction bits for the opcode and destReg specifiers
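A small Python sketch of the memory step, under the same illustrative assumptions as before: data memory is a plain dictionary from address to value, and the `memory_stage` function and field names are hypothetical.

```python
def memory_stage(ex_mem, data_mem):
    """Perform the load or store, then build the Mem/WB contents."""
    op = ex_mem["op"]
    mem_data = None
    if op == "lw":
        mem_data = data_mem[ex_mem["alu_result"]]         # read at the address
    elif op == "sw":
        data_mem[ex_mem["alu_result"]] = ex_mem["valB"]   # write regB contents
    # Every other opcode leaves memory disabled.
    return {
        "op": op,
        "alu_result": ex_mem["alu_result"],
        "mem_data": mem_data,
        "dest": ex_mem["dest"],
    }


# Assumed memory contents and EX/Mem values, for illustration only.
data_mem = {29: 99}
load = memory_stage({"op": "lw", "alu_result": 29, "valB": 0, "dest": 4}, data_mem)
store = memory_stage({"op": "sw", "alu_result": 57, "valB": 22, "dest": None}, data_mem)
print(load, data_mem)
```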

[Figure: Stage 4 (memory) datapath. The EX/Mem pipeline register (ALU result, contents of regB, PC + 1 + offset, control signals) feeds the data memory (enable and R/W inputs); PC + 1 + offset and the MUX control for the PC input go back to the MUX before the PC in stage 1; the Mem/WB pipeline register holds the ALU result, the memory read data, and the control signals.]

Stage 5: Write Back
Design a datapath that completes the execution of this instruction, writing to the register file if required
  – Write MemData to destReg for a load (lw) instruction
  – Write the ALU result to destReg for arithmetic/logic instructions
  – Opcode bits also control the register write enable signal
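The write-back choice between MemData and the ALU result can be sketched as follows. The `write_back` function, the field names, and the register values are assumptions for illustration; only `add`, `nand`, `lw`, and `sw` from the sample code are considered.

```python
def write_back(mem_wb, regfile):
    """Update the register file from the Mem/WB contents, if required."""
    op = mem_wb["op"]
    if op == "lw":
        regfile[mem_wb["dest"]] = mem_wb["mem_data"]      # loaded value
    elif op in ("add", "nand"):
        regfile[mem_wb["dest"]] = mem_wb["alu_result"]    # computed value
    # sw and nop leave the register file alone (write enable is off).


# Assumed register contents, for illustration only.
regfile = [0, 36, 9, 12, 18, 7, 41, 22]
write_back({"op": "add", "dest": 3, "alu_result": 45, "mem_data": None}, regfile)
write_back({"op": "lw", "dest": 4, "alu_result": 29, "mem_data": 99}, regfile)
write_back({"op": "sw", "dest": None, "alu_result": 57, "mem_data": None}, regfile)
print(regfile)
```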

[Figure: Stage 5 (write back) datapath. The Mem/WB pipeline register (ALU result, memory read data, control signals) feeds one MUX whose output goes back to the data input of the register file and a second MUX, selecting between instruction bit fields (bits 0-2 or another field), whose output goes back to the destination register specifier; the control signals supply the register write enable.]

Sample Code (Simple)
Assume an eight-register machine
Run the following code on a pipelined datapath:
  add            ; reg 3 = reg 1 + reg 2
  nand           ; reg 6 = ~(reg 4 & reg 5)
  lw 4 20(2)     ; reg 4 = Mem[reg2+20]
  add            ; reg 5 = reg 2 + reg 5
  sw 7 12(3)     ; Mem[reg3+12] = reg 7
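As a reference for what the pipelined run has to produce, the five instructions can be executed one at a time by following their comments. The Python below is only a sketch: the `run_sample` function is hypothetical, and the initial register and memory values are made up for illustration.

```python
def run_sample(regs, mem):
    """Execute the sample code sequentially, one comment per statement."""
    regs, mem = list(regs), dict(mem)    # work on copies
    regs[3] = regs[1] + regs[2]          # add  ; reg 3 = reg 1 + reg 2
    regs[6] = ~(regs[4] & regs[5])       # nand ; reg 6 = ~(reg 4 & reg 5)
    regs[4] = mem[regs[2] + 20]          # lw   ; reg 4 = Mem[reg2+20]
    regs[5] = regs[2] + regs[5]          # add  ; reg 5 = reg 2 + reg 5
    mem[regs[3] + 12] = regs[7]          # sw   ; Mem[reg3+12] = reg 7
    return regs, mem


# Assumed starting state: eight registers and one word of data memory.
initial_regs = [0, 36, 9, 12, 18, 7, 41, 22]
initial_mem = {29: 99}
final_regs, final_mem = run_sample(initial_regs, initial_mem)
print(final_regs)
print(final_mem)
```

Note that `nand` reads reg 4 before `lw` overwrites it, and `sw` uses the reg 3 value written by the first `add`; a pipelined implementation has to preserve both orderings.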

[Figure: complete five-stage pipelined datapath. Components: PC, instruction memory, register file (R0 to R7; inputs regA, regB, dest, data), ALU, data memory, adder, MUXes, and the pipeline registers IF/ID (instruction), ID/EX (op, dest, offset, valA, valB, PC+1), EX/Mem (op, dest, target, ALU result, valB), and Mem/WB (op, dest, ALU result, mdata).]

[Figures: one snapshot of the pipelined datapath per cycle while running the sample code. The initial state holds a nop in every stage. The table lists the instruction in each stage at each time step; a dash marks a stage that holds none of the five instructions.]

Time   Fetch        Decode       Execute      Memory       Writeback
1      add          -            -            -            -
2      nand         add          -            -            -
3      lw 4 20(2)   nand         add          -            -
4      add          lw 4 20(2)   nand         add          -
5      sw 7 12(3)   add          lw 4 20(2)   nand         add
6      -            sw 7 12(3)   add          lw 4 20(2)   nand
7      -            -            sw 7 12(3)   add          lw 4 20(2)
8      -            -            -            sw 7 12(3)   add
9      -            -            -            -            sw 7 12(3)

No more instructions are fetched from time 6 onward.

Time Graphs

Time:  1          2          3          4          5          6          7          8          9
add    fetch      decode     execute    memory     writeback
nand              fetch      decode     execute    memory     writeback
lw                           fetch      decode     execute    memory     writeback
add                                     fetch      decode     execute    memory     writeback
sw                                                 fetch      decode     execute    memory     writeback
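A time graph like this one can be generated mechanically when nothing stalls: instruction i (counting from 0) occupies stage s during cycle i + s + 1, so n instructions on a k-stage pipeline take n + k - 1 cycles. The Python below is a sketch under that no-stall assumption; the `time_graph` function and its output layout are illustrative.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def time_graph(instructions):
    """Return (instruction, row) pairs; row[c] is the stage in cycle c + 1."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, inst in enumerate(instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage           # each instruction starts one cycle later
        rows.append((inst, row))
    return rows


rows = time_graph(["add", "nand", "lw", "add", "sw"])
for inst, row in rows:
    print(f"{inst:6}" + "".join(f"{cell:11}" for cell in row))
```

For the five-instruction sample this gives 5 + 5 - 1 = 9 cycles, which matches the last snapshot at time 9.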

Pipelining Recap
Powerful technique for masking latencies
  – Logically, instructions execute one at a time
  – Physically, instructions execute in parallel
    → Instruction-level parallelism
Decouples the processor model from the implementation
  – Interface vs. implementation
BUT dependencies between instructions complicate the implementation