CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.

Slides:



Advertisements
Similar presentations
Pipeline Computer Architecture Lecture 12: Designing a Pipeline Processor.
Advertisements

CS152 Lec9.1 CS152 Computer Architecture and Engineering Lecture 9 Designing Single Cycle Control.
361 multicontroller.1 ECE 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller.
361 datapath Computer Architecture Lecture 8: Designing a Single Cycle Datapath.
CS61C L26 Single Cycle CPU Datapath II (1) Garcia © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2007 © UCB 3.6 TB DVDs? Maybe!  Researchers at Harvard have found a way to use.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
361 multipath..1 ECE 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor.
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
Processor II CPSC 321 Andreas Klappenecker. Midterm 1 Tuesday, October 5 Thursday, October 7 Advantage: less material Disadvantage: less preparation time.
ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor Start X:40.
Ceg3420 control.1 ©UCB, DAP’ 97 CEG3420 Computer Design Lecture 9.2: Designing Single Cycle Control.
ECE 232 L19.Pipeline2.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 19 Pipelining,
Microprocessor Design
ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design
CS152 / Kubiatowicz Lec8.1 2/22/99©UCB Spring 1999 CS152 Computer Architecture and Engineering Lecture 8 Designing Single Cycle Control Feb 22, 1999 John.
CS 61C L17 Control (1) A Carle, Summer 2006 © UCB inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #17: CPU Design II – Control
CS151B Computer Systems Architecture Winter 2002 TuTh 2-4pm BH Instructor: Prof. Jason Cong Lecture 8 Designing a Single Cycle Control.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.
EECC550 - Shaaban #1 Lec # 4 Winter Major CPU Design Steps 1Using independent RTN, write the micro- operations required for all target.
1 Atanasoff–Berry Computer, built by Professor John Vincent Atanasoff and grad student Clifford Berry in the basement of the physics building at Iowa State.
EEM 486: Computer Architecture Lecture 3 Designing a Single Cycle Datapath.
CS61C L27 Single-Cycle CPU Control (1) Garcia, Spring 2010 © UCB inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 27 Single-cycle.
CS61C L20 Single Cycle Datapath, Control (1) Chae, Summer 2008 © UCB Albert Chae, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.
361 control Computer Architecture Lecture 9: Designing Single Cycle Control.
Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2010 © UCB inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Sep 11, 2002 Topic: Pipelining (Intermediate Concepts)
ECE 232 L12.Datapath.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 12 Datapath.
ELEN 350 Single Cycle Datapath Adapted from the lecture notes of John Kubiatowicz(UCB) and Hank Walker (TAMU)
Pipelining - II Rabi Mahapatra Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
CS61C L27 Single Cycle CPU Control (1) Garcia, Fall 2006 © UCB Wireless High Definition?  Several companies will be working on a “WirelessHD” standard,
Lecture 12: Pipeline Datapath Design Professor Mike Schulte Computer Architecture ECE 201.
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.
Computer Organization CS224 Fall 2012 Lesson 26. Summary of Control Signals addsuborilwswbeqj RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp.
B 0000 Pipelining ENGR xD52 Eric VanWyk Fall
CASE STUDY OF A MULTYCYCLE DATAPATH. Alternative Multiple Cycle Datapath (In Textbook) Minimizes Hardware: 1 memory, 1 ALU Ideal Memory Din Address 32.
CPE 442 multipath..1 Intro. to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Multiple Cycle Processor.
EEM 486: Computer Architecture Designing Single Cycle Control.
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
CPE 442 single-cycle datapath.1 Intro. To Computer Architecture CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath.
Datapath and Control Unit Design
CS3350B Computer Architecture Winter 2015 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza [Adapted.
Lecture 3: Review CPU Design Alvin R. Lebeck CPS 220 Fall 2001.

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)
Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.
EEM 486: Computer Architecture Lecture 3 Designing Single Cycle Control.
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single-Cycle CPU Datapath & Control Part 2 Instructors: Krste Asanovic & Vladimir Stojanovic.
Single Cycle Controller Design
1 EEL-4713 Ann Gordon - Ross EEL-4713 Computer Architecture Designing a Multiple-Cycle Processor.
CS/EE 362 multipath..1 ©DAP & SIK 1995 CS/EE 362 Hardware Fundamentals Lecture 14: Designing a Multi-Cycle Datapath (Chapter 5: Hennessy and Patterson)
EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor.
Problem with Single Cycle Processor Design
CpE 442 Designing a Pipeline Processor (lect. II)
SOLUTIONS CHAPTER 4.
CS 704 Advanced Computer Architecture
CS 704 Advanced Computer Architecture
CS152 Computer Architecture and Engineering Lecture 8 Designing a Single Cycle Datapath Start: X:40.
CpE 242 Computer Architecture and Engineering Designing a Multiple Cycle Processor Start X:40.
COMS 361 Computer Organization
EECS 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor Start X:40.
EECS 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller Start X:40.
CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor Start X:40.
Instructors: Randy H. Katz David A. Patterson
CMCS Computer Architecture Lecture 20 Pipelined Datapath and Control April 11, CMSC411.htm Mohamed.
Alternative datapath (book): Multiple Cycle Datapath
Presentation transcript:

CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.2 Intro to Computer Architecture A Single Cycle Processor 32 ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb bit Registers Rs Rt Rd RegDst Extender Mux imm16 ALUSrc ExtOp Mux MemtoReg Clk Data In WrEn 32 Adr Data Memory 32 MemWr ALU Zero Instruction Fetch Unit Clk Instruction Jump Branch Imm16 Rd Main Control op ALU Control func ALUop 3 RegDst ALUSrc : Instr Zero 3

CPE 442 pipeline.3 Intro to Computer Architecture Drawbacks of this Single Cycle Processor °Long cycle time: Cycle time must be long enough for the load instruction: -PC’s Clock -to-Q + -Instruction Memory Access Time + -Register File Access Time + -ALU Delay (address calculation) + -Data Memory Access Time + -Register File Setup Time + -Clock Skew °Cycle time is much longer than needed for all other instructions. Examples: R-type instructions do not require data memory access Jump does not require ALU operation nor data memory access

CPE 442 pipeline.4 Intro to Computer Architecture Overview of a Multiple Cycle Implementation °The root of the single cycle processor’s problems: The cycle time has to be long enough for the slowest instruction °Solution: Break the instruction into smaller steps Execute each step (instead of the entire instruction) in one cycle -Cycle time: time it takes to execute the longest step -Keep all the steps to have similar length This is the essence of the multiple cycle processor °The advantages of the multiple cycle processor: Cycle time is much shorter Different instructions take different number of cycles to complete -Load takes five cycles -Jump only takes three cycles Allows a functional unit to be used more than once per instruction

CPE 442 pipeline.5 Intro to Computer Architecture Multiple Cycle Processor °MCP: A functional unit to be used more than once per instruction Ideal Memory WrAdr Din RAdr 32 Dout MemWr 32 ALU 32 ALUOp ALU Control Instruction Reg 32 IRWr 32 Reg File Ra Rw busW Rb busA 32busB RegWr Rs Rt Mux 0 1 Rt Rd PCWr ALUSelA Mux 01 RegDst Mux PC MemtoReg Extend ExtOp Mux Imm 32 << 2 ALUSelB Mux 1 0 Target 32 Zero PCWrCondPCSrcBrWr 32 IorD

CPE 442 pipeline.6 Intro to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to the Concept of Pipelined Processor (15 minutes) °Pipelined Datapath and Pipelined Control (25 minutes) °How to Avoid Race Condition in a Pipeline Design? (5 minutes) °Pipeline Example: Instructions Interaction (15 minutes) °Summary (5 minutes)

CPE 442 pipeline.7 Intro to Computer Architecture Timing Diagram of a Load Instruction Clk PC Rs, Rt, Rd, Op, Func Clk-to-Q ALUctr Instruction Memory Access Time Old ValueNew Value RegWrOld ValueNew Value Delay through Control Logic busA Register File Access Time Old ValueNew Value busB ALU Delay Old ValueNew Value Old ValueNew Value Old Value ExtOpOld ValueNew Value ALUSrcOld ValueNew Value AddressOld ValueNew Value busWOld ValueNew Delay through Extender & Mux Data Memory Access Time Instruction FetchInstr Decode / Reg. Fetch AddressReg WrData Memory Register File Write Time

CPE 442 pipeline.8 Intro to Computer Architecture The Five Stages of Load °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Read the data from the Data Memory °Wr: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWrLoad

CPE 442 pipeline.9 Intro to Computer Architecture Key Ideas Behind Pipelining °Grading the Final exam for a class of 100 students: 5 problems, five people grading the exam Each person ONLY grade one problem Pass the exam to the next person as soon as one finishes his part Assume each problem takes 12 min to grade -Each individual exam still takes 1 hour to grade -But with 5 people, all exams can be graded five times quicker °The load instruction has 5 stages: Five independent functional units to work on each stage -Each functional unit is used only once The 2nd load can start as soon as the 1st finishes its Ifetch stage Each load still takes five cycles to complete The throughput, however, is much higher

CPE 442 pipeline.10 Intro to Computer Architecture Key Ideas Behind Pipelining °Let n be number of tasks or exams (or instructions) °Let k be number of stages for each task °Let T be the time per stage °Time per task = T. k °Total Time per n tasks for non-pipelined solution = T. k. n °Total Time per n tasks for pipelined solution = T. k + T. (n-1) °Speedup = pipelined perform/ non-pipelined performance = Total Time non-pipelined/ Total Time for pipelined = k. n / k + n-1 = k approx. when n >> k Input Tasks K – stage pipeline buffer Stage 1Stage 2Stage k

CPE 442 pipeline.11 Intro to Computer Architecture Pipelining the Load Instruction °The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register File’s Read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register File’s Write port (bus W) for the Wr stage °One instruction enters the pipeline every cycle One instruction comes out of the pipeline (complete) every cycle The “Effective” Cycles per Instruction (CPI) is 1 Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7 IfetchReg/DecExecMemWr1st lw IfetchReg/DecExecMemWr2nd lw IfetchReg/DecExecMemWr3rd lw

CPE 442 pipeline.12 Intro to Computer Architecture The Four Stages of R-type °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU operates on the two register operands °Wr: Write the ALU output back to the register file Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecWrR-type

CPE 442 pipeline.13 Intro to Computer Architecture Pipelining the R-type and Load Instruction °We have a problem: Two instructions try to write to the register file at the same time! Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type Ops! We have a problem!

CPE 442 pipeline.14 Intro to Computer Architecture Important Observation °Each functional unit can only be used once per instruction °Each functional unit must be used at the same stage for all instructions: Load uses Register File’s Write Port during its 5th stage R-type uses Register File’s Write Port during its 4th stage IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type 1234

CPE 442 pipeline.15 Intro to Computer Architecture Solution 1: Insert “Bubble” into the Pipeline °Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle The control logic can be complex °No instruction is completed during Cycle 5: The “Effective” CPI for load is 2 Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExec IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWr R-type IfetchReg/DecExecWr R-type Pipeline Bubble IfetchReg/DecExecWr

CPE 442 pipeline.16 Intro to Computer Architecture Solution 2: Delay R-type’s Write by One Cycle °Delay R-type’s register write by one cycle: Now R-type instructions also use Reg File’s write port at Stage 5 Mem stage is a NOOP stage: nothing is being done Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecWrR-typeMem Exec 12345

CPE 442 pipeline.17 Intro to Computer Architecture The Four Stages of Store °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Write the data into the Data Memory Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemStoreWr

CPE 442 pipeline.18 Intro to Computer Architecture The Four Stages of Beq °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU compares the two register operands Adder calculates the branch target address °Mem: If the registers we compared in the Exec stage are the same, Write the branch target address into the PC Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemBeqWr

CPE 442 pipeline.19 Intro to Computer Architecture A Pipelined Datapath Clk IfetchReg/DecExecMemWr

CPE 442 pipeline.20 Intro to Computer Architecture The Instruction Fetch Stage IF/ID: lw $1, 100 ($2) ID/Ex Register Ex/Mem Register Mem/Wr Register PC = 14 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

CPE 442 pipeline.21 Intro to Computer Architecture A Detail View of the Instruction Unit °Location 10: lw $1, 0x100($2) IF/ID: lw $1, 100 ($2) PC = Adder Instruction Memory “4” Instruction Address Clk Ifetch You are here! Reg/Dec

CPE 442 pipeline.22 Intro to Computer Architecture The Decode / Register Fetch Stage IF/ID: ID/Ex: Reg. 2 & 0x100 Ex/Mem Register Mem/Wr Register PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

CPE 442 pipeline.23 Intro to Computer Architecture Load’s Address Calculation Stage IF/ID: ID/Ex Register Ex/Mem: Load’s Address Mem/Wr Register PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg 1 0 RegDst=0 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

CPE 442 pipeline.24 Intro to Computer Architecture A Detail View of the Execution Unit ID/Ex Register Ex/Mem: Load’s Memory Address ALU Control ALUctr 32 busA 32 busB Extender Mux 16 imm16 ALUSrc=1 ExtOp=1 3 ALU Zero ALUout 32 Adder 3 ALUOp=Add << 2 32 PC+4 Target 32 Clk Exec You are here! Mem

CPE 442 pipeline.25 Intro to Computer Architecture Load’s Memory Access Stage IF/ID: ID/Ex Register Ex/Mem Register Mem/Wr: Load’s Data PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr=0 RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=0 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

CPE 442 pipeline.26 Intro to Computer Architecture Load’s Write Back Stage IF/ID: ID/Ex Register Ex/Mem Register Mem/Wr Register PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr=1 ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg=1 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are somewhere out there! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100] Wr

CPE 442 pipeline.27 Intro to Computer Architecture How About Control Signals? IF/ID: ID/Ex Register Ex/Mem: Load’s Address Mem/Wr Register PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg 1 0 RegDst=0 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 IfetchReg/DecExecMem °Key Observation: Control Signals at Stage N = Func (Instr. at Stage N) N = Exec, Mem, or Wr °Example: Controls Signals at Exec Stage = Func(Load’s Exec) Wr

CPE 442 pipeline.28 Intro to Computer Architecture Pipeline Control °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr

CPE 442 pipeline.29 Intro to Computer Architecture Beginning of the Wr’s Stage: A Real World Problem °At the beginning of the Wr stage, we have a problem if: RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q °Similarly, at the beginning of the Mem stage, we have a problem if: WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q °We have a race condition between Address and Write Enable! Ex/Mem Mem/Wr RegAdr RegWrMemWr Data WrAdr Data Reg File Data Memory Clk RegAdr RegWr RegWr’s Clk-to-Q RegAdr’s Clk-to-Q Clk WrAdr MemWr MemWr’s Clk-to-Q WrAdr’s Clk-to-Q

CPE 442 pipeline.30 Intro to Computer Architecture The Pipeline Problem °Multiple Cycle design prevents race condition between Addr and WrEn: Make sure Address is stable by the end of Cycle N Asserts WrEn during Cycle N + 1 °This approach can NOT be used in the pipeline design because: Must be able to write the register file every cycle Must be able write the data memory every cycle Clock IfetchReg/DecExecMemWrStore IfetchReg/DecExecMemWrStore IfetchReg/DecExecMemWrR-type IfetchReg/DecExecMemWrR-type

CPE 442 pipeline.31 Intro to Computer Architecture Synchronize Register File & Synchronize Memory °Solution: And the Write Enable signal with the Clock This is the ONLY place where gating the clock is used MUST consult circuit expert to ensure no timing violation: -Example: Clock High Time > Write Access Delay WrEn I_Addr I_Data Reg File or Memory Clk I_Addr I_WrEn Address Data I_WrEn C_WrEn Clk Address Data WrEn Reg File or Memory Synchronize Memory and Register File Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge Write occurs at the cycle following the clock edge that captures the signals

CPE 442 pipeline.32 Intro to Computer Architecture A More Extensive Pipelining Example Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8 IfetchReg/DecExecMemWr0: Load IfetchReg/DecExecMemWr4: R-type IfetchReg/DecExecMemWr8: Store IfetchReg/DecExecMemWr12: Beq (target is 1000) End of Cycle 4 End of Cycle 5 End of Cycle 6 End of Cycle 7 °End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch °End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg °End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec °End of Cycle 7: Store’s Wr, Beq’s Mem

CPE 442 pipeline.33 Intro to Computer Architecture Pipelining Example: End of Cycle 4 °0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch IF/ID: Beq Instruction ID/Ex: Store’s busA & B Ex/Mem: R-type’s Result Mem/Wr: Load’s Dout PC = 16 Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=0 ExtOp=x Exec Unit busA busB Imm16 ALUOp=R-type ALUSrc=0 Mux 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch= : Beq’s Ifet 8: Store’s Reg4: R-type’s Exec0: Load’s Mem Clk MemWr=0 Clk

CPE 442 pipeline.34 Intro to Computer Architecture Pipelining Example: End of Cycle 5 °0: Lw’s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch IF/ID: 16 ID/Ex: Beq’s busA & B Ex/Mem: Store’s Address Mem/Wr: R-type’s Result PC = 20 Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=1 ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg=1 1 0 RegDst=x Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch= : R’s Ifet 12: Beq’s Reg8: Store’s Exec4: R-type’s Mem 0: Load’s Wr Clk MemWr=0 Clk

CPE 442 pipeline.35 Intro to Computer Architecture Pipelining Example: End of Cycle 6 °4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifet IF/ID: 20 ID/Ex:R-type’s busA & B Ex/Mem: Beq’s Results Mem/Wr: Nothing for St PC = 24 Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=1 ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Sub ALUSrc=0 Mux 1 0 MemtoReg=0 1 0 RegDst=x Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch= : R-type’s Ifet 16: R-type’s Reg12: Beq’s Exec8: Store’s Mem 4: R-type’s Wr Clk MemWr=1 Clk

CPE 442 pipeline.36 Intro to Computer Architecture Pipelining Example: End of Cycle 7 °8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifet IF/ID: 24 ID/Ex:R-type’s busA & B Ex/Mem: Rtype’s Results Mem/Wr:Nothing for Beq PC = 1000 Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=0 ExtOp=x Exec Unit busA busB Imm16 ALUOp=R-type ALUSrc=0 Mux 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch= : R-type’s Ifet 20: R-type’s Reg16: R-type’s Exec12: Beq’s Mem 8: Store’s Wr Clk MemWr=0 Clk

CPE 442 pipeline.37 Intro to Computer Architecture The Delay Branch Phenomenon °Although Beq is fetched during Cycle 4: Target address is NOT written into the PC until the end of Cycle 7 Branch’s target is NOT fetched until Cycle 8 3-instruction delay before the branch take effect °This is referred to as Branch Hazard: Clever design techniques can reduce the delay to ONE instruction Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10Cycle 11 IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr 16: R-type IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr24: R-type 12: Beq (target is 1000) 20: R-type Clk IfetchReg/DecExecMemWr1000: Target of Br

CPE 442 pipeline.38 Intro to Computer Architecture The Delay Load Phenomenon °Although Load is fetched during Cycle 1: The data is NOT written into the Reg File until the end of Cycle 5 We cannot read this value from the Reg File until Cycle 6 3-instruction delay before the load take effect °This is referred to as Data Hazard: Clever design techniques can reduce the delay to ONE instruction Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8 IfetchReg/DecExecMemWrI0: Load IfetchReg/DecExecMemWrPlus 1 IfetchReg/DecExecMemWrPlus 2 IfetchReg/DecExecMemWrPlus 3 IfetchReg/DecExecMemWrPlus 4

CPE 442 pipeline.39 Intro to Computer Architecture Summary °Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load °Multiple Clock Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle °Pipeline Processor: Natural enhancement of the multiple clock cycle processor Each functional unit can only be used once per instruction If a instruction is going to use a functional unit: -it must use it at the same stage as all other instructions Pipeline Control: -Each stage’s control signal depends ONLY on the instruction that is currently in that stage

CPE 442 pipeline.40 Intro to Computer Architecture Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation: IfetchRegExecMemWrStore Clk Single Cycle Implementation: LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2

CPE 442 pipeline.41 Intro to Computer Architecture Where to get more information? °Everything You Need to know about Pipeline Computer: Peter Kogge, “The Architecture of Pipeline Computers,” McGraw Hill Book Company, 1981 °Some Classic References on RISC Pipelines: Manolis Katevenis, “Reduced Instruction Set Computer Architectures for VLSI,” PhD Thesis, UC Berkeley, °Other references: David. A Patterson, “Reduced Instruction Set Computers,” Communications of the ACM, January Shing Kong, “Performance, Resources, and Complexity,” PhD Thesis, UC Berkeley, 1989.