1
EECC550 - Shaaban #1 Lec # 5 Winter 2005 1-10-2006

Major CPU Design Steps

1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   - This provides the required datapath components and how they are connected to meet ISA requirements.
2. Select the required datapath components and connections, and establish the clocking methodology (e.g., clock edge-triggered).
3. Assemble the datapath meeting the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
   - Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers.
5. Design and assemble the control logic.
   - Hard-wired: finite-state machine implementation.
   - Microprogrammed (Chapter 5.5).
2
Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle

[Figure: single-cycle MIPS datapath; jump not included.]
3
Drawbacks of Single-Cycle Processor

1. Long cycle time:
   - All instructions must take as much time as the slowest: the cycle time needed for a load is longer than needed for all other instructions.
   - Real memory is not as well-behaved as idealized memory; it cannot always complete a data access in one (short) cycle.
2. It is impossible to implement complex, variable-length instructions and complex addressing modes (e.g., indirect memory addressing) in a single cycle.
3. High and duplicated hardware resource requirements:
   - No hardware functional unit can be used more than once in a single cycle (e.g., ALUs), so a unit needed twice must be duplicated.
4. The processing of one instruction cannot be pipelined (overlapped) with that of the previous instructions (instruction pipelining, Chapter 6).
4
Abstract View of Single Cycle CPU

[Figure: abstract view of the single-cycle CPU. PC and Next PC feed Instruction Fetch, Register Fetch, ALU (with Ext), Data Memory Access, and Result Store (Reg. Wrt). Main Control and ALU control (inputs op, fun) drive ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, and Branch/Jump; Equal is fed back from the datapath.]

One CPU clock cycle duration: C = 8 ns. One instruction per cycle: CPI = 1.

Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
5
Single Cycle Instruction Timing

[Figure: timing of each instruction class through the single-cycle datapath.
- Arithmetic & Logical: PC → Inst Memory → Reg File → mux → ALU → mux → setup
- Load: PC → Inst Memory → Reg File → mux → ALU → Data Mem → mux → setup (critical path; determines the CPU clock cycle, C)
- Store: PC → Inst Memory → Reg File → mux → ALU → Data Mem
- Branch: PC → Inst Memory → Reg File → cmp → mux]
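To make the critical-path argument concrete, here is a minimal Python sketch that sums the component delays along each instruction class's path, using the component delays assumed on these slides. It ignores mux, setup, and clock-to-Q delays and charges the branch compare at the ALU delay; both are simplifying assumptions, and names such as `PATHS` are hypothetical.

```python
# Component delays in ns, as assumed on the slides. Mux, setup and
# clock-to-Q delays are ignored (a simplifying assumption).
MEM = 2   # instruction or data memory access
ALU = 2   # ALU or adder (also charged for the branch compare, an assumption)
REG = 1   # register file read or write

# Components each instruction class passes through within its single cycle.
PATHS = {
    "arith_logic": [MEM, REG, ALU, REG],       # fetch, read regs, ALU, write reg
    "load":        [MEM, REG, ALU, MEM, REG],  # adds the data memory read
    "store":       [MEM, REG, ALU, MEM],       # no register write-back
    "branch":      [MEM, REG, ALU],            # compare, then select next PC
}

delays = {name: sum(path) for name, path in PATHS.items()}

# The single clock period must accommodate the slowest (critical) path.
cycle_time = max(delays.values())
critical = max(delays, key=delays.get)

print(delays)                  # {'arith_logic': 6, 'load': 8, 'store': 7, 'branch': 5}
print(critical, cycle_time)    # load 8
```

Every class except load finishes early and then waits for the rest of the 8 ns cycle, which is the first drawback listed for the single-cycle design.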
6
Clock Cycle Time & Critical Path

Critical path: the slowest path between any two storage devices.

The clock cycle time is a function of the critical path, and must be greater than:
  Clock-to-Q + Longest Delay Path through the Combinational Logic + Setup + Clock Skew

Here, one CPU clock cycle duration is C = 8 ns (set by the critical path).

Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
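The timing constraint can be written as a small helper. This is a sketch: the ideal case reproduces the slide's C = 8 ns, while the overhead values in the second call are hypothetical numbers chosen only to show how register and clock overheads lengthen the period.

```python
def min_cycle_time(clk_to_q, longest_comb, setup, skew):
    """Lower bound on the clock period; all arguments in ns."""
    return clk_to_q + longest_comb + setup + skew

def max_clock_mhz(period_ns):
    """Highest clock frequency, in MHz, that a period in ns allows."""
    return 1000.0 / period_ns

# Idealized case used on the slides: only the 8 ns load path counts.
ideal = min_cycle_time(0.0, 8.0, 0.0, 0.0)
print(ideal, max_clock_mhz(ideal))   # 8.0 125.0

# Hypothetical overheads (not from the slides): 0.3 ns clock-to-Q,
# 0.2 ns setup and 0.1 ns skew stretch the period beyond 8 ns.
with_overhead = min_cycle_time(0.3, 8.0, 0.2, 0.1)
print(round(with_overhead, 2), round(max_clock_mhz(with_overhead), 1))
```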
7
Reducing Cycle Time: Multi-Cycle Design

Cut the combinational dependency graph by inserting registers/latches. The same work is done in two or more shorter cycles rather than one long cycle.

[Figure: one long cycle (e.g., CPI = 1), in which a single block of acyclic combinational logic sits between two storage elements, versus two shorter cycles (Cycle 1, Cycle 2), in which an extra storage element splits the logic into blocks (A) and (B).]

=> Place registers to:
- Get a balanced clock cycle length.
- Save any results needed for the remaining cycles.
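As a worked relation (a sketch, assuming the logic splits into blocks A and B with delays $t_A$ and $t_B$ and ignoring register overhead), cutting one long cycle into two gives:

```latex
\begin{aligned}
C_{\text{one long cycle}}   &= t_A + t_B \\
C_{\text{two short cycles}} &= \max(t_A, t_B) \;\ge\; \tfrac{1}{2}\,(t_A + t_B) \\
\text{e.g. } t_A = 3\ \text{ns},\ t_B = 5\ \text{ns}:\quad
C_{\text{two short cycles}} &= 5\ \text{ns} \quad (\text{vs. } 4\ \text{ns for a balanced split of the same } 8\ \text{ns})
\end{aligned}
```

The example delays are hypothetical; they show why the registers should be placed to balance the cycle length, since the shorter clock can only be as short as the slower block.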
8
Basic MIPS Instruction Processing Steps

- Instruction Fetch: obtain the instruction from program storage (instruction memory): Instruction ← Mem[PC].
- Instruction Decode: determine the instruction type (done by the control unit) and obtain operands from registers.
- Execute: compute the result value or status.
- Result Store: store the result in a register/memory if needed (usually called Write Back).
- Next Instruction: update the program counter to the address of the next instruction: PC ← PC + 4.

Fetching the instruction (Instruction ← Mem[PC]) and updating the PC (PC ← PC + 4) are common steps for all instructions.
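A toy Python model of the fetch and PC-update steps, assuming word-sized (4-byte) instructions and a dictionary standing in for instruction memory. The three words are standard MIPS encodings of `lw $t0, 4($zero)`, `add $t2, $t0, $t1`, and `sw $t2, 8($zero)`; the program itself is hypothetical, and the execute and result-store steps are omitted.

```python
# Hypothetical instruction memory: byte address -> 32-bit instruction word.
instruction_memory = {0x0: 0x8C080004, 0x4: 0x01095020, 0x8: 0xAC0A0008}

def fetch(pc):
    """Instruction Fetch and Next Instruction: Instruction <- Mem[PC]; PC <- PC + 4."""
    return instruction_memory[pc], pc + 4

def opcode(instruction):
    """First decode step: the 6-bit opcode in bits 31:26."""
    return (instruction >> 26) & 0x3F

pc = 0x0
trace = []
while pc in instruction_memory:      # stop when PC runs past the program
    instr, pc = fetch(pc)
    trace.append(opcode(instr))      # execute and result-store steps omitted
print(trace)   # [35, 0, 43]  (lw, R-type, sw)
```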
9
Partitioning The Single Cycle Datapath

Add registers between steps to break the datapath into cycles:
1. Instruction Fetch cycle (IF)
2. Instruction Decode cycle (ID)
3. Execution cycle (EX)
4. Data Memory Access cycle (MEM)
5. Write Back cycle (WB)

[Figure: the single-cycle datapath (PC, Next PC, Instruction Fetch, Operand Fetch/Reg. File, Exec, Mem Access/Data Mem, Result Store) with registers inserted between the five steps; control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch, Jump.]

Place registers to:
- Get a balanced clock cycle length.
- Save any results needed for the remaining cycles.
10
Example Multi-cycle Datapath

[Figure: multi-cycle datapath with PC, Next PC, instruction memory, Reg. File, Ext, ALU, and Data Mem, plus the added registers IR, A, B, R, and M; control signals ALUctr, RegDst, ALUSrc, ExtOp, Branch, Jump, RegWr, MemWr, MemRd, MemToReg, and Equal; IR feeds the control unit.]

Registers added (all clock-edge triggered; register write-enable control lines not shown):
- IR: Instruction register.
- A, B: two registers to hold operands read from the register file.
- R (or ALUOut): holds the output of the main ALU.
- M (or Memory Data Register, MDR): holds data read from data memory.

CPU clock cycle time: worst cycle delay = C = 2 ns (ignoring MUX and CLK-to-Q delays):
- Instruction Fetch (IF): 2 ns
- Instruction Decode (ID): 1 ns
- Execution (EX): 2 ns
- Memory (MEM): 2 ns
- Write Back (WB): 1 ns

Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
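The 2 ns figure follows from taking the slowest of the per-cycle delays listed above. The sketch below (same simplifying assumptions as the slide; names are hypothetical) also shows the idle time in the cycles that finish early, which is the cost of an imperfectly balanced partition.

```python
# Per-cycle delays from the slide, in ns (mux and clock-to-Q delays ignored).
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

c_multi = max(STAGE_DELAY_NS.values())        # clock must fit the slowest cycle
single_path = sum(STAGE_DELAY_NS.values())    # the same work done in one cycle
idle_ns = {s: c_multi - d for s, d in STAGE_DELAY_NS.items()}

print(c_multi, single_path)   # 2 8
print(idle_ns)                # {'IF': 0, 'ID': 1, 'EX': 0, 'MEM': 0, 'WB': 1}
```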
11
Operations (Dependent RTN) for Each Cycle

R-Type:
- IF: IR ← Mem[PC]
- ID: A ← R[rs]; B ← R[rt]
- EX: R ← A funct B
- WB: R[rd] ← R; PC ← PC + 4

Logic Immediate:
- IF: IR ← Mem[PC]
- ID: A ← R[rs]; B ← R[rt]
- EX: R ← A OR ZeroExt[imm16]
- WB: R[rt] ← R; PC ← PC + 4

Load:
- IF: IR ← Mem[PC]
- ID: A ← R[rs]; B ← R[rt]
- EX: R ← A + SignEx(imm16)
- MEM: M ← Mem[R]
- WB: R[rt] ← M; PC ← PC + 4

Store:
- IF: IR ← Mem[PC]
- ID: A ← R[rs]; B ← R[rt]
- EX: R ← A + SignEx(imm16)
- MEM: Mem[R] ← B; PC ← PC + 4

Branch:
- IF: IR ← Mem[PC]
- ID: A ← R[rs]; B ← R[rt]
- EX: Zero ← A - B; if Zero = 1: PC ← PC + 4 + (SignExt(imm16) × 4), else (i.e., Zero = 0): PC ← PC + 4

The Instruction Fetch (IF) and Instruction Decode (ID) cycles are common to all instructions.
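To see the Load row as a sequence of register transfers, here is a minimal Python sketch, assuming dictionaries for instruction and data memory and a 32-entry list for the register file. The function and variable names are hypothetical, and address wrap-around and alignment checks are ignored. The example word is the MIPS encoding of `lw $t0, -4($t1)`.

```python
def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate field to a Python int."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def multicycle_load(pc, regs, imem, dmem):
    """Run one lw through the five cycles; returns (new PC, cycle names)."""
    cycles = []
    ir = imem[pc]                          # IF:  IR <- Mem[PC]
    cycles.append("IF")
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]              # ID:  A <- R[rs]; B <- R[rt]
    cycles.append("ID")                    #      (B is read but unused by lw)
    r = a + sign_ext16(ir & 0xFFFF)        # EX:  R <- A + SignEx(imm16)
    cycles.append("EX")
    m = dmem[r]                            # MEM: M <- Mem[R]
    cycles.append("MEM")
    regs[rt] = m                           # WB:  R[rt] <- M
    pc = pc + 4                            #      PC <- PC + 4
    cycles.append("WB")
    return pc, cycles

regs = [0] * 32
regs[9] = 0x100                            # $t1 holds a base address
imem = {0x0: 0x8D28FFFC}                   # lw $t0, -4($t1)
dmem = {0xFC: 1234}
pc, cycles = multicycle_load(0x0, regs, imem, dmem)
print(pc, regs[8], cycles)   # 4 1234 ['IF', 'ID', 'EX', 'MEM', 'WB']
```

Each intermediate variable (`ir`, `a`, `b`, `r`, `m`) plays the role of one of the added registers, which is what lets each cycle use only values saved by the previous one.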
12
MIPS Multi-Cycle Datapath: Five Cycles of Load

[Figure: a load spanning Cycle 1 to Cycle 5: IF, ID, EX, MEM, WB.]

1. Instruction Fetch (IF): fetch the instruction from instruction memory.
2. Instruction Decode (ID): operand register fetch and instruction decode.
3. Execute (EX): calculate the effective memory address.
4. Memory (MEM): read the data from data memory.
5. Write Back (WB): write the loaded data to the register file; update the PC.
13
Multi-cycle Datapath Instruction CPI

- R-type/immediate: require four cycles, CPI = 4 (IF, ID, EX, WB).
- Loads: require five cycles, CPI = 5 (IF, ID, EX, MEM, WB).
- Stores: require four cycles, CPI = 4 (IF, ID, EX, MEM).
- Branches/jumps: require three cycles, CPI = 3 (IF, ID, EX).

Average or effective program CPI: $3 \le \text{CPI} \le 5$, depending on the program profile (instruction mix).
14
Single Cycle Vs. Multi-Cycle CPU

Single-cycle CPU: CPI = 1, C = 8 ns (125 MHz). One million instructions take:
$I \times \text{CPI} \times C = 10^6 \times 1 \times 8\times10^{-9}\ \text{s} = 8\ \text{ms}$

Multi-cycle CPU: CPI = 3 to 5, C = 2 ns. One million instructions take from
$10^6 \times 3 \times 2\times10^{-9}\ \text{s} = 6\ \text{ms}$ to $10^6 \times 5 \times 2\times10^{-9}\ \text{s} = 10\ \text{ms}$,
depending on the instruction mix used.

Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
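The same arithmetic as a small Python helper (a sketch; `cpu_time_s` is a hypothetical name), applying CPU time = I × CPI × C to both designs:

```python
def cpu_time_s(instr_count, cpi, cycle_ns):
    """CPU time = I x CPI x C, converted from ns to seconds."""
    return instr_count * cpi * cycle_ns * 1e-9

instructions = 10**6
single = cpu_time_s(instructions, 1, 8)        # single cycle: CPI = 1, C = 8 ns
multi_best = cpu_time_s(instructions, 3, 2)    # multi-cycle: CPI = 3, C = 2 ns
multi_worst = cpu_time_s(instructions, 5, 2)   # multi-cycle: CPI = 5, C = 2 ns
print(f"{single * 1e3:.0f} ms vs {multi_best * 1e3:.0f}-{multi_worst * 1e3:.0f} ms")
# 8 ms vs 6-10 ms
```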
15
Finite State Machine (FSM) Control Model

- The state specifies the control points (outputs) for the register transfer.
- Control points (outputs) are assumed to depend only on the current state and not on the inputs (i.e., a Moore finite state machine).
- The transfer (register/memory writes) and the state transition occur upon exiting the state, on the falling edge of the clock.

[Figure: FSM controller. The next-state logic takes the inputs (opcode, conditions) and the current state and produces the next state; the output logic produces the control points sent to the datapath. In state X, the control points set the register transfer, and the state transition depends on the inputs.]
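A minimal Python sketch of the Moore model, assuming a hypothetical fragment of the controller with only an R-type path and a single branch state. The control-point names are borrowed from the state-assignment slide of this lecture; everything else (state names, the `MooreFSM` class) is illustrative.

```python
class MooreFSM:
    """Control FSM whose outputs depend only on the current state."""

    def __init__(self, start, next_state, outputs):
        self.state = start
        self._next_state = next_state   # function (state, inputs) -> state
        self._outputs = outputs         # dict: state -> control points

    def control_points(self):
        # Moore model: the inputs are not consulted here.
        return self._outputs[self.state]

    def clock(self, inputs):
        # State transition at the clock edge; only this step uses the inputs.
        self.state = self._next_state(self.state, inputs)

# Hypothetical fragment: only an R-type path and one branch state are modelled.
def next_state(state, inputs):
    if state == "IF":
        return "ID"
    if state == "ID":                       # dispatch on the opcode input
        return "EX_R" if inputs["op"] == "R-type" else "BEQ"
    if state == "EX_R":
        return "WB_R"
    return "IF"                             # WB_R and BEQ go back to fetch

OUTPUTS = {
    "IF": {"imem_rd": 1, "IRen": 1},
    "ID": {"Aen": 1, "Ben": 1},
    "EX_R": {"ALU": "fun", "Sen": 1},
    "WB_R": {"RegDst": 1, "RegWr": 1, "PCen": 1},
    "BEQ": {"PCen": 1},
}

fsm = MooreFSM("IF", next_state, OUTPUTS)
seen = []
for _ in range(4):
    seen.append(fsm.state)
    fsm.clock({"op": "R-type"})
print(seen, fsm.state)   # ['IF', 'ID', 'EX_R', 'WB_R'] IF
```

Keeping `control_points` independent of the inputs is what makes this a Moore machine; a Mealy version would pass the inputs to the output function as well.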
16
Control Specification For Multi-cycle CPU
Finite State Machine (FSM): State Transition Diagram

- Instruction fetch (start state): IR ← MEM[PC]
- Decode / operand fetch: A ← R[rs]; B ← R[rt]
- Then, per instruction (Execute, Memory, Write-back), returning to instruction fetch:
  - R-type: R ← A fun B; then R[rd] ← R; PC ← PC + 4
  - ORi: R ← A or ZX; then R[rt] ← R; PC ← PC + 4
  - LW: R ← A + SX; then M ← MEM[R]; then R[rt] ← M; PC ← PC + 4
  - SW: R ← A + SX; then MEM[R] ← B; PC ← PC + 4
  - BEQ & Zero: PC ← PC + 4 + SX || 00
  - BEQ & ~Zero: PC ← PC + 4

(ZX = zero-extended imm16; SX = sign-extended imm16; || 00 = concatenate two zero bits, i.e., shift left by 2.)

13 states: 4 state flip-flops needed.
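The flip-flop count is the number of bits needed to binary-encode the states, i.e. $\lceil \log_2 13 \rceil = 4$. A short sketch (hypothetical names) that counts the states per sequence in the diagram and computes the encoding width with integer arithmetic:

```python
def state_flip_flops(num_states):
    """Minimum flip-flops to binary-encode num_states states."""
    return max(1, (num_states - 1).bit_length())

# States per sequence in the diagram: fetch, decode, then one path per class
# (BEQ has two alternative states, Zero and ~Zero).
STATES = {"fetch": 1, "decode": 1, "R-type": 2, "ORi": 2,
          "LW": 3, "SW": 2, "BEQ": 2}
total = sum(STATES.values())
print(total, state_flip_flops(total))   # 13 4
```

Using `bit_length` on `num_states - 1` avoids floating-point `log2` and gives exact results at powers of two (16 states still need only 4 flip-flops).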
17
Traditional FSM Controller

[Figure: traditional FSM controller. A state register (4 flip-flops) holds the current state. Next-state logic and output logic, specified by a truth or transition table (inputs: current state, op, and the Equal condition; outputs: next state and control points), feed the next state back to the state register and send the control points to the datapath. Signal widths labeled 6 (opcode), 4 (state), and 11 (control points).]
18
Traditional FSM Controller

datapath + state diagram => control
- Translate RTN statements into control points.
- Assign states.
- Implement the controller.
19
Mapping RTNs To Control Points: Examples & State Assignments

- 0000 (instruction fetch): IR ← MEM[PC]. Control points: imem_rd, IRen.
- 0001 (decode / operand fetch): A ← R[rs]; B ← R[rt]. Control points: Aen, Ben.
- R-type:
  - 0100 (execute): R ← A fun B. Control points: ALUfun, Sen.
  - 0101 (write-back): R[rd] ← R; PC ← PC + 4. Control points: RegDst, RegWr, PCen.
- ORi:
  - 0110: R ← A or ZX
  - 0111: R[rt] ← R; PC ← PC + 4
- LW:
  - 1000: R ← A + SX
  - 1001: M ← MEM[R]
  - 1010: R[rt] ← M; PC ← PC + 4
- SW:
  - 1011: R ← A + SX
  - 1100: MEM[R] ← B; PC ← PC + 4
- BEQ & Zero: 0010: PC ← PC + 4 + SX || 00
- BEQ & ~Zero: 0011: PC ← PC + 4

The last state of each sequence returns to instruction fetch state 0000.
20
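Detailed Control Specification - State Transition Table

Column groups: IR (en); PC (en, sel); Ops (A, B); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst).

| Current State | Op field | Z | Next State | IR en | PC en | PC sel | A | B | Ex | Sr | ALU | S | R | W | M | M-R | Wr | Dst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0000 | ?????? | ? | 0001 | 1 | | | | | | | | | | | | | | |
| 0001 | BEQ | 0 | 0011 | | | | 1 | 1 | | | | | | | | | | |
| 0001 | BEQ | 1 | 0010 | | | | 1 | 1 | | | | | | | | | | |
| 0001 | R-type | x | 0100 | | | | 1 | 1 | | | | | | | | | | |
| 0001 | orI | x | 0110 | | | | 1 | 1 | | | | | | | | | | |
| 0001 | LW | x | 1000 | | | | 1 | 1 | | | | | | | | | | |
| 0001 | SW | x | 1011 | | | | 1 | 1 | | | | | | | | | | |
| 0010 | xxxxxx | x | 0000 | | 1 | 1 | | | | | | | | | | | | |
| 0011 | xxxxxx | x | 0000 | | 1 | 0 | | | | | | | | | | | | |
| 0100 | xxxxxx | x | 0101 | | | | | | 0 | 1 | fun | 1 | | | | | | |
| 0101 | xxxxxx | x | 0000 | | 1 | 0 | | | | | | | | | | 0 | 1 | 1 |
| 0110 | xxxxxx | x | 0111 | | | | | | 0 | 0 | or | 1 | | | | | | |
| 0111 | xxxxxx | x | 0000 | | 1 | 0 | | | | | | | | | | 0 | 1 | 0 |
| 1000 | xxxxxx | x | 1001 | | | | | | 1 | 0 | add | 1 | | | | | | |
| 1001 | xxxxxx | x | 1010 | | | | | | | | | | 1 | 0 | 1 | | | |
| 1010 | xxxxxx | x | 0000 | | 1 | 0 | | | | | | | | | | 1 | 1 | 0 |
| 1011 | xxxxxx | x | 1100 | | | | | | 1 | 0 | add | 1 | | | | | | |
| 1100 | xxxxxx | x | 0000 | | 1 | 0 | | | | | | | 0 | 1 | | | | |

Row groups: IF (0000), ID (0001), BEQ (0010, 0011), R (0100, 0101), ORI (0110, 0111), LW (1000 to 1010), SW (1011, 1100). Can be combined in one state.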
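The next-state columns of this table can be expressed directly as a function. A sketch in Python, assuming symbolic op names instead of the 6-bit opcode field (the dictionaries and function names are hypothetical). Tracing one instruction from state 0000 back to fetch visits as many states as that instruction's CPI on the earlier CPI slide.

```python
# Op values are symbolic strings rather than 6-bit opcodes (a simplification).
DISPATCH = {"R-type": 0b0100, "orI": 0b0110, "LW": 0b1000, "SW": 0b1011}
SEQUENTIAL = {0b0000: 0b0001, 0b0100: 0b0101, 0b0110: 0b0111,
              0b1000: 0b1001, 0b1001: 0b1010, 0b1011: 0b1100}
RETURN_TO_FETCH = {0b0010, 0b0011, 0b0101, 0b0111, 0b1010, 0b1100}

def next_state(state, op, zero):
    """Next-state logic for the 13-state controller."""
    if state == 0b0001:                      # decode: dispatch on the op field
        if op == "BEQ":
            return 0b0010 if zero else 0b0011
        return DISPATCH[op]
    if state in RETURN_TO_FETCH:             # last state of every sequence
        return 0b0000
    return SEQUENTIAL[state]

def state_trace(op, zero=0):
    """States visited by one instruction, starting at fetch (0000)."""
    state, visited = 0b0000, []
    while True:
        visited.append(format(state, "04b"))
        state = next_state(state, op, zero)
        if state == 0b0000:
            return visited

print(state_trace("LW"))   # ['0000', '0001', '1000', '1001', '1010']
```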
21
Alternative Multiple Cycle Datapath (In Textbook)

Minimizes hardware: 1 memory, 1 ALU.

[Figure: a single ideal memory (Address, Din, Dout; MemRd, MemWr) whose address comes from the PC or ALU Out, selected by IorD; Instruction Reg (IRWr) and Mem Data Reg; Reg File (Ra, Rb, Rw, busA, busB, busW; RegWr), with RegDst selecting Rt or Rd as the write register and MemtoReg selecting the write data; registers A and B; ALUSrcA mux (PC or A); ALUSrcB mux (B, 4, extended Imm16, extended Imm16 << 2); ALU with ALU Control (ALUOp) and a Zero output; ALU Out register; PC written under PCWr or PCWrCond, with PCSrc selecting the new value.]
22
Alternative Multiple Cycle Datapath (In Textbook)

- Shared instruction/data memory unit.
- A single ALU shared among instructions.
- Shared units require additional or widened multiplexors.
- Temporary registers hold data between clock cycles of the instruction. Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut.

[Figure 5.27, page 322: the alternative multi-cycle datapath, with the rs, rt, rd, and imm16 instruction fields.]
23
Alternative Multiple Cycle Datapath With Control Lines (Fig. 5.28 In Textbook)

(ORI not supported; Jump supported.)

[Figure 5.28, page 323: the alternative multi-cycle datapath with its control lines, showing the PC + 4 and branch target paths and the rs, rt, rd, and imm16 instruction fields.]
24
The Effect of The 1-bit Control Signals

| Signal Name | Effect when deasserted (= 0) | Effect when asserted (= 1) |
|---|---|---|
| RegDst | The register destination number for the write register comes from the rt field (instruction bits 20:16). | The register destination number for the write register comes from the rd field (instruction bits 15:11). |
| RegWrite | None | The register on the Write register input is written with the value on the Write data input. |
| ALUSrcA | The first ALU operand is the PC. | The first ALU operand is register A (i.e., R[rs]). |
| MemRead | None | The contents of memory specified by the address input are put on the memory data output. |
| MemWrite | None | The memory contents specified by the address input are replaced by the value on the Write data input. |
| MemtoReg | The value fed to the register write data input comes from the ALUOut register. | The value fed to the register write data input comes from the memory data register (MDR). |
| IorD | The PC is used to supply the address to the memory unit. | The ALUOut register is used to supply the address to the memory unit. |
| IRWrite | None | The output of the memory is written into the Instruction Register (IR). |
| PCWrite | None | The PC is written; the source is controlled by PCSource. |
| PCWriteCond | None | The PC is written if the Zero output of the ALU is also active. |

(Figure 5.29, page 324)
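Two of these 1-bit signals are simple 2-to-1 multiplexer selects. A Python sketch of RegDst and IorD, assuming a 32-bit instruction word held in an int; the helper names are hypothetical. The example word is the standard MIPS encoding of `add $t2, $t0, $t1`.

```python
def fields(ir):
    """Extract the register-number fields of a 32-bit MIPS instruction."""
    return {
        "rs": (ir >> 21) & 0x1F,   # bits 25:21
        "rt": (ir >> 16) & 0x1F,   # bits 20:16
        "rd": (ir >> 11) & 0x1F,   # bits 15:11
    }

def write_register(ir, reg_dst):
    """RegDst mux: 0 selects the rt field, 1 selects the rd field."""
    f = fields(ir)
    return f["rd"] if reg_dst else f["rt"]

def memory_address(pc, alu_out, i_or_d):
    """IorD mux: 0 selects PC (instruction fetch), 1 selects ALUOut (data access)."""
    return alu_out if i_or_d else pc

add_instr = 0x01095020   # add $t2, $t0, $t1  (rs = 8, rt = 9, rd = 10)
print(write_register(add_instr, 1), write_register(add_instr, 0))   # 10 9
```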
25
The Effect of The 2-bit Control Signals

| Signal Name | Value (Binary) | Effect |
|---|---|---|
| ALUOp | 00 | The ALU performs an add operation. |
| | 01 | The ALU performs a subtract operation. |
| | 10 | The funct field of the instruction determines the ALU operation (R-type). |
| ALUSrcB | 00 | The second input of the ALU comes from register B. |
| | 01 | The second input of the ALU is the constant 4. |
| | 10 | The second input of the ALU is the sign-extended 16-bit immediate field of the instruction in IR. |
| | 11 | The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits. |
| PCSource | 00 | The output of the ALU (PC + 4) is sent to the PC for writing. |
| | 01 | The content of ALUOut (the branch target address) is sent to the PC for writing. |
| | 10 | The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing. |

(Figure 5.29, page 324)
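A sketch of the PCSource multiplexer in Python, including the jump target concatenation described in the table. Helper names are hypothetical, and the argument `pc` is assumed to already hold PC + 4 (as it does after the IF cycle); encoding 11 is not listed in the table, so the sketch rejects it. The example word is the MIPS encoding of a `j` instruction with 26-bit target field 0x0100004.

```python
MASK32 = 0xFFFFFFFF

def jump_target(pc, ir):
    """PC[31:28] || IR[25:0] || 00, where pc already holds PC + 4."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

def pc_source_mux(pc_source, alu_result, alu_out, pc, ir):
    """Select the value written to the PC, per the PCSource encodings."""
    if pc_source == 0b00:
        return alu_result & MASK32        # ALU output, e.g. PC + 4 during IF
    if pc_source == 0b01:
        return alu_out & MASK32           # ALUOut: branch target computed in ID
    if pc_source == 0b10:
        return jump_target(pc, ir)        # jump target address
    raise ValueError("PCSource = 11 is not defined in the table")

j_instr = 0x08100004   # j, 26-bit target field 0x0100004
print(hex(pc_source_mux(0b10, 0, 0, 0x00400004, j_instr)))   # 0x400010
```

Because the upper four bits come from PC + 4, a `j` can only reach targets within the same 256 MB region as the instruction that follows it.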
26
Operations (Dependent RTN) for Each Cycle

The Instruction Fetch (IF) and Instruction Decode (ID) cycles are common to all instructions:
- IF: IR ← Mem[PC]; PC ← PC + 4
- ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) × 4)

Remaining cycles:
- R-Type: EX: ALUout ← A funct B; WB: R[rd] ← ALUout
- Load: EX: ALUout ← A + SignEx(imm16); MEM: M ← Mem[ALUout]; WB: R[rt] ← M
- Store: EX: ALUout ← A + SignEx(imm16); MEM: Mem[ALUout] ← B
- Branch: EX: Zero ← A - B; if Zero: PC ← ALUout
- Jump: EX: PC ← Jump Address
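Because the ALU is otherwise idle during ID, the branch target is computed there for every instruction and is simply overwritten in ALUOut if the instruction is not a branch. A minimal Python sketch of that ID-cycle computation, assuming `pc` already holds PC + 4 from the IF cycle (names hypothetical):

```python
def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate field to a Python int."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, imm16):
    """ID cycle: ALUout <- PC + (SignExt(imm16) x 4), with pc already PC + 4."""
    return (pc + sign_ext16(imm16) * 4) & 0xFFFFFFFF

# A beq at 0x00400020: after IF, PC = 0x00400024.
print(hex(branch_target(0x00400024, 0xFFFE)))   # 0x40001c (offset -2 words)
print(hex(branch_target(0x00400024, 0x0003)))   # 0x400030 (offset +3 words)
```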
27
High-Level View of Finite State Machine Control

- The first steps are independent of the instruction class.
- Then comes a series of sequences that depend on the instruction opcode.
- Then control returns to fetch a new instruction.

[Figure 5.31, page 332: high-level view of the FSM control. Each box represents one or several states; the instruction fetch/decode box is detailed in Figure 5.32, and the instruction-specific boxes in Figures 5.33 to 5.36.]
28
Instruction Fetch (IF) and Decode (ID) FSM States (Figure 5.32, page 333)

- IF: IR ← Mem[PC]; PC ← PC + 4
- ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) × 4)

From ID, control proceeds to the instruction-specific states of Figures 5.33 to 5.36.
29
Load/Store Instructions FSM States (Figure 5.33, page 334)

Entered from Instruction Decode (Figure 5.32); returns to Instruction Fetch.
- EX: ALUout ← A + SignEx(imm16)
- MEM (load): M ← Mem[ALUout]
- MEM (store): Mem[ALUout] ← B
- WB (load only): R[rt] ← M
30
R-Type Instructions FSM States (Figure 5.34, page 335)

Entered from Instruction Decode (Figure 5.32); returns to State 0 (Instruction Fetch).
- EX: ALUout ← A funct B
- WB: R[rd] ← ALUout
31
Jump Instruction: Single EX State; Branch Instruction: Single EX State (Figures 5.35 and 5.36, page 337)

Each is entered from Instruction Decode (Figure 5.32) and returns to State 0 (Instruction Fetch).
- Jump (EX): PC ← Jump Address
- Branch (EX): Zero ← A - B; if Zero: PC ← ALUout
32
FSM State Transition Diagram (From Book)

[Figure 5.38, page 339: complete FSM state transition diagram, organized into IF, ID, EX, MEM, and WB.]
33
MIPS Multi-cycle Datapath Performance Evaluation

What is the average CPI?
- The state diagram gives the CPI for each instruction type.
- The workload (program) below gives the frequency of each type.

| Type | CPI_i for type | Frequency freq_i | CPI_i x freq_i |
|---|---|---|---|
| Arith/Logic | 4 | 40% | 1.6 |
| Load | 5 | 30% | 1.5 |
| Store | 4 | 10% | 0.4 |
| Branch | 3 | 20% | 0.6 |

Average CPI: 1.6 + 1.5 + 0.4 + 0.6 = 4.1, which is better than CPI = 5 if all instructions took the same number of clock cycles (5).
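The weighted sum in the table can be checked with a few lines of Python (a sketch; `MIX` and `average_cpi` are hypothetical names). It also multiplies the result by the 2 ns multi-cycle clock from the earlier slides to get the average time per instruction.

```python
# Instruction mix from the table: type -> (CPI_i, frequency).
MIX = {
    "arith_logic": (4, 0.40),
    "load":        (5, 0.30),
    "store":       (4, 0.10),
    "branch":      (3, 0.20),
}

def average_cpi(mix):
    """Effective CPI = sum over types of CPI_i x frequency_i."""
    total = sum(freq for _, freq in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(cpi * freq for cpi, freq in mix.values())

cpi = average_cpi(MIX)
time_per_instr_ns = cpi * 2     # multi-cycle clock C = 2 ns
print(round(cpi, 2), round(time_per_instr_ns, 2))   # 4.1 8.2
```

For this mix the average is about 8.2 ns per instruction versus 8 ns for the single-cycle CPU, so under the assumed component delays the lower CPI relative to 5 does not by itself make this workload run faster than the single-cycle design.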