Presentation is loading. Please wait.

Presentation is loading. Please wait.

CPU Organization (Design)

Similar presentations


Presentation on theme: "CPU Organization (Design)"— Presentation transcript:

1 CPU Organization (Design)
Datapath Design: Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions (e.g., Registers, ALU, Shifters, Logic Units, ...) Ways in which these components are interconnected (buses connections, multiplexors, etc.). How information flows between components. Control Unit Design: Logic and means by which such information flow is controlled. Control and coordination of FUs operation to realize the targeted Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram). Hardware description with a suitable language, possibly using Register Transfer Notation (RTN). Components & their connections needed by ISA instructions Components Connections Control/sequencing of operations of datapath components to realize ISA instructions Chapter

2 Major CPU Design Steps Analyze instruction set to get datapath requirements: Using independent RTN, write the micro-operations required for target ISA instructions. This provides the the required datapath components and how they are connected. Select set of datapath components and establish clocking methodology (defines when storage or state elements can read and when they can be written, e.g clock edge-triggered) Assemble datapath meeting the requirements. Identify and define the function of all control points or signals needed by the datapath. Analyze implementation of each instruction to determine setting of control points that affects its operations. Control unit design, based on micro-operation timing and control signals identified: Combinational logic: For single cycle CPU. Hard-Wired: Finite-state machine implementation. Microprogrammed. 1 2 e.g Any instruction completed in one cycle

3 CPU Design & Implantation Process
Top-down Design: Specify component behavior from high-level requirements (ISA). Bottom-up Design: Assemble components in target technology to establish critical timing (hardware delays, critical path timing). Iterative refinement: Establish a partial solution, expand and improve. Instruction Set Architecture (ISA): Provides Requirements Processor Datapath Control Reg. File Mux ALU Reg Mem Decoder Sequencer Target VLSI implementation Technology Cells Gates

4 Datapath Design Steps Write the micro-operation sequences required for a number of representative target ISA instructions using independent RTN. Independent RTN statements specify: the required datapath components and how they are connected. From the above, create an initial datapath by determining possible destinations for each data source (i.e registers, ALU). This establishes connectivity requirements (data paths, or connections) for datapath components. Whenever multiple sources are connected to a single input, a multiplexor of appropriate size is added. Find the worst-time propagation delay in the datapath to determine the datapath clock cycle (CPU clock cycle). Complete the micro-operation sequences for all remaining instructions adding datapath components + connections/multiplexors as needed. 1 2 (or destination)

5 MIPS Instruction Formats
op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits R-Type [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits I-Type: ALU Load/Store, Branch [31:26] [25:21] [20:16] [15:0] op target address 26 31 6 bits 26 bits J-Type: Jumps [31:26] [25:0] op: Opcode, operation of the instruction. rs, rt, rd: The source and destination register specifiers. shamt: Shift amount. funct: Selects the variant of the operation in the “op” field. address / immediate: Address offset or immediate value. target address: Target address of the jump instruction.

6 MIPS R-Type (ALU) Instruction Fields
R-Type: All ALU instructions that use three registers 1st operand 2nd operand Destination OP rs rt rd shamt funct 6 bits bits bits bits bits bits [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] op: Opcode, basic operation of the instruction. For R-Type op = 0 rs: The first register source operand. rt: The second register source operand. rd: The register destination operand. shamt: Shift amount used in constant shift operations. funct: Function, selects the specific variant of operation in the op field. Rs, rt , rd are register specifier fields Independent RTN: Instruction Word ¬ Mem[PC] R[rd] ¬ R[rs] funct R[rt] PC ¬ PC + 4 Funct field value examples: Add = 32 Sub = 34 AND = 36 OR =37 NOR = 39 Operand register in rs Destination register in rd Operand register in rt Examples: add $1,$2,$3 sub $1,$2,$3 and $1,$2,$3 or $1,$2,$3 R-Type = Register Type Register Addressing used (Mode 1)

7 MIPS ALU I-Type Instruction Fields
I-Type ALU instructions that use two registers and an immediate value Loads/stores, conditional branches. 1st operand Destination 2nd operand OP rs rt Immediate (imm16) 6 bits bits bits bits imm16 [31:26] [25:21] [20:16] [15:0] Independent RTN for addi: op: Opcode, operation of the instruction. rs: The register source operand. rt: The result destination register. immediate: Constant second operand for ALU instruction. Instruction Word ¬ Mem[PC] R[rt] ¬ R[rs] + imm16 PC ¬ PC + 4 Source operand register in rs Result register in rt OP = 8 add immediate: addi $1,$2,100 and immediate andi $1,$2,10 Examples: Constant operand in immediate OP = 12 I-Type = Immediate Type Immediate Addressing used (Mode 2) imm16 = 16 bit immediate field

8 MIPS Load/Store I-Type Instruction Fields
Base Src./Dest. OP rs rt address 6 bits bits bits bits (e.g. offset) imm16 [31:26] [25:21] [20:16] [15:0] Signed address offset in bytes op: Opcode, operation of the instruction. For load word op = 35, for store word op = 43. rs: The register containing memory base address. rt: For loads, the destination register. For stores, the source register of value to be stored. address: 16-bit memory address offset in bytes added to base register. base register in rs Examples: source register in rt Offset Instruction Word ¬ Mem[PC] Mem[R[rs] + imm16] ¬ R[rt] PC ¬ PC + 4 Store word: sw $3, 500($4) Load word: lw $1, 32($2) Instruction Word ¬ Mem[PC] R[rt] ¬ Mem[R[rs] + imm16] PC ¬ PC + 4 base register in rs Destination register in rt Offset Base or Displacement Addressing used (Mode 3) imm16 = 16 bit immediate field

9 MIPS Branch I-Type Instruction Fields
OP rs rt address (e.g. offset) 6 bits bits bits bits imm16 [31:26] [25:21] [20:16] [15:0] Signed address offset in words op: Opcode, operation of the instruction. rs: The first register being compared rt: The second register being compared. address: 16-bit memory address branch target offset in words added to PC to form branch address. Word = 4 bytes Register in rt offset in bytes equal to instruction address field x 4 Register in rs OP = 4 Branch on equal beq $1,$2,100 Branch on not equal bne $1,$2,100 Examples: Added to PC+4 to form branch target OP = 5 Independent RTN for beq: Instruction Word ¬ Mem[PC] R[rs] = R[rt] : PC ¬ PC imm16 x 4 R[rs] ¹ R[rt] : PC ¬ PC + 4 PC-Relative Addressing used (Mode 4) imm16 = 16 bit immediate field

10 MIPS J-Type Instruction Fields
J-Type: Include jump j, jump and link jal OP jump target 6 bits bits Jump target in words [31:26] [25:0] Word = 4 bytes op: Opcode, operation of the instruction. Jump j op = 2 Jump and link jal op = 3 jump target: jump memory address in words. Jump memory address in bytes equal to instruction field jump target x 4 Jump j Jump and link jal Examples: Effective 32-bit jump address: PC(31-28),jump_target,00 PC(31-28) From PC+4 jump target = 2500 4 bits bits bits 0 0 Independent RTN for j: Instruction Word ¬ Mem[PC] PC ¬ PC + 4 PC ¬ PC(31-28),jump_target,00 J-Type = Jump Type Pseudodirect Addressing used (Mode 5)

11 A Subset of MIPS Instructions
32 = add 34 = sub ADD and SUB: add rd, rs, rt sub rd, rs, rt OR Immediate: ori rt, rs, imm16 LOAD and STORE Word lw rt, rs, imm16 sw rt, rs, imm16 BRANCH: beq rs, rt, imm16 op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits R [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] I op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits 13 [31:26] [25:21] [20:16] [15:0] I op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits Offset in bytes 35 = lw 43 = sw [31:26] [25:21] [20:16] [15:0] I op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits Offset in words 4 [31:26] [25:21] [20:16] [15:0]

12 Basic MIPS Instruction Processing Steps
Instruction Memory Instruction Fetch Decode Execute Result Store Next } Obtain instruction from program storage Instruction ¬ Mem[PC] Update program counter to address of next instruction Common steps for all instructions PC ¬ PC + 4 Determine instruction type Obtain operands from registers Done by Control Unit (Based on Opcode) Compute result value or status Store result in register/memory if needed (usually called Write Back).

13 Overview of MIPS Instruction Micro-operations
All instructions go through these common steps: Send program counter to instruction memory and fetch the instruction. (fetch) Instruction ¬ Mem[PC] Update the program counter to point to next instruction PC ¬ PC + 4 Read one or two registers, using instruction fields. (decode) Load reads one register only. Additional instruction execution actions (execution) depend on the instruction in question, but similarities exist: All instruction classes (except J type) use the ALU after reading the registers: Memory reference instructions use it for address calculation. Arithmetic and logic instructions (R-Type), use it for the specified operation. Branches use it for comparison. Additional execution steps where instruction classes differ: Memory reference instructions: Access memory for a load or store. Arithmetic and logic instructions: Write ALU result back in register. Branch instructions: Change next instruction address based on comparison.

14 A Single Cycle MIPS CPU Design
Design target: A single-cycle per instruction MIPS CPU design All micro-operations of an instruction are to be carried out in a single CPU clock cycle. Cycles Per Instruction = CPI = 1 T = I x CPI x C CPU Performance Equation: Abstract view of single cycle MIPS CPU showing major functional units (components) and major connections between them CPI = 1 Figure 5.1 page 287

15 R-Type Example: Micro-Operation Sequence For ADD
add rd, rs, rt 32 = add 34 = sub OP rs rt rd shamt funct 6 bits bits bits bits bits bits [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC R[rd] ¬ R[rs] + R[rt] Add register rs to register rt result in register rd Common Steps Program Memory i.e Funct =add Independent RTN ?

16 Initial Datapath Components
Three components needed by: Instruction Fetch: Instruction ¬ Mem[PC] Program Counter Update: PC ¬ PC + 4 Instruction Word 32 Two state elements (memory) needed to store and access instructions: Instruction memory: Only read access (by user code). No read control signal needed. Program counter (PC): 32-bit register. Written at end of every clock cycle (edge-triggered) : No write control signal. 32-bit Adder: To compute the the next instruction address (PC + 4). Basics of logic design/logic building blocks review in Appendix B (Book CD)

17 More Datapath Components
ISA Register File Main 32-bit ALU 4 (Function) 32 32 32 32 32 32-bit Arithmetic and Logic Unit (ALU) Register File: Contains all ISA registers. Two read ports and one write port. Register writes by asserting write control signal Clocking Methodology: Writes are edge-triggered. Thus can read and write to the same register in the same clock cycle. Zero = Zero flag = 1 When ALU result equals zero Basics of logic design/logic building blocks review in Appendix B (Book CD)

18 Register File Details 5 Register File consists of 32 registers:
Clk busW Write Enable 32 busA busB 5 RW RA RB 32 32-bit Registers Register File consists of 32 registers: Two 32-bit output busses: busA and busB One 32-bit input bus: busW Register is selected by: RA (number) selects the register to put on busA (data): busA = R[RA] RB (number) selects the register to put on busB (data): busB = R[RB] RW (number) selects the register to be written via busW (data) when Write Enable is 1 Write Enable: R[RW] ¬ busW Clock input (CLK) The CLK input is a factor ONLY during write operations. During read operation, it behaves as a combinational logic block: RA or RB valid => busA or busB valid after “access time.”

19 A Possible Register File Implementation
5-to-32 Decoder Register 0 Write Data In Data Out Register 1 Register 30 Register 31 . 32-to-1 MUX 1 30 31 32 5 Register Read Data 1 (Bus A) Read Data 2 (Bus B) Read Register 1 (RA) Read Register 2 (RB) Register Write Data (Bus W) Register Write Enable (RegWrite) RW Clk busW Write Enable busA busB RA RB 32 32-bit Registers Each Register contains 32 edge triggered D-Flip Flops Clock input to registers not shown in diagram Also see Appendix B (Book CD) - The Basics of Logic Design

20 Idealized Memory Memory (idealized) One input bus: Data In.
Clk Data In Write Enable 32 DataOut Address Read Enable Memory (idealized) One input bus: Data In. One output bus: Data Out. Memory word is selected by: Address selects the word to put on Data Out bus. Write Enable = 1: address selects the memory word to be written via the Data In bus. Clock input (CLK): The CLK input is a factor ONLY during write operation, During read operation, this memory behaves as a combinational logic block: Address valid => Data Out valid after “access time.” Ideal Memory = Short access time. Compared to other components

21 Clocking Methodology Used: Edge Triggered Writes
Clk Setup Hold Setup Hold Don’t Care CLK-to-Q CLK-to-Q . . Critical Path (Longest delay path) All storage element (e.g Flip-Flops, Registers, Data Memory) writes are triggered by the same clock edge. Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew Here writes are triggered on the rising edge of the clock

22 Building The Datapath Instruction Fetch & PC Update: PC ¬ PC + 4
32 32 32 Instruction ¬ Mem[PC] Portion of the datapath used for fetching instructions and incrementing the program counter. 32 Clock input to PC, memory not shown PC write or update is edge triggered at the end of the cycle Figure 5.6 page 293

23 Simplified Datapath For MIPS R-Type Instructions
rs [25:21] [20:16] [15:11] 4 From Instruction Memory R[rs] (Function) rt 32 rd R[rt] 32 32 32 Components and connections as specified by RTN statement R[rd] ¬ R[rs] + R[rt] Clock input to register bank not shown i.e Funct = function =add Destination register R[rd] write or update is edge triggered at the end of the cycle

24 More Detailed Datapath For R-Type Instructions With Control Points Identified
32 Result ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd ALU Function =Add, Subtract … R[rs] R[rt] R[rd] ¬ R[rs] + R[rt] i.e Funct = function =add

25 R-Type Register-Register Timing
PC+4 Clk PC Rs, Rt, Rd, Op, Func Clk-to-Q ALUctr Instruction Memory Access Time Old Value New Value RegWr Delay through Control Logic busA, B Register File Access Time busW ALU Delay Register Write Occurs Here 32 Result ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd ALU All register writes occur on falling edge of clock (clocking methodology)

26 Logical Operations with Immediate Example: Micro-Operation Sequence For ORI
ori rt, rs, imm16 op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits 13 [31:26] [25:21] [20:16] [15:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC R[rt] ¬ R[rs] OR ZeroExt[imm16] OR register rs with immediate field zero extended to 32 bits, result in register rt Done by Main ALU

27 Datapath For Logical Instructions With Immediate
32 Result ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rt Rd RegDst ZeroExt Mux 16 imm16 ALUSrc ALU 1 2x1 MUX (width 5 bits) Function = OR R[rs] R[rt] 2x1 MUX (width 32 bits) R[rt] ¬ R[rs] OR ZeroExt[imm16]

28 Load Operations Example: Micro-Operation Sequence For LW
lw rt, rs, imm16 op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits 35 Address offset in bytes [31:26] [25:21] [20:16] [15:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC R[rt] ¬ Mem[R[rs] + SignExt[imm16]] Immediate field sign extended to bits and added to register rs to form memory load address, write word at load effective address to register rt Instruction Memory Effective Address Data Memory

29 Additional Datapath Components For Loads & Stores
32 For SignExt[imm16] Inputs: for address and write (store) data Output for read (load) data 16-bit input sign-extended into a 32-bit value at the output Data memory write or update is edge triggered at the end of the cycle (clocking methodology)

30 Datapath For Loads R[rt] ¬ Mem[R[rs] + SignExt[imm16]] 32 ALUctr Clk
busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 ALUSrc ExtOp Data In WrEn Adr Data Memory ALU MemWr MemtoReg 1 R[rs] R[rt] Base Address register Effective Address Offset Function = add MemRd R[rt] ¬ Mem[R[rs] + SignExt[imm16]] Data Memory Effective Address

31 Store Operations Example: Micro-Operation Sequence For SW
sw rt, rs, imm16 op rs rt Immediate (imm16) 16 21 26 31 6 bits 16 bits 5 bits 43 Address offset in bytes [31:26] [25:21] [20:16] [15:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC Mem[R[rs] + SignExt[imm16]] ¬ R[rt] Immediate field sign extended to bits and added to register rs to form memory store effective address, register rt written to memory at store effective address. Effective Address Data Memory

32 Datapath For Stores Mem[R[rs] + SignExt[imm16]] ¬ R[rt] ALUSrc ExtOp
32 ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 Data In WrEn Adr Data Memory MemWr ALU MemtoReg 1 R[rs] R[rt] Base Address register Effective Address Offset MemRd Mem[R[rs] + SignExt[imm16]] ¬ R[rt] Data Memory Effective Address

33 Conditional Branch Example: Micro-Operation Sequence For BEQ
beq rs, rt, imm16 op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits 4 PC Offset in words [31:26] [25:21] [20:16] [15:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC Zero ¬ R[rs] - R[rt] Calculate the branch condition R[rs] == R[rt] (i.e R[rs] - R[rt] = 0 ) Zero : PC ¬ PC + ( SignExt(imm16) x 4 ) Calculate the next instruction’s PC address Condition Action Then Zero = 1 Branch Target “Zero” is zero flag of main ALU

34 Datapath For Branch Instructions
Main ALU evaluates branch condition New adder to compute branch target: Sum of incremented PC and the sign-extended lower 16-bits on the instruction. Datapath For Branch Instructions New 32-bit Adder (Third ALU) for Branch Target ( SignExt(imm16) x 4 PC ( SignExt(imm16) x 4 [25:21] rs [20:16] rt R[rs] R[rt] (Main ALU) Zero flag =1 if R[rs] - R[rt] = 0 (i.e R[rs] = R[rt]) Main ALU Evaluates Branch Condition (subtract) [15:0] imm16 SignExt(imm16)

35 More Detailed Datapath For Branch Operations
Clk busW RegWr 32 busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Equal? Zero imm16 PC 00 Adder Mux 4 PCSrc PC Ext Instruction Address Branch PC+4 Target 1 Main ALU (subtract) Branch Target ALU New 2X1 32-bit MUX to select next PC value Sign extend shift left 2 R[rs] R[rt] New Third ALU

36 Combining The Datapaths For Memory Instructions and R-Type Instructions
4 [25:21] rs R[rs] 32 [20:16] rt R[rt] 32 1 1 R[rt] 32 32 rt/rd MUX not shown [15:0] imm16 SignExt(imm16) 32 Highlighted muliplexors and connections added to combine the datapaths of memory and R-Type instructions into one datapath This is book version ORI not supported Figure 5.10 Page 299

37 Instruction Fetch Datapath Added to ALU R-Type and Memory Instructions Datapath
PC+ 4 32 32 PC rs 4 R[rs] rt R[rt] 1 1 32 rt/rd MUX not shown 32 32 This is book version ORI not supported, no zero extend of immediate needed

38 A Simple Datapath For The MIPS Architecture
Datapath of branches and a program counter multiplexor are added. Resulting datapath can execute in a single cycle the basic MIPS instruction: - load/store word ALU operations Branches 1 PC +4 Branch Target rt/rd MUX not shown 4 Branch Zero 32 rs R[rs] rt R[rt] This is book version ORI not supported, no zero extend of immediate needed Figure 5.11 page 300

39 Main ALU Control The main ALU has four control lines (detailed design in Appendix B) with the following functions: For our current subset of MIPS instructions only the top five functions will be used (thus only three control lines will be used) For R-type instruction the ALU function depends on both the opcode and the 6-bit “funct” function field For other instructions the ALU function depends on the opcode only. A local ALU control unit can be designed to accept 2-bit ALUop control lines (from main control unit) and the 6-bit function field and generate the correct 4-bit ALU control lines. ALU Control Lines ALU Function 0000 0001 0010 0110 0111 AND OR add subtract Set-on-less-than 1100 NOR Not Used

40 Local ALU Decoding of “func” Field
Main Control op 6 ALU (Local) func 2 ALUop ALUctr 4 Instruction Opcode LW SW Branch Equal R-Type ALUOp 00 01 10 Funct Field XXXXXX 100000 100010 100100 100101 101010 Desired ALU Action add subtract and or set on less than ALU Control Lines 0010 0110 0000 0001 0111 Operation Load word Store word branch equal AND OR

41 { Local ALU Control Unit (2 lines From main control unit) Function
Add = 00 Subtract = 01 Add Subtract AND OR Set-On-less-Than { R-type =10 Page 302 (2 lines From main control unit) Function Field 3 ALU Control Lines 4th line = 0 More details found in Appendix C (Book CD)

42 Single Cycle MIPS Datapath
Necessary multiplexors and control lines are identified here and local ALU control added: Function Field Branch Zero imm16 rs rt 32 PC +4 rd R[rs] R[rt] Target ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type This is book version ORI not supported, no zero extend of immediate needed Figure 5.15 page 305

43 Putting It All Together: A Single Cycle Datapath
Adr Inst Memory Instruction<31:0> <21:25> <16:20> <11:15> <0:15> ALUop (2-bits) 00 = add 01 = subtract 10 = R-Type Rs Rt Rd Imm16 ALU Control Branch Zero PCSrc RegDst MemWr MemtoReg Zero Function Field Rd Rt 1 Rs Rt 4 Adder PC+4 RegWr 5 5 5 Main ALU busA R[rs] Mux Rw Ra Rb = 00 busW 32 32 32-bit Registers R[rt] ALU 32 busB 32 PC Adder 32 Mux Mux 1 Clk 32 WrEn Adr 1 Clk 1 Data In PC Ext imm16 Extender Data Memory 32 Branch Target 16 imm16 Clk e.g Sign Extend + Shift Left 2 ExtOp ALUSrc MemRd (Includes ORI not in book version)

44 Control Unit DATA PATH Instruction<31:0> Control Lines RegDst
<21:25> <16:20> <11:15> <0:15> Imm16 Rd Rs Rt Adr Instruction Memory DATA PATH ALUSrc ALOp (2-bits) Mem Read MemtoReg Control Unit Op Fun Branch RegWrite <0:25> Jump_target Control Lines Write

45 The Effect of The Control Signals
Name RegDst RegWrite ALUSrc Branch MemRead MemWrite MemtoReg Effect when deasserted (=0) The register destination number for the write register comes from the rt field (instruction bits 20:16). None The second main ALU operand comes from the second register file output (Read data 2) R[rt] The PC is replaced by the output of the adder that computes PC + 4 The value fed to the register write data input comes from the main ALU. Effect when asserted (=1) The register destination number for the write register comes from the rd field (instruction bits 15:11). The register on the write register input is written with the value on the Write data input. The second main ALU operand is the sign-extended lower 16 bits on the instruction (imm16) If Zero =1 The PC is replaced by the output of the adder that computes the branch target. Data memory contents designated by the address input are put on the Read data output. address input are replaced by the value on the Write data input. The value fed to the register write data input comes from data memory.

46 Control Line Settings R-Format lw sw beq 1 X Figure 5.18 page 308
Instruction R-Format lw sw beq RegDst 1 X ALUSrc Memto- Reg Write Mem Read Branch ALUOp1 ALUOp0 ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type Figure 5.18 page 308

47 The Truth Table For The Main Control
(Opcode) Similar to Figure 5.22 Page 312

48 PLA Implementation of the Main Control
Figure C.2.5 (Appendix C) PLA = Programmable Logic Array (Appendix B)

49 Adding Support For Jump: Micro-Operation Sequence For Jump: J
j jump_target OP Jump_target 6 bits bits 2 Jump address in words [31:26] [25:0] Instruction Word ¬ Mem[PC] Fetch the instruction PC ¬ PC Increment PC PC ¬ PC(31-28),jump_target,00 Update PC with jump address PC(31-28) Jump Address jump target = 2500 4 bits bits bits 0 0 4 highest bits from PC + 4

50 Datapath For Jump PC(31-28),jump_target,00 Shift left 2 Branch Zero
Next Instruction Address 32 Adder 4 PCSrc JUMP PC+4 32 Mux 00 PC Clk Mux 32 Adder PC Instruction(15-0) 1 imm16 Branch Target 4 PC Ext e.g Sign Extend + Shift Left 2 Jump Address PC+4(31-28) PC(31-28),jump_target,00 Instruction(25-0) 26 Shift left 2 jump_target 28 32

51 Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added
32 PC +4 PC +4 32 PC +4 32 32 32 Branch Target Book figure has an error! Opcode rs R[rs] rt R[rt] rd R[rt] imm16 ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type Figure 5.24 page 314 Function Field 32 This is book version ORI not supported, no zero extend of immediate needed

52 Control Line Settings (with jump instruction, j added)
R-Format lw sw beq j RegDst 1 X ALUSrc 1 X Memto- Reg 1 X Reg Write 1 Mem Read 1 Mem Write 1 Branch 1 X ALUOp1 1 X ALUOp0 1 X Jump 1 Figure 5.18 page 308 modified to include j

53 Worst Case Timing (Load)
Clk Clk-to-Q PC Old Value New Value Instruction Memoey Access Time Rs, Rt, Rd, Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value ExtOp Old Value New Value ALUSrc Old Value New Value MemtoReg Old Value New Value Register Write Occurs RegWr Old Value New Value Register File Access Time busA Old Value New Value Delay through Extender & Mux busB Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW Old Value New

54 Instruction Timing Comparison
Arithmetic & Logical PC Inst Memory Reg File ALU mux mux setup Load PC Inst Memory Reg File ALU Data Mem mux mux setup Critical Path Store PC Inst Memory Reg File ALU Data Mem mux Branch PC Inst Memory Reg File cmp mux Jump PC Inst Memory mux

55 Simplified Single Cycle Datapath Timing
Assuming the following datapath/control hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns Ignoring Mux and clk-to-Q delays, critical path analysis: } Obtained from low-level target VLSI implementation technology of components Control Unit Instruction Memory Register Read Main ALU Data Memory Register Write Critical Path (Load) PC + 4 ALU Branch Target ALU Time ns ns ns ns ns ns ns = nanosecond = 10-9 second

56 Performance of Single-Cycle (CPI=1) CPU
Assuming the following datapath hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns The delays needed for each instruction type can be found : The clock cycle is determined by the instruction with longest delay: The load in this case which is 8 ns. Clock rate = 1 / 8 ns = 125 MHz A program with I = 1,000,000 instructions executed takes: Execution Time = T = I x CPI x C = x 1 x 8x = s = 8 msec Nanosecond, ns = second Instruction Instruction Register ALU Data Register Total Class Memory Read Operation Memory Write Delay ALU ns ns ns ns ns Load ns ns ns ns ns ns Store ns ns ns ns ns Branch ns ns ns ns Jump ns ns Load has longest delay of 8 ns thus determining the clock cycle of the CPU to be 8ns

57 Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)
The MIPS jump and link instruction, jal is used to support procedure calls by jumping to jump address (similar to j ) and saving the address of the following instruction PC+4 in register $ra ($31) jal Address jal uses the j instruction format: We wish to add jal to the single cycle datapath in Figure 5.24 page Add any necessary datapaths and control signals to the single-clock datapath and justify the need for the modifications, if any. Specify control line values for this instruction. op (6 bits) Target address (26 bits)

58 Exercise 5.20: jump and link, jal support to Single Cycle Datapath
Instruction Word ¬ Mem[PC] R[31] ¬ PC + 4 PC ¬ Jump Address Jump Address PC + 4 PC + 4 Branch Target PC + 4 rs R[rs] rt R[rt] 31 2 2 rd imm16 (For More Practice Exercise 5.20)

59 Exercise 5.20: jump and link, jal support to Single Cycle Datapath
Adding Control Lines Settings for jal (For Textbook Single Cycle Datapath including Jump) RegDst Is now 2 bits MemtoReg Is now 2 bits Memto Reg Mem Mem RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUOp0 Jump R-format lw sw xx xx beq xx xx J xx x xx x x x JAL x x x x R[31] PC+ 4 PC ¬ Jump Address Instruction Word ¬ Mem[PC] R[31] ¬ PC + 4 PC ¬ Jump Address (For More Practice Exercise 5.20)

60 Adding Support for LWR to Single Cycle Datapath (For More Practice Exercise 5.22)
We wish to add a variant of lw (load word) let’s call it LWR to the single cycle datapath in Figure 5.24 page 314. LWR $rd, $rs, $rt The LWR instruction is similar to lw but it sums two registers (specified by $rs, $rt) to obtain the effective load address and uses the R-Type format Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any. Specify control line values for this instruction. (For More Practice Exercise 5.22)

61 Exercise 5.22: LWR (R-format LW) support to Single Cycle Datapath
Instruction Word ¬ Mem[PC] PC ¬ PC + 4 R[rd] ¬ Mem[ R[rs] + R[rt] ] No new components or connections are needed for the datapath just the proper control line settings Adding Control Lines Settings for LWR (For Textbook Single Cycle Datapath including Jump) Memto Reg Mem Mem RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUOp0 Jump R-format lw sw x x beq x x J x x x x x x LWR Add rd R[rt] (For More Practice Exercise 5.22)

62 Adding Support for jm to Single Cycle Datapath (Based on “For More Practice Exercise 5.44” but for single cycle) We wish to add a new instruction jm (jump memory) to the single cycle datapath in Figure 5.24 page 314. jm offset($rs) The jm instruction loads a word from effective address (R[rs] + offset), this is similar to lw except the loaded word is put in the PC instead of register $rt. Jm used the I-format with field rt not used. Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any. Specify control line values for this instruction. OP rs rt address (imm16) 6 bits bits bits bits Not Used

63 Adding jump memory, jm support to Single Cycle Datapath
Instruction Word ¬ Mem[PC] PC ¬ Mem[R[rs] + SignExt[imm16]] Jump 2 2 PC + 4 Jump 2 Branch Target rs R[rs] rt R[rt] rd imm16 (Based on “For More Practice Exercise 5.44” but for single cycle)

64 Adding jm support to Single Cycle Datapath
Adding Control Lines Settings for jm (For Textbook Single Cycle Datapath including Jump) Jump is now 2 bits Memto Reg Mem Mem RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUOp0 Jump R-format lw sw x x beq x x J x x x x x x Jm x x x add R[rs] PC ¬ Mem[R[rs] + SignExt[imm16]]

65 Drawbacks of Single Cycle Processor
Long cycle time: All instructions must take as much time as the slowest Here, cycle time for load is longer than needed for all other instructions. Cycle time must be long enough for the load instruction: PC’s Clock -to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew Real memory is not as well-behaved as idealized memory Cannot always complete data access in one (short) cycle. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle. e.g indirect memory addressing. High and duplicate hardware resource requirements Any hardware functional unit cannot be used more than once in a single cycle (e.g. ALUs). Does not allow overlap of instruction processing (instruction pipelining, chapter 6).


Download ppt "CPU Organization (Design)"

Similar presentations


Ads by Google