55:035 Computer Architecture and Organization Lecture 9.


1 55:035 Computer Architecture and Organization Lecture 9

2 Outline
- Building a CPU
  - Basic Components
  - MIPS Instructions
  - Basic 5 Steps for CPU
  - Single-Cycle Design
  - Multi-cycle Design
  - Comparison of Single and Multi-cycle Designs

3 Overview
- Brief look
  - Digital logic
- CPU Datapath
  - MIPS Example

4 Digital Logic
[Figure: edge-triggered D-type flip-flop (D, Q, Clock); multiplexer (inputs A and B, select input S, output F); D-type flip-flop with enable (D, Q, EN, Clock)]

5 Digital Logic: Registers
[Figure: 1-bit register (D, Q, EN, edge-triggered Clock); 4-bit register (D3..D0, Q3..Q0, shared EN and Clock); N-bit register]

6 Digital Logic: Tri-state Driver (Buffer)
[Figure: buffer with input "in", control "drive", and output "out"]
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1
What is Z?

7 Digital Logic: Adder/Subtractor or ALU
[Figure: ALU with inputs A and B, output F, Carry-in, Carry-out, and an Add/sub or ALUop control input]

8 Overview
- Brief look
  - Digital logic
- How to Design a CPU Datapath
  - MIPS Example

9 Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR
   - Meaning of each instruction given by RTL (register transfers)
   - 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - Datapath must support planned register transfers
   - Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic

10 Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats: R-type, I-type, J-type
  - R: registers, I: immediate, J: jumps
- These formats were intentionally chosen to simplify design
R-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
I-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
J-type: op [31:26] (6 bits) | target address [25:0] (26 bits)

11 Step 1b: Analyze ISA
- Meaning of the fields:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type), or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction
[Figure: R-type, I-type, and J-type field layouts, repeated from slide 10]
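The field boundaries above map directly onto shifts and masks. The following is a minimal Python sketch, not part of the original slides; the function name `decode` and the dictionary keys are chosen here for illustration. The sample word encodes `addu $3, $1, $2`, using the standard MIPS funct code 0x21 for ADDU.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted regardless of format; which ones are
    meaningful depends on whether the word is R-, I-, or J-type.
    """
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31..26
        "rs":     (instr >> 21) & 0x1F,   # bits 25..21
        "rt":     (instr >> 16) & 0x1F,   # bits 20..16
        "rd":     (instr >> 11) & 0x1F,   # bits 15..11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct":  instr & 0x3F,           # bits 5..0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15..0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25..0  (J-type)
    }


# addu $3, $1, $2 : op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x21
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode(word)
print(fields)
```

Extracting every field unconditionally mirrors the hardware, where the wires for Instruction[25:21], Instruction[20:16], and so on always exist and the control logic decides which ones matter.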

12 MIPS ISA: subset for today
- ADD and SUB (R-type format)
  - addU rd, rs, rt
  - subU rd, rs, rt
- OR Immediate (I-type format)
  - ori rt, rs, imm16
- LOAD and STORE Word (I-type format)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type format)
  - beq rs, rt, imm16

13 Step 2: Datapath Requirements
REGISTER FILE
- MIPS ISA requires 32 registers, 32b each
  - Called a register file
  - Contains 32 entries
  - Each entry is 32b
- AddU rd,rs,rt or SubU rd,rs,rt
  - Read two sources rs, rt
  - Operation rs + rt or rs - rt
  - Write destination rd ← rs +/- rt
- Requirements
  - Read two registers (rs, rt)
  - Perform ALU operation
  - Write a third register (rd)
- How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data ports WrData, RdData1, RdData2, and control RegWrite; ALU with ALUop input and Result and Zero? outputs]

14 Step 3: Datapath Assembly
- ADDU rd, rs, rt and SUBU rd, rs, rt
  - Need an ALU
  - Hook it up to REGISTER FILE
  - REGFILE has 2 read ports (rs, rt), 1 write port (rd)
- Parameters come from instruction fields (rs, rt, rd)
- Control signals depend upon instruction fields
  - E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: REGFILE read ports RdData1 and RdData2 feed the ALU; the ALU Result returns to WrData; RegWrite and ALUop are control inputs]
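As a behavioral illustration of the register file ports named above, here is a small Python sketch. It is an assumption-laden model, not the lecture's implementation: the class name `RegFile` and method names are invented here, reads are treated as combinational, and a write takes effect only when `clock_edge` is called with `reg_write` asserted. The MIPS convention that register 0 always reads as zero is not modeled.

```python
class RegFile:
    """Behavioral model: 32 registers of 32 bits, 2 read ports, 1 write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1: int, rd_reg2: int):
        # Combinational read: both ports return data for the given numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, reg_write: bool, wr_reg: int, wr_data: int):
        # The write happens only on the clock edge, and only if RegWrite is set.
        if reg_write:
            self.regs[wr_reg] = wr_data & 0xFFFFFFFF


def alu(alu_op: str, a: int, b: int) -> int:
    """Two-operation ALU for ADDU/SUBU; results wrap to 32 bits."""
    result = a + b if alu_op == "add" else a - b
    return result & 0xFFFFFFFF


rf = RegFile()
rf.clock_edge(True, 1, 40)          # R[1] <- 40
rf.clock_edge(True, 2, 2)           # R[2] <- 2
a, b = rf.read(1, 2)                # read rs=1, rt=2
rf.clock_edge(True, 3, alu("add", a, b))   # ADDU: R[3] <- R[1] + R[2]
```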

15 Steps 2 and 3: ORI Instruction
- ORI rt, rs, Imm16
  - Need new ALUop for 'OR' function, hook up to REGFILE
  - 1 read port (rs), 1 write port (rt), 1 const value (Imm16)
- Control signals depend upon instruction fields
  - E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: register numbers rs and rt come from the instruction (rd is not used); Imm16 (16 bits) passes through ZERO-EXTEND into a mux, selected by ALUsrc, on the second ALU input]

16 Steps 2 and 3: Destination Register
- Must select proper destination, rd or rt
  - Depends on instruction type
  - R-type may write rd
  - I-type may write rt
[Figure: ORI datapath with an added mux, selected by RegDst, that chooses rd or rt as WrReg]

17 Steps 2 and 3: Load Word
- LW rt, rs, Imm16
  - Need Data Memory: data ← Mem[Addr]
  - Addr is rs+Imm16; Imm16 is signed; use ALU for +
  - Store in rt: rt ← Mem[rs+Imm16]
[Figure: datapath extended with DATAMEM (Addr, RdData), a MemtoReg mux that selects the ALU result or memory data as WrData, and a SIGN/ZERO-EXTEND unit controlled by ExtOp]
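ORI needs a zero-extended immediate while LW needs a sign-extended one, which is why the extender gains an ExtOp control. A minimal Python sketch of that unit follows. The polarity (ExtOp = 1 meaning sign-extend) is an assumption made here; the slides do not state it.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 0: zero-extend (used by ORI)
    ext_op = 1: sign-extend (used by LW, SW, BEQ)
    The encoding of ext_op is assumed for illustration.
    """
    imm16 &= 0xFFFF
    if ext_op == 1 and (imm16 & 0x8000):
        # Copy the sign bit (bit 15) into bits 31..16.
        return imm16 | 0xFFFF0000
    return imm16


print(hex(extend(0x8000, 0)))
print(hex(extend(0x8000, 1)))
```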

18 Steps 2 and 3: Store Word
- SW rt, rs, Imm16
  - Need Data Memory: Mem[Addr] ← data
  - Addr is rs+Imm16; Imm16 is signed; use ALU for +
  - Store in Mem: Mem[rs+Imm16] ← rt
[Figure: Load Word datapath with RdData2 connected to the DATAMEM WrData input and a MemWrite control signal added]

19 Writes: Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
    - Else? Writes to wrong Addr!
- Solution: use ideal data memory
  - Assume everything works ok
  - How to fix this for real?
  - One solution: synchronous memory
  - Another solution: delay MemWrite to come late
- Problems?: write to register file
  - Does RegWrite signal come after WrReg number?
  - When does the write to a register happen?
  - Read from same register as being written?

20 Missing Pieces: Instruction Fetching
- Where does the Instruction come from?
  - From instruction memory, of course!
  - Recall: stored-program concept
- Alternatives?
  - How about hard-coding wires and switches…? This is how ENIAC was programmed!
- How to branch?
  - BEQ rs, rt, Imm16

21 Instruction Processing
- Fetch instruction
- Execute instruction
- Fetch next instruction
- Execute next instruction
- Fetch next instruction
- Execute next instruction
- Etc…
- How to maintain sequence? Use a counter!
- Branches (out of sequence)? Load the counter!

22 Instruction Processing
- Program Counter
  - Points to current instruction
  - Address to instruction memory
    - Instr ← InstrMem[PC]
  - Next instruction: counts up by 4
    - Remember: memory is byte-addressable, instructions are 4 bytes
    - PC ← PC + 4
  - Branch instruction: replace PC contents

23 Step 1: Analyze Instructions
- Register Transfer Language…
  op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  op | rs | rt | Imm16 = InstrMem[ PC ]
Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' } else PC ← PC + 4
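The register transfers above can be executed directly. The Python sketch below is an illustrative interpreter, not the lecture's code: the helper names (`step`, `r_type`, `i_type`) and the use of dictionaries keyed by byte address for memory are assumptions made here. The opcode and funct values are the standard MIPS encodings (ADDU funct 0x21, SUBU funct 0x23, ORI 0x0D, LW 0x23, SW 0x2B, BEQ 0x04). The hardwired-zero behavior of register 0 is not modeled.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    """Interpret a 16-bit field as a signed Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(pc: int, regs: list, mem: dict, imem: dict) -> int:
    """Apply the RTL of one instruction and return the next PC."""
    instr = imem[pc]
    op = instr >> 26
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    funct, imm16 = instr & 0x3F, instr & 0xFFFF
    next_pc = pc + 4
    if op == 0 and funct == 0x21:                      # ADDU
        regs[rd] = (regs[rs] + regs[rt]) & MASK
    elif op == 0 and funct == 0x23:                    # SUBU
        regs[rd] = (regs[rs] - regs[rt]) & MASK
    elif op == 0x0D:                                   # ORI (zero-extended)
        regs[rt] = regs[rs] | imm16
    elif op == 0x23:                                   # LW
        regs[rt] = mem.get((regs[rs] + sign_ext(imm16)) & MASK, 0)
    elif op == 0x2B:                                   # SW
        mem[(regs[rs] + sign_ext(imm16)) & MASK] = regs[rt]
    elif op == 0x04:                                   # BEQ
        if regs[rs] == regs[rt]:
            next_pc = pc + 4 + (sign_ext(imm16) << 2)  # {sign_ext(Imm16) || b'00'}
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    return next_pc & MASK


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


imem = {
    0:  i_type(0x0D, 0, 1, 5),     # ori  $1, $0, 5
    4:  i_type(0x0D, 0, 2, 7),     # ori  $2, $0, 7
    8:  r_type(1, 2, 3, 0x21),     # addu $3, $1, $2
    12: i_type(0x2B, 0, 3, 16),    # sw   $3, 16($0)
    16: i_type(0x23, 0, 4, 16),    # lw   $4, 16($0)
    20: i_type(0x04, 3, 4, -1),    # beq  $3, $4, -1  (branches back to itself)
}
regs, mem, pc = [0] * 32, {}, 0
for _ in range(6):
    pc = step(pc, regs, mem, imem)
```

Note that the branch offset is relative to PC + 4, so an offset of -1 word lands back on the BEQ itself.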

24 Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Provides address to Instruction Memory
[Figure: PC drives the Instruction Memory read address, which outputs Instruction[31:0]; an adder computes PC + 4]

25 Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Sometimes, must add SignExtend{Imm16||b'00'} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
[Figure: PC, Instruction Memory, PC + 4 adder, Sign/Zero Extend (16 to 32 bits, controlled by ExtOp), Shift Left 2, branch-target adder, and a mux selected by PCSrc; instruction fields [25:21], [20:16], [15:11], and [15:0] (Imm16) are broken out]

26 Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath with PC, Instruction Memory, Register File, Sign/Zero Extend, ALU with ALU Control (Instruction[5:0], funct), Data Memory, PC + 4 and branch-target adders, Shift Left 2, and muxes; control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]

27 What have we done?
- Created a simple CPU datapath
  - Control still missing (next slide)
- Single-cycle CPU
  - Every instruction takes 1 clock cycle
  - Clocking?

28 One Clock Cycle
- Clock Locations
  - PC, REGFILE have clocks
- Operation
  - On rising edge, PC will get new value
    - Maybe REGFILE will have one value updated as well
  - After rising edge
    - PC and REGFILE can't change
    - New value out of PC
    - Instruction out of INSTRMEM
    - Instruction selects registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    - ALU does its work
    - DataMem may be read (depending on instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to PC
    - Await next clock edge
- Lots to do in only 1 clock cycle!

29 Missing Steps?
- Control is missing (Steps 4 and 5 we mentioned earlier)
  - Generate the green signals
    - ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in upcoming lecture
- Implementation Details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc.

30 1-Cycle CPU Datapath
[Figure: the single-cycle datapath from slide 26, with control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]

31 1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control unit that takes Instruction[31:26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; ALU control takes Instruction[5:0]; PCSrc selects the next PC]

32 1-cycle CPU Control: Lookup Table
Input or Output  Signal Name  R-format  Lw  Sw  Beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1
- Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
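Because single-cycle control is a pure function of the opcode, the table above can be written as a lookup. The Python sketch below transcribes it; the names `CONTROL`, `OPCODES`, and `control` are invented here, don't-care entries (X) are represented as `None`, and the two ALUOp bits are combined into one integer (ALUOp1 is the high bit).

```python
# Control outputs per instruction class, transcribed from the lookup table.
# None stands for a don't-care (X) entry.
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Opcode bits Op5..Op0 from the Inputs rows of the table.
OPCODES = {
    0b000000: "R-format",
    0b100011: "lw",
    0b101011: "sw",
    0b000100: "beq",
}


def control(opcode: int) -> dict:
    """Return the control-signal values for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


print(control(0b100011))
```

A hardware implementation would typically realize the same mapping as a small ROM or as minimized logic; the dictionary only captures the truth table.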

33 1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath with control, extended for jumps; labels include Instruction[31:26], Instruction[25:0], PC + 4 [31..28], and Jump address [31..0]]

34 1-cycle CPU Problems?
- Every instruction 1 cycle
- Some instructions "do more work"
  - E.g., lw must read from DATAMEM
- All instructions must have same clock period…
- Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come *after* address is stable
- Need extra resources…
  - PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

35 Performance!
- Single-Cycle CPU Performance
  - Execute one instruction per clock cycle (CPI=1)
  - Clock cycle time? Note dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
  - Not every instruction uses all resources (e.g., DATAMEM read)
  - Can we change clock period for each instruction?
    - No! (Why not?)
  - One clock period: the worst case!
  - This is why a single-cycle CPU is not good for performance
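To make the worst-case argument concrete, the Python sketch below sums unit delays along each instruction's path and takes the maximum as the clock period. Every delay value is hypothetical and chosen only for illustration; the slides give no timing numbers. Sign extension is left out on the assumption that it overlaps with the register read.

```python
# Hypothetical unit delays in picoseconds (assumed, not from the lecture).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units each instruction class passes through in one cycle.
PATHS = {
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq":    ["instr_mem", "reg_read", "alu"],
}


def path_delay(instr: str) -> int:
    """Total delay of the units an instruction class uses."""
    return sum(DELAY_PS[unit] for unit in PATHS[instr])


def clock_period() -> int:
    """A single-cycle CPU must use the slowest instruction's delay."""
    return max(path_delay(instr) for instr in PATHS)


for name in PATHS:
    print(name, path_delay(name), "ps; slack", clock_period() - path_delay(name), "ps")
```

With these assumed numbers, lw sets the period and every other instruction leaves part of the cycle idle, which is the "wasted time" the later comparison slides refer to.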

36 1-cycle CPU Datapath + Controller
[Figure: single-cycle datapath with controller and jump support; labels include Instruction[31:26], Instruction[25:0], PC + 4 [31..28], and Jump address [31..0]]

37 1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During clock cycle, data flows from register-outputs to register-inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - Slowest instruction determines clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at end of clock cycle

38 Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction → 5 cycles
    - SW instruction → 4 cycles
    - R-type instruction → 4 cycles
    - Branch, Jump → 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (e.g., LW) → 5 cycles → same performance as before
    - Simple instructions (e.g., ADD) → fewer cycles → faster
- Save resources (gates/transistors)
  - Re-use ALU over multiple cycles
  - Put INSTR + DATA in same memory
- MemWrite timing solved?

39 Multi-cycle CPU Datapath
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2)
[Figure: multi-cycle datapath with PC, a single Memory (Address, MemData, Write data), Instruction Register, Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A and B registers, Sign Extend, Shift Left 2, ALU (ALU result, Zero), ALUOut, and muxes]

40 Multi-cycle CPU Datapath
- Add registers + control signals (IR, MDR, A, B, ALUOut)
  - Registers with no control signal load value every clock cycle (e.g., PC)
[Figure: the multi-cycle datapath from slide 39]

41 Instruction Execution Example
- Execute a "Load Word" instruction
  - LW rt, 0(rs)
- 5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers

42 Load Word Instruction Sequence
1. Fetch Instruction: InstructionRegister ← Mem[PC]
[Figure: multi-cycle datapath, step 1]

43 Load Word Instruction Sequence
2. Read Registers: A ← Registers[Rs]
[Figure: multi-cycle datapath, step 2]

44 Load Word Instruction Sequence
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
[Figure: multi-cycle datapath, step 3]

45 Load Word Instruction Sequence
4. Read Data: MDR ← Memory[ALUOut]
[Figure: multi-cycle datapath, step 4]

46 Load Word Instruction Sequence
5. Write Registers: Registers[Rt] ← MDR
[Figure: multi-cycle datapath, step 5]

47 Load Word Instruction Sequence
All 5 Steps Shown
[Figure: multi-cycle datapath with all five steps]

48 Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?

49 Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?
  - Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need PCWrite control signal
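The five register transfers above can be traced one cycle at a time. The Python sketch below is an illustration under stated assumptions: the function name `run_lw` is invented here, instructions and data share one memory modeled as a dictionary keyed by byte address, and the temporaries `ir`, `a`, `alu_out`, and `mdr` stand in for the IR, A, ALUOut, and MDR registers.

```python
def sign_ext(imm16: int) -> int:
    """Interpret a 16-bit field as a signed Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(pc: int, regs: list, mem: dict):
    """Execute one LW in five cycles; return (next PC, cycle count)."""
    # Cycle 1: fetch, and increment the PC in the same cycle.
    ir = mem[pc]
    pc = pc + 4
    # Cycle 2: read the base register into A.
    a = regs[(ir >> 21) & 31]
    # Cycle 3: compute the address in the ALU.
    alu_out = (a + sign_ext(ir & 0xFFFF)) & 0xFFFFFFFF
    # Cycle 4: read data memory into MDR.
    mdr = mem[alu_out]
    # Cycle 5: write MDR into register rt.
    regs[(ir >> 16) & 31] = mdr
    return pc, 5


# lw $2, 8($1) with R[1] = 100, so the load reads address 108.
lw_word = (0x23 << 26) | (1 << 21) | (2 << 16) | 8
mem = {0: lw_word, 108: 0xABCD}
regs = [0] * 32
regs[1] = 100
pc, cycles = run_lw(0, regs, mem)
```

Each temporary is written in one cycle and read in a later one, which is exactly why the multi-cycle datapath needs the extra registers.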

50 Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut
- RTL describes data flow action in each clock cycle
  - Control signals determine precise data flow
  - Each step implies unique control values

51 Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value: ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers: Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
- Each step implies unique control values
  - Fixed for entire cycle
  - "Default value" implied if unspecified

52 Check Your Work: Is RTL Valid?
1. Datapath check
   - Within one cycle…
     - Each cycle has valid data flow path (path exists)
     - Each register gets only one new value
   - Across multiple cycles…
     - A register value must be defined in a previous (earlier in time) clock cycle before it is used
     - E.g., "A ← 3" must occur before "B ← A"
     - Make sure register value doesn't disappear if set >1 cycle earlier
2. Control signal check
   - Each cycle, RTL describing the datapath flow implies a value for each control signal
     - 0 or 1 or default or don't care
   - Each control signal gets only one fixed value the entire cycle
3. Overall check
   - Does the sequence of steps work?

53 Multi-cycle BEQ Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target: A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt{Imm16},b'00'}
3. Compare Registers, Conditional Branch: if( (A - B) == 0 ) PC ← ALUOut
- Green shows PC calculation flow (in parallel with other operations)

54 Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath with jump support (Instr[25:0], PC[31..28], Jump address [31..0]) and control signals PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst; ALU Control takes Instruction[5:0]]

55 Multi-cycle Datapath with Controller
[Figure: the datapath from slide 54 with a controller driven by Instr[31:26]]

59 Multi-cycle CPU Control: Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control…
  - Precise outputs for each state (Mealy depends on inputs, Moore does not)
  - Precise "next state" for each state (can depend on inputs)
[Figure: FSM overview with control signal outputs]

60 How to Implement FSM?
- Manually with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit, each output bit (painful!)
- High-level language description (e.g., Verilog, VHDL)
  - Describe FSM bubble diagram (next-states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends correct control signal for 1 cycle
    - µ-op similar to one bubble in FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains µ-ops
  - Can look similar to RTL or some new "assembly language"
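As a software analogue of the bubble diagram and next-state table, the Python sketch below encodes a control FSM and counts cycles per instruction class. The state names are hypothetical (the slide diagrams did not survive in this transcript); the resulting cycle counts match those stated earlier in the lecture (LW 5, SW 4, R-type 4, branch and jump 3).

```python
def next_state(state: str, instr_class: str) -> str:
    """Next-state function of the control FSM (state names are assumed)."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        # The opcode selects which branch of the diagram to follow.
        return {
            "lw": "MEM_ADDR",
            "sw": "MEM_ADDR",
            "r-type": "EXECUTE",
            "beq": "BRANCH",
            "j": "JUMP",
        }[instr_class]
    if state == "MEM_ADDR":
        return "MEM_READ" if instr_class == "lw" else "MEM_WRITE"
    if state == "MEM_READ":
        return "MEM_WRITEBACK"
    if state == "EXECUTE":
        return "R_WRITEBACK"
    # MEM_WRITEBACK, MEM_WRITE, R_WRITEBACK, BRANCH, JUMP all finish here.
    return "FETCH"


def cycles(instr_class: str) -> int:
    """Count the states visited before the FSM returns to FETCH."""
    state, count = "FETCH", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "FETCH":
            return count


for name in ("lw", "sw", "r-type", "beq", "j"):
    print(name, cycles(name))
```

A Moore-style controller would attach one fixed set of control outputs to each state; only the transitions are modeled here.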

61 FSM Specification: Bubble Diagram
- Can build this by examining RTL
- It is possible to automatically convert RTL into this form!
[Figure: FSM bubble diagram]

62 FSM: Gates + FFs Implementation
[Figure: FSM high-level organization]

63 FSM: Microcode Implementation
[Figure: Microcode Storage (memory) whose outputs provide datapath control and sequencing control; Microprogram Counter with an adder (+1); Address Select Logic with inputs from the instruction register opcode field]

64 Multi-cycle CPU with Control FSM
[Figure: multi-cycle datapath with the control FSM; labels include FSM Control Outputs, Conditional Branch, Instr[31:26], Instr[25:0], PC[31..28], and Jump address [31..0]]

65 Control FSM: Overview
- General approach: Finite State Machine (FSM)
- Need details in each branch of control…

66 Detailed FSM
[Figure: detailed FSM state diagram]

67 Detailed FSM
[Figure: detailed FSM state diagram]

68 Detailed FSM: Instruction Fetch
[Figure: instruction fetch states]

69 Detailed FSM: Memory Reference
[Figure: memory reference states for LW and SW]

70 Detailed FSM: R-Type Instruction
[Figure: R-type states]

71 Detailed FSM: Branch Instruction
[Figure: branch state]

72 Detailed FSM: Jump Instruction
[Figure: jump state]

73 Performance Comparison
Single-cycle CPU vs Multi-cycle CPU

74 Simple Comparison
CPU               Instructions  Clock cycles
Single-cycle CPU  All           1
Multi-cycle CPU   LW            5
Multi-cycle CPU   SW, R-type    4
Multi-cycle CPU   BEQ, J        3

75 What's really happening?
[Figure: ideal timing of a Load Word instruction on the single-cycle CPU and on the multi-cycle CPU, with stages Fetch, Decode, Calc Addr, Memory, Write]

76 In practice, steps differ in speeds…
[Figure: Load Word timing on the single-cycle and multi-cycle CPUs with stages of unequal length; labels "Violation!" and "Wasted time!"]

77 Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: Load Word timing on both CPUs; labels "Violation fixed!" and "Now wasted time is larger!"]

78 Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: Store Word timing on both CPUs with stages Fetch, Decode, Calc Addr, Memory; labels "Wasted time!" and "Speed diff"]

79 Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: BEQ and J timing on both CPUs with stages Fetch, Decode, Calc Addr; labels "Wasted time!" and "Speed diff"]

80 Performance Summary
- Which CPU implementation is faster?
  - LW → single-cycle is faster
  - SW, R-type → about the same
  - BEQ, J → multi-cycle is faster
- Real programs use a mix of these instructions
- Overall performance depends on instruction frequency!
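The dependence on instruction frequency can be shown with a short calculation: average CPI is the frequency-weighted sum of cycle counts, and time per instruction is CPI times clock period. In the Python sketch below, the cycle counts come from the lecture, but the instruction mix and both clock periods are hypothetical values picked only to illustrate the comparison.

```python
# Cycles per instruction class on the multi-cycle CPU (from the lecture).
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "j": 3}


def average_cpi(mix: dict) -> float:
    """Frequency-weighted average cycles per instruction."""
    return sum(fraction * CYCLES[instr] for instr, fraction in mix.items())


def time_per_instruction_ps(cpi: float, period_ps: float) -> float:
    """Average execution time of one instruction."""
    return cpi * period_ps


# Hypothetical instruction mix (fractions sum to 1) and clock periods.
mix = {"lw": 0.25, "sw": 0.10, "r-type": 0.45, "beq": 0.15, "j": 0.05}
SINGLE_CYCLE_PERIOD_PS = 800   # assumed: set by the slowest instruction
MULTI_CYCLE_PERIOD_PS = 200    # assumed: set by the slowest step

single = time_per_instruction_ps(1, SINGLE_CYCLE_PERIOD_PS)
multi = time_per_instruction_ps(average_cpi(mix), MULTI_CYCLE_PERIOD_PS)
print(f"average multi-cycle CPI: {average_cpi(mix):.2f}")
print(f"single-cycle: {single:.0f} ps, multi-cycle: {multi:.0f} ps per instruction")
```

Changing the mix toward branches and jumps favors the multi-cycle design, while a load-heavy mix favors the single-cycle design, in line with the per-instruction comparison above.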

81 Implementation Summary
- Single-cycle CPU
  - 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  - No "wasted time" on most complex instruction
  - Large wasted time on simpler instructions
  - Simple controller (just a lookup table or memory)
  - Simple instructions
- Multi-cycle CPU
  - << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
  - Small time wasted on most complex instruction
    - Hence, this instruction always slower than single-cycle CPU
  - Small time wasted on simple instructions
    - Eliminates "large wasted time" by using fewer clock cycles
  - Complex controller (FSM)
  - Potential to create complex instructions

