Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee wut_cha/home.htm.


1 Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

2 Outline
- Building a CPU
  - Basic Components
  - MIPS Instructions (Microprocessor without Interlocked Pipeline Stages)
  - Basic 5 Steps for CPU
  - Single-Cycle Design
  - Multi-cycle Design
  - Comparison of Single and Multi-cycle Designs

3 Overview
- Brief look
  - Digital logic
- CPU Datapath
  - MIPS Example

4 Digital Logic
[Figure: D-type flip-flop (D, Q, edge-triggered clock); multiplexer (inputs A and B on the 0 and 1 positions, select input S, output F); D-type flip-flop with enable (D, Q, EN, edge-triggered clock)]

5 Digital Logic
Registers
[Figure: 1-bit register (D, Q, EN, edge-triggered clock); 4-bit register (D3..D0, Q3..Q0, shared EN and clock); N-bit register (D, Q, EN, clock)]

6 Digital Logic
Tri-state Driver (Buffer): input "in", control "drive", output "out"

In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1

What is Z?
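
In a tri-state truth table, Z conventionally denotes the high-impedance state: the buffer is electrically disconnected and does not drive the wire. The sketch below models that behavior in Python, assuming `None` stands for Z. The names `tristate` and `bus` are illustrative and do not come from the slides.

```python
from typing import Optional

Z = None  # assumption: None models the high-impedance (undriven) state

def tristate(inp: int, drive: int) -> Optional[int]:
    """Tri-state buffer: pass the input through when drive = 1, else output Z."""
    return inp if drive else Z

def bus(*outputs: Optional[int]) -> Optional[int]:
    """Resolve a shared wire fed by several tri-state buffers.

    At most one buffer may drive the wire at a time; more than one
    active driver is reported as contention.
    """
    active = [v for v in outputs if v is not Z]
    if len(active) > 1:
        raise ValueError("bus contention: more than one active driver")
    return active[0] if active else Z
```

This is why tri-state buffers can serve as a register-file read port: many registers share one output wire, and only the selected register drives it.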

7 Digital Logic
Adder/Subtractor or ALU
[Figure: ALU block with inputs A and B, output F, Carry-in, Carry-out, and control input Add/sub or ALUop]

8 Overview
- Brief look
  - Digital logic
- How to Design a CPU Datapath
  - MIPS Example

9 Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR
   - Meaning of each instruction given by RTL (register transfers)
   - 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - Datapath must support planned register transfers
   - Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic

10 Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats: R-type, I-type, J-type
  - R: registers, I: immediate, J: jumps
- These formats were intentionally chosen to simplify design

Format  Fields (bit positions, width)
R-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
I-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
J-type  op [31:26] 6 bits | target address [25:0] 26 bits

11 Step 1b: Analyze ISA
- Meaning of the fields:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type) or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction

Format  Fields (bit positions, width)
R-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
I-type  op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
J-type  op [31:26] 6 bits | target address [25:0] 26 bits
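
The field layout above maps directly onto shifts and masks. The following is a minimal sketch of a field extractor in Python; the function name `decode` and the dictionary keys are illustrative choices, not something the slides define. It extracts every field unconditionally, and the caller picks the ones that apply to the instruction's format.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields.

    Bit positions follow the three formats:
    R-type uses op/rs/rt/rd/shamt/funct, I-type uses op/rs/rt/imm16,
    J-type uses op/target.
    """
    return {
        "op":     (instr >> 26) & 0x3F,       # bits 31..26
        "rs":     (instr >> 21) & 0x1F,       # bits 25..21
        "rt":     (instr >> 16) & 0x1F,       # bits 20..16
        "rd":     (instr >> 11) & 0x1F,       # bits 15..11
        "shamt":  (instr >> 6) & 0x1F,        # bits 10..6
        "funct":  instr & 0x3F,               # bits 5..0
        "imm16":  instr & 0xFFFF,             # bits 15..0
        "target": instr & 0x3FFFFFF,          # bits 25..0
    }

# Example: an I-type word with op = 0b100011 (lw), rs = 1, rt = 2, imm16 = 8
word = (0b100011 << 26) | (1 << 21) | (2 << 16) | 8
fields = decode(word)
```

In hardware no computation is needed for this step: each field is simply a fixed group of wires taken from the instruction word, which is one reason the formats keep op, rs, and rt in the same positions.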

12 MIPS ISA: subset for today
- ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  - addU rd, rs, rt
  - subU rd, rs, rt
- OR Immediate (I-type: op | rs | rt | immediate)
  - ori rt, rs, imm16
- LOAD and STORE Word (I-type: op | rs | rt | immediate)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type: op | rs | rt | immediate)
  - beq rs, rt, imm16

13 Step 2: Datapath Requirements
REGISTER FILE
- MIPS ISA requires 32 registers, 32 bits each
  - Called a register file
  - Contains 32 entries
  - Each entry is 32 bits
- AddU rd, rs, rt or SubU rd, rs, rt
  - Read two sources: rs, rt
  - Operation: rs + rt or rs – rt
  - Write destination: rd ← rs +/- rt
- Requirements
  - Read two registers (rs, rt)
  - Perform ALU operation
  - Write a third register (rd)
- How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data input WrData, data outputs RdData1, RdData2, control RegWrite; ALU with control ALUop and outputs Result and Zero?]

14 Step 3: Datapath Assembly
- ADDU rd, rs, rt; SUBU rd, rs, rt
  - Need an ALU
  - Hook it up to the REGISTER FILE
  - REGFILE has 2 read ports (rs, rt), 1 write port (rd)
- Parameters come from instruction fields: rs, rt, rd
- Control signals depend upon instruction fields
  - E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: REGFILE (RdReg1, RdReg2, WrReg, WrData, RdData1, RdData2, RegWrite) connected to the ALU (ALUop, Result, Zero?)]

15 Steps 2 and 3: ORI Instruction
- ORI rt, rs, Imm16
  - Need a new ALUop for the 'OR' function, hook up to REGFILE
  - 1 read port (rs), 1 write port (rt), 1 constant value (Imm16)
- Control signals depend upon instruction fields
  - E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: REGFILE (RdReg1, RdReg2, WrReg, WrData, RdData1, RdData2, RegWrite) with rs and rt taken from the instruction and rd marked X; 16-bit Imm16 through ZERO-EXTEND; 0/1 multiplexer controlled by ALUsrc; ALU (ALUop, Result, Zero?)]

16 Steps 2 and 3: Destination Register
- Must select the proper destination, rd or rt
  - Depends on instruction type
    - R-type may write rd
    - I-type may write rt
[Figure: previous datapath with an added 1/0 multiplexer controlled by RegDst choosing between rt and rd for WrReg; ZERO-EXTEND on 16-bit Imm16; 0/1 multiplexer controlled by ALUsrc; ALU (ALUop, Result, Zero?); RegWrite]

17 Steps 2 and 3: Load Word
- LW rt, rs, Imm16
  - Need Data Memory: data ← Mem[Addr]
    - Addr is rs + Imm16; Imm16 is signed; use the ALU for +
  - Store in rt: rt ← Mem[rs + Imm16]
[Figure: previous datapath with SIGN/ZERO-EXTEND controlled by ExtOp, DATAMEM (Addr, RdData), and a 0/1 multiplexer controlled by MemtoReg selecting the register write data]

18 Steps 2 and 3: Store Word
- SW rt, rs, Imm16
  - Need Data Memory: Mem[Addr] ← data
    - Addr is rs + Imm16; Imm16 is signed; use the ALU for +
  - Store in Mem: Mem[rs + Imm16] ← rt
[Figure: previous datapath with DATAMEM (Addr, RdData, WrData) and the added MemWrite control signal; multiplexers controlled by RegDst, ALUsrc, MemtoReg; SIGN/ZERO-EXTEND controlled by ExtOp]

19 Writes: Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
    - Else? Writes to the wrong Addr!
- Solution: use ideal data memory
  - Assume everything works OK
  - How to fix this for real?
  - One solution: synchronous memory
  - Another solution: delay MemWr to come late
- Problems?: write to register file
  - Does the RegWrite signal come after the WrReg number?
  - When does the write to a register happen?
  - Read from the same register as being written?

20 Missing Pieces: Instruction Fetching
- Where does the instruction come from?
  - From instruction memory, of course!
  - Recall: stored-program concept
- Alternatives?
  - How about hard-coding wires and switches…?
  - This is how ENIAC (Electronic Numerical Integrator and Computer) was programmed!
- How to branch?
  - BEQ rs, rt, Imm16

21 Instruction Processing
- Fetch instruction
- Execute instruction
- Fetch next instruction
- Execute next instruction
- Fetch next instruction
- Execute next instruction
- Etc…
- How to maintain sequence? Use a counter!
- Branches (out of sequence)? Load the counter!

22 Instruction Processing
- Program Counter
  - Points to the current instruction
  - Address to instruction memory
    - Instr ← InstrMem[PC]
  - Next instruction: counts up by 4
    - Remember: memory is byte-addressable, instructions are 4 bytes
    - PC ← PC + 4
  - Branch instruction: replace PC contents

23 Step 1: Analyze Instructions
- Register Transfer Language …
  op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  op | rs | rt | Imm16 = InstrMem[ PC ]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' } else PC ← PC + 4
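
The register transfers for this six-instruction subset can be executed directly. Below is a minimal behavioral sketch in Python, not a model of the datapath hardware. It assumes a hypothetical `step` function that takes already-decoded fields, a 32-entry list `R` for the register file, and a dictionary `MEM` mapping byte addresses to 32-bit words; none of these names come from the slides. ORI is modeled as a bitwise OR with the zero-extended immediate, and the usual MIPS convention that register 0 always reads as zero is not modeled.

```python
MASK = 0xFFFFFFFF  # keep all values within 32 bits

def sign_ext16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed two's-complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def step(name, rs, rt, rd, imm16, R, MEM, pc):
    """Apply one instruction's register transfers and return the next PC."""
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK
    elif name == "ORI":
        R[rt] = R[rs] | (imm16 & 0xFFFF)          # zero-extended immediate
    elif name == "LOAD":
        R[rt] = MEM[(R[rs] + sign_ext16(imm16)) & MASK]
    elif name == "STORE":
        MEM[(R[rs] + sign_ext16(imm16)) & MASK] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # sign_ext(Imm16) || b'00' is the same as shifting left by 2
            return (pc + 4 + (sign_ext16(imm16) << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK
```

Note that every branch of `step` ends by producing a next PC, mirroring the "; PC ← PC + 4" that closes each RTL line.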

24 Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Provides address to Instruction Memory
[Figure: PC driving the Read address of Instruction Memory, which outputs Instruction[31:0]; an Add block with constant 4]

25 Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Sometimes, must add SignExtend{Imm16||b'00'} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
[Figure: PC, Instruction Memory, and +4 adder as before, plus Sign/Zero Extend (16 → 32 bits, ExtOp) on Instruction[15:0] (Imm16), Shift Left 2, a second adder (Add result), and a 0/1 multiplexer controlled by PCSrc; fields Instruction[25:21], Instruction[20:16], Instruction[15:11] are also shown]

26 Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. Blocks: PC, Instruction Memory, two adders, Shift Left 2, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), Sign/Zero Extend (16 → 32), ALU (ALU result, Zero), ALU Control driven by Instruction[5:0] (funct), Data Memory (Address, Write data, Read data), and four multiplexers. Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]

27 What have we done?
- Created a simple CPU datapath
  - Control still missing (next slide)
- Single-cycle CPU
  - Every instruction takes 1 clock cycle
  - Clocking?

28 One Clock Cycle
- Clock locations
  - PC, REGFILE have clocks
- Operation
  - On rising edge, PC will get a new value
    - Maybe REGFILE will have one value updated as well
  - After rising edge
    - PC and REGFILE can't change
    - New value out of PC
    - Instruction out of INSTRMEM
    - Instruction selects registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    - ALU does its work
    - DataMem may be read (depending on instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to PC
    - Await next clock edge
- Lots to do in only 1 clock cycle!

29 Missing Steps?
- Control is missing (Steps 4 and 5 we mentioned earlier)
  - Generate the green signals
    - ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in an upcoming lecture
- Implementation details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of the above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To the register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc.

30 1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath from slide 26, with control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]

31 1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control block driven by Instruction[31:26] and an ALU control block driven by Instruction[5:0]. Control outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc selects the next PC]

32 1-cycle CPU Control – Lookup Table

Input or Output  Signal Name  R-format  Lw  Sw  Beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1

- Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
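
Because single-cycle control is a pure function of the opcode, the table can be written as a plain lookup. The sketch below transcribes the four columns into a Python dictionary keyed by the 6-bit opcode (Op5..Op0). The names `CONTROL` and `control` are illustrative, `None` stands for a don't-care (X), and ALUOp1/ALUOp0 are combined into one 2-bit value.

```python
X = None  # don't care

CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}

def control(instr: int) -> dict:
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F
    return CONTROL[opcode]
```

In hardware the same table could be a small ROM addressed by the opcode, or it could be minimized into gates; the don't-care entries give the logic minimizer freedom for sw and beq.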

33 1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath extended for jumps. Labels: Instruction[31:26], Instruction[25:0], PC + 4 [31..28], Jump address [31..0], Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]

34 1-cycle CPU Problems?
- Every instruction takes 1 cycle
- Some instructions "do more work"
  - E.g., lw must read from DATAMEM
- All instructions must have the same clock period…
- Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come *after* the address is stable
- Need extra resources…
  - PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

35 Performance!
- Single-Cycle CPU Performance
  - Execute one instruction per clock cycle (CPI=1)
  - Clock cycle time? Note dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
  - Not every instruction uses all resources (e.g., DATAMEM read)
  - Can we change the clock period for each instruction?
    - No! (Why not?)
  - One clock period: the worst case!
  - This is why a single-cycle CPU is not good for performance
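
The worst-case argument can be made concrete with a small calculation. The sketch below is only an illustration: the unit delays in `DELAY_NS` are made-up numbers, not figures from the lecture, and the per-instruction paths are a simplified reading of the dataflow list above (sign extension is left out as negligible). The point is that the clock period must cover the slowest path, so faster instructions sit idle for the remainder of the cycle.

```python
# Hypothetical unit delays in nanoseconds (assumed for illustration only)
DELAY_NS = {
    "INSTRMEM": 2,
    "REGFILE_READ": 1,
    "ALU": 2,
    "DATAMEM": 2,
    "REGFILE_WRITE": 1,
}

# Units each instruction class passes through in the single-cycle datapath
PATHS = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ":    ["INSTRMEM", "REGFILE_READ", "ALU"],
}

def path_delay(instr: str) -> int:
    """Total delay of the units one instruction class uses."""
    return sum(DELAY_NS[unit] for unit in PATHS[instr])

def single_cycle_period() -> int:
    """The clock period must cover the slowest instruction."""
    return max(path_delay(instr) for instr in PATHS)

def wasted_time(instr: str) -> int:
    """Idle time per cycle when a faster instruction runs."""
    return single_cycle_period() - path_delay(instr)
```

With these assumed delays LW sets the period, and every other instruction class wastes part of each cycle.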

36 1-cycle CPU Datapath + Controller
[Figure: single-cycle datapath with controller and jump support. Labels: Instruction[31:26], Instruction[25:0], PC + 4 [31..28], Jump address [31..0], Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]

37 1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During the clock cycle, data flows from register outputs to register inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - Slowest instruction determines clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at the end of the clock cycle

38 Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction → 5 cycles
    - SW instruction → 4 cycles
    - R-type instruction → 4 cycles
    - Branch, Jump → 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (e.g., LW) → 5 cycles → same performance as before
    - Simple instructions (e.g., ADD) → fewer cycles → faster
- Save resources (gates/transistors)
  - Re-use ALU over multiple cycles
  - Put INSTR + DATA in the same memory
- MemWrite timing solved?

39 Multi-cycle CPU Datapath
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2)
[Figure: multi-cycle datapath. Blocks: PC, Memory (Address, Write data, MemData), Instruction Register, Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A, B, Sign Extend, Shift Left 2, constant 4, ALU (ALU result, Zero), ALU Out, and five multiplexers]

40 Multi-cycle CPU Datapath
- Add registers + control signals (IR, MDR, A, B, ALUOut)
  - Registers with no control signal load a value every clock cycle (e.g., PC)
[Figure: the same multi-cycle datapath, showing the added registers Instruction Register, Memory Data Register, A, B, and ALU Out]

41 Instruction Execution Example
- Execute a "Load Word" instruction
  - LW rt, 0(rs)
- 5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers

42 Load Word Instruction Sequence
1. Fetch Instruction: InstructionRegister ← Mem[PC]
[Figure: multi-cycle datapath for this step (PC, Memory, Instruction Register)]

43 Load Word Instruction Sequence
2. Read Registers: A ← Registers[Rs]
[Figure: multi-cycle datapath for this step (Instruction[25:21], RdReg1, RdData1, A)]

44 Load Word Instruction Sequence
3. Compute Address: ALUOut ← A + SignExt(Imm16)
[Figure: multi-cycle datapath for this step (A, Instruction[15:0], Sign Extend, ALU, ALU Out)]

45 Load Word Instruction Sequence
4. Read Data: MDR ← Memory[ALUOut]
[Figure: multi-cycle datapath for this step (ALU Out, Memory Address, MemData, Memory Data Register)]

46 Load Word Instruction Sequence
5. Write Registers: Registers[Rt] ← MDR
[Figure: multi-cycle datapath for this step (Memory Data Register, Write reg, Write data)]

47 Load Word Instruction Sequence
All 5 Steps Shown
[Figure: multi-cycle datapath with all five Load Word steps]

48 Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing steps?

49 Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing steps?
  - Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need PCWrite control signal
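
The five steps can be traced with a short behavioral sketch, one line of work per clock cycle, using the temporary registers IR, A, ALUOut, and MDR named on the slides. This is a simplification under stated assumptions: the function name `run_lw` is hypothetical, `mem` is a dictionary from byte address to 32-bit word holding both instructions and data (the multi-cycle design shares one memory), and control signals are not modeled.

```python
MASK = 0xFFFFFFFF

def sign_ext16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed two's-complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def run_lw(pc: int, R: list, mem: dict):
    """Execute one LW in five cycles; return (next_pc, per-cycle trace)."""
    trace = []

    # Cycle 1: fetch the instruction and increment the PC
    IR = mem[pc]
    pc = (pc + 4) & MASK
    trace.append("fetch")

    # Cycle 2: read the base register named by Instruction[25:21]
    rs = (IR >> 21) & 0x1F
    rt = (IR >> 16) & 0x1F
    A = R[rs]
    trace.append("read registers")

    # Cycle 3: compute the address (byte offset, not shifted)
    ALUOut = (A + sign_ext16(IR & 0xFFFF)) & MASK
    trace.append("compute address")

    # Cycle 4: read data memory into the Memory Data Register
    MDR = mem[ALUOut]
    trace.append("read data")

    # Cycle 5: write the loaded value into register rt
    R[rt] = MDR
    trace.append("write registers")

    return pc, trace
```

Each temporary register is written in one cycle and consumed in a later one, which is exactly why the multi-cycle datapath needs IR, A, ALUOut, and MDR while the single-cycle design does not.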

50 Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut
- RTL describes the data flow action in each clock cycle
  - Control signals determine the precise data flow
  - Each step implies unique control values

51 Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value: ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers: Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
- Each step implies unique control values
  - Fixed for the entire cycle
  - "Default value" implied if unspecified

52 Check Your Work – Is RTL Valid?
1. Datapath check
   - Within one cycle…
     - Each cycle has a valid data flow path (path exists)
     - Each register gets only one new value
   - Across multiple cycles…
     - A register value must be defined in a previous (earlier in time) clock cycle before it is used
       - E.g., "A ← 3" must occur before "B ← A"
     - Make sure a register value doesn't disappear if it was set more than 1 cycle earlier
2. Control signal check
   - Each cycle, the RTL describing the datapath flow implies a value for each control signal
     - 0 or 1 or default or don't care
   - Each control signal gets only one fixed value for the entire cycle
3. Overall check
   - Does the sequence of steps work?

53 Multi-cycle BEQ Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target: A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt{Imm16},b'00'}
3. Compare Registers, Conditional Branch: if ( (A – B) == 0 ) PC ← ALUOut
- Green shows the PC calculation flow (in parallel with other operations)

54 Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath with ALU Control and jump path. Labels: Instr[25:21], Instr[20:16], Instr[15:0], Instruction[5:0], In[15:11], Instr[25:0], PC[31..28], Jump address [31..0]. Control signals: PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst]

55 Multi-cycle Datapath with Controller
[Figure: the same datapath with a controller driven by Instr[31:26]]


59 Multi-cycle CPU Control: Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control…
    - Precise outputs for each state (Mealy depends on inputs, Moore does not)
    - Precise "next state" for each state (can depend on inputs)
[Figure: FSM overview with Control Signal Outputs]

60 How to Implement FSM?
- Manually with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit, each output bit (painful!)
- High-level language description (e.g., Verilog, VHDL)
  - Describe FSM bubble diagram (next-states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends the correct control signals for 1 cycle
    - µ-op similar to one bubble in FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains µ-ops
  - Can look similar to RTL or some new "assembly language"
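
To make the next-state idea concrete, here is a minimal sketch of a control FSM's next-state function in Python. The state names (FETCH, DECODE, MEM_ADDR, and so on) are hypothetical labels chosen for this example; the lecture's own bubble diagram is a figure and is not reproduced here. The transitions are chosen so that the cycle counts match the multi-cycle goals stated earlier: LW 5, SW 4, R-type 4, branch and jump 3.

```python
def next_state(state: str, opclass: str) -> str:
    """Next-state function: depends on the current state and the opcode class."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {
            "LW": "MEM_ADDR",
            "SW": "MEM_ADDR",
            "R": "EXECUTE",
            "BEQ": "BRANCH",
            "J": "JUMP",
        }[opclass]
    if state == "MEM_ADDR":
        return "MEM_READ" if opclass == "LW" else "MEM_WRITE"
    if state == "MEM_READ":
        return "WRITE_BACK"
    if state == "EXECUTE":
        return "R_COMPLETE"
    # MEM_WRITE, WRITE_BACK, R_COMPLETE, BRANCH, JUMP all finish the instruction
    return "FETCH"

def cycles(opclass: str) -> int:
    """Count clock cycles from one FETCH to the next for an instruction class."""
    state, n = "FETCH", 0
    while True:
        state = next_state(state, opclass)
        n += 1
        if state == "FETCH":
            return n
```

A hand implementation would encode each state in a few flip-flops and derive the gates from this table; a microcoded implementation would store one µ-op per state and let the µPC either increment or dispatch on the opcode.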

61 FSM Specification: Bubble Diagram
- Can build this by examining RTL
- It is possible to automatically convert RTL into this form!
[Figure: FSM bubble diagram]

62 FSM: Gates + FFs Implementation
[Figure: FSM high-level organization]

63 FSM: Microcode Implementation
[Figure: Microcode Storage (memory) with inputs and outputs; outputs are the datapath control outputs and sequencing control; Microprogram Counter with Adder (+1); Address Select Logic with inputs from the instruction register opcode field]

64 Multi-cycle CPU with Control FSM
[Figure: multi-cycle datapath with control FSM driven by Instr[31:26]. Labels: FSM Control Outputs, Conditional Branch, Instr[25:21], Instr[20:16], Instr[15:0], Instruction[5:0], In[15:11], Instr[25:0], PC[31..28], Jump address [31..0]]

65 Control FSM: Overview
- General approach: Finite State Machine (FSM)
- Need details in each branch of control…

66 Detailed FSM
[Figure: detailed FSM]

67 Detailed FSM
[Figure: detailed FSM]

68 Detailed FSM: Instruction Fetch
[Figure: detailed FSM, instruction fetch states]

69 Detailed FSM: Memory Reference
[Figure: detailed FSM, memory reference states for LW and SW]

70 Detailed FSM: R-Type Instruction
[Figure: detailed FSM, R-type states]

71 Detailed FSM: Branch Instruction
[Figure: detailed FSM, branch state]

72 Detailed FSM: Jump Instruction
[Figure: detailed FSM, jump state]

73 Performance Comparison
Single-cycle CPU vs Multi-cycle CPU

74 Simple Comparison
- Single-cycle CPU: 1 clock cycle (all instructions)
- Multi-cycle CPU: 5 clock cycles (LW)
- Multi-cycle CPU: 4 clock cycles (SW, R-type)
- Multi-cycle CPU: 3 clock cycles (BEQ, J)

75 What's really happening?
[Figure: timing of a Load Word instruction on the single-cycle CPU and the multi-cycle CPU in the ideal case; steps: Fetch, Decode, Calc Addr, Memory, Write]

76 In practice, steps differ in speeds…
[Figure: Load Word instruction timing on the single-cycle and multi-cycle CPUs; steps: Fetch, Decode, Calc Addr, Memory, Write; annotations: "Violation!", "Wasted time!"]

77 Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: LW timing on both CPUs; annotations: "Violation fixed!", "Now wasted time is larger!"]

78 Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: SW timing on both CPUs (Fetch, Decode, Calc Addr, Memory); annotations: "Wasted time!", "Speed diff"]

79 Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: BEQ/J timing on both CPUs (Fetch, Decode, Calc Addr); annotations: "Wasted time!", "Speed diff"]

80 Performance Summary
- Which CPU implementation is faster?
  - LW → single-cycle is faster
  - SW, R-type → about the same
  - BEQ, J → multi-cycle is faster
- Real programs use a mix of these instructions
- Overall performance depends on instruction frequency!
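
The dependence on instruction frequency can be worked out as a weighted average. The sketch below uses the per-instruction cycle counts given for the multi-cycle design; the function names, the instruction mix, and the clock periods passed in the example are all made-up illustrations, not measurements from the lecture.

```python
# Cycles per instruction class in the multi-cycle design (from the goals slide)
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

def average_cpi(mix: dict) -> float:
    """Weighted average cycles per instruction; mix values should sum to 1."""
    return sum(fraction * CYCLES[instr] for instr, fraction in mix.items())

def multi_vs_single_speedup(mix: dict, single_period: float, multi_period: float) -> float:
    """Ratio of single-cycle time per instruction to multi-cycle time per instruction.

    A result above 1 means the multi-cycle CPU is faster for this mix.
    """
    single_time = 1 * single_period            # CPI = 1
    multi_time = average_cpi(mix) * multi_period
    return single_time / multi_time

# Hypothetical instruction mix (assumed for illustration)
example_mix = {"LW": 0.25, "SW": 0.10, "R-type": 0.45, "BEQ": 0.15, "J": 0.05}
example_cpi = average_cpi(example_mix)
```

With an ideal 5x clock (multi-cycle period equal to one fifth of the single-cycle period), any mix containing instructions other than LW gives a speedup above 1. If the steps are unbalanced and the multi-cycle clock is slower than that ideal, the advantage shrinks and can disappear.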

81 Implementation Summary
- Single-cycle CPU
  - 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  - No "wasted time" on the most complex instruction
  - Large wasted time on simpler instructions
  - Simple controller (just a lookup table or memory)
  - Simple instructions
- Multi-cycle CPU
  - << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
  - Small time wasted on the most complex instruction
    - Hence, this instruction is always slower than on the single-cycle CPU
  - Small time wasted on simple instructions
    - Eliminates "large wasted time" by using fewer clock cycles
  - Complex controller (FSM)
  - Potential to create complex instructions

82 The End Lecture 11

