Presentation is loading. Please wait.

Presentation is loading. Please wait.

Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal Technical University.

Similar presentations

Presentation on theme: "Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal Technical University."— Presentation transcript:

1 Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal Technical University Eindhoven DTI / NUS Singapore 2005/2006

2 HC TD51022 Topics n Building a datapath u support a subset of the MIPS-I instruction-set n A single cycle processor datapath u all instruction actions in one (long) cycle n A multi-cycle processor datapath u each instructions takes multiple (shorter) cycles n Exception support

3 HC TD51023 Datapath and Control DatapathControl Registers & Memories Multiplexors Buses ALUs FSM or Micro- programming

4 HC TD51024 n Simplified MIPS implementation to contain only:  memory-reference instructions: lw, sw  arithmetic-logical instructions: add, sub, and, or, slt  control flow instructions: beq, j n Generic Implementation: u use the program counter (PC) to supply instruction address u get the instruction from memory u read registers u use the instruction to decide exactly what to do n All instructions use the ALU after reading the registers Why? F memory-reference? F arithmetic? F control flow? The Processor: Datapath & Control

5 HC TD51025 n Abstract / Simplified View: n Two types of functional units: u elements that operate on data values (combinational) u elements that contain state (sequential) More Implementation Details

6 HC TD51026 n Unclocked vs. Clocked n Clocks used in synchronous logic u when should an element that contains state be updated? cycle time rising edge falling edge State Elements

7 HC TD51027 n The set-reset (SR) latch u output depends on present inputs and also on past inputs An unclocked state element R S Q Q Truth table: R S Q 0 0 Q 0 1 1 1 0 0 1 1 ? state change

8 HC TD51028 n Output is equal to the stored value inside the element (don't need to ask for permission to look at the value) n Change of state (value) is based on the clock u Latches: whenever the inputs change, and the clock is asserted u Flip-flop: state changes only on a clock edge (edge-triggered methodology) A clocking methodology defines when signals can be read and written — wouldn't want to read a signal at the same time it was being written Latches and Flip-flops

9 HC TD51029 n Two inputs: u the data value to be stored (D) u the clock signal (C) indicating when to read & store D n Two outputs: u the value of the internal state (Q) and it's complement D-latch

10 HC TD510210 D flip-flop n Output changes only on the clock edge QQ _ Q Q _ Q D latch D C D latch DD C C

11 HC TD510211 Our Implementation n An edge triggered methodology n Typical execution: u read contents of some state elements, u send values through some combinational logic, u write results to one or more state elements Clock cycle State element 1 Combinational logic State element 2

12 HC TD510212 n 3-ported: one write, two read ports Register File Read reg. #1 Read reg.#2 Write reg.# Read data 1 Read data 2 Write data

13 HC TD510213 Register file: read ports Register file built using D flip-flops

14 HC TD510214 Register file: write port n Note: we still use the real clock to determine when to write

15 HC TD510215 Simple Implementation n Include the functional units we need for each instruction Why do we need this stuff? ALU control RegWrite Registers Write register Read data 1 Read data 2 Read register 1 Read register 2 Write data ALU result ALU Data Data Register numbers a. Registersb. ALU Zero 5 5 5 3

16 HC TD510216 Building the Datapath n Use multiplexors to stitch them together PC Instruction memory Read address Instruction 16 32 Add ALU result M u x Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 Shift left 2 4 M u x ALU operation 3 RegWrite MemRead MemWrite PCSrc ALUSrc MemtoReg ALU result Zero ALU Data memory Address Write data Read data M u x Sign extend Add

17 HC TD510217 n All of the logic is combinational n We wait for everything to settle down, and the right thing to be done u ALU might not produce “right answer” right away u we use write signals along with clock to determine when to write n Cycle time determined by length of the longest path Our Simple Control Structure We are ignoring some details like setup and hold times ! Clock cycle State element 1 Combinational logic State element 2

18 HC TD510218 Control n Selecting the operations to perform (ALU, read/write, etc.) n Controlling the flow of data (multiplexor inputs) n Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format: 000000 10001 10010 01000 00000 100000 op rs rt rd shamt funct n ALU's operation based on instruction type and function code

19 HC TD510219 Control: 2 level implementation instruction register ALUop ALUcontrol Opcode Funct. 31 26 0 5 bit Control 1 Control 2 ALU 00: lw, sw 01: beq 10: add, sub, and, or, slt 000: and 001: or 010: add 110: sub 111: set on less than 6 6 2 3

20 HC TD510220 Datapath with Control

21 HC TD510221 What should the ALU do with this instruction example: lw $1, 100($2) 35 21 100 op rsrt 16 bit offset ALU control input 000 AND 001OR 010add 110subtract 111set-on-less-than n Why is the code for subtract 110 and not 011? ALU Control1

22 HC TD510222 n Must describe hardware to compute 3-bit ALU control input u given instruction type 00 = lw, sw 01 = beq, 10 = arithmetic u function code for arithmetic n Describe it using a truth table (can turn into gates): ALU Operation class, computed from instruction type ALU Control1 outputs intputs

23 HC TD510223 ALU Control1 n Simple combinational logic (truth tables)

24 HC TD510224 Deriving Control2 signals 9 control (output) signals Determine these control signals directly from the opcodes: R-format: 0 lw: 35 sw: 43 beq: 4 Input 6-bits

25 HC TD510225 Control 2 n PLA example implementation

26 HC TD510226 Single Cycle Implementation n Calculate cycle time assuming negligible delays except: u memory (2ns), ALU and adders (2ns), register file access (1ns)

27 HC TD510227 Single Cycle Implementation n Memory (2ns), ALU & adders (2ns), reg. file access (1ns) n Fixed length clock: longest instruction is the ‘lw’ which requires 8 ns n Variable clock length (not realistic, just as exercise): u R-instr: 6 ns u Load: 8 ns u Store: 7 ns u Branch: 5 ns u Jump: 2 ns n Average depends on instruction mix (see pg 374)

28 HC TD510228 Where we are headed n Single Cycle Problems: u what if we had a more complicated instruction like floating point? u wasteful of area: NO Sharing of Hardware resources n One Solution: u use a “smaller” cycle time u have different instructions take different numbers of cycles u a “multicycle” datapath: IR MDR

29 HC TD510229 n We will be reusing functional units u ALU used to compute address and to increment PC u Memory used for instruction and data n Add registers after every major functional unit n Our control signals will not be determined solely by instruction u e.g., what should the ALU do for a “subtract” instruction? n We’ll use a finite state machine (FSM) or microcode for control Multicycle Approach

30 HC TD510230 n Finite state machines: u a set of states and u next state function (determined by current state and the input) u output function (determined by current state and possibly input) u We’ll use a Moore machine (output based only on current state) Review: finite state machines Next-state function Current state Clock Output function Next state Outputs Inputs

31 HC TD510231 n Break up the instructions into steps, each step takes a cycle u balance the amount of work to be done u restrict each cycle to use only one major functional unit n At the end of a cycle u store values for use in later cycles (easiest thing to do) u introduce additional “internal” registers n Notice: we distinguish u processor state: programmer visible registers u internal state: programmer invisible registers (like IR, MDR, A, B, and ALUout) Multicycle Approach

32 HC TD510232 Multicycle Approach Shift left 2 PC Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15–0] Sign extend 3216 Instruction [25–21] Instruction [20–16] Instruction [15–0] Instruction register 1 M u x 0 3 2 M u x ALU result ALU Zero Memory data register Instruction [15–11] A B ALUOut 0 1 Address

33 HC TD510233 Multicycle Approach n Note that previous picture does not include: u branch support u jump support u Control lines and logic n Tclock > max (ALU delay, Memory access, Regfile access) n See book for complete picture

34 HC TD510234 n Instruction Fetch n Instruction Decode and Register Fetch n Execution, Memory Address Computation, or Branch Completion n Memory Access or R-type instruction completion n Write-back step Five Execution Steps INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

35 HC TD510235 n Use PC to get instruction and put it in the Instruction Register n Increment the PC by 4 and put the result back in the PC Can be described succinctly using RTL "Register-Transfer Language" IR = Memory[PC]; PC = PC + 4; n Can we figure out the values of the control signals? n What is the advantage of updating the PC now? Step 1: Instruction Fetch

36 HC TD510236 n Read registers rs and rt in case we need them n Compute the branch address in case the instruction is a branch n Previous two actions are done optimistically!! RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC+(sign-extend(IR[15-0])<< 2); n We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) Step 2: Instruction Decode and Register Fetch

37 HC TD510237 n ALU is performing one of four functions, based on instruction type Memory Reference: ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; n Jump: PC = PC[31-28] || (IR[25-0]<<2) Step 3 (instruction dependent)

38 HC TD510238 Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; R-type instructions finish Reg[IR[15-11]] = ALUOut; The write actually takes place at the end of the cycle on the edge Step 4 (R-type or memory-access)

39 HC TD510239 Memory read completion step Reg[IR[20-16]]= MDR; What about all the other instructions? Write-back step

40 HC TD510240 Steps taken to execute any instruction class Summary execution steps

41 HC TD510241 How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, L1 #assume not taken add $t5, $t2, $t3 sw $t5, 8($t3) L1:... n What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place? Simple Questions

42 HC TD510242 n Value of control signals is dependent upon: u what instruction is being executed u which step is being performed n Use the information we have accumulated to specify a finite state machine (FSM) u specify the finite state machine graphically, or u use microprogramming n Implementation can be derived from specification Implementing the Control

43 HC TD510243 FSM: high level view Start/reset Instruction fetch, decode and register fetch Memory access instructions R-type instructions Branch instruction Jump instruction

44 n How many state bits will we need? Graphical Specification of FSM PCWrite PCSource = 10 ALUSrcA = 1 ALUSrcB = 00 ALUOp = 01 PCWriteCond PCSource = 01 ALUSrcA =1 ALUSrcB = 00 ALUOp= 10 RegDst = 1 RegWrite MemtoReg = 0 MemWrite IorD = 1 MemRead IorD = 1 ALUSrcA = 1 ALUSrcB = 10 ALUOp = 00 RegDst=0 RegWrite MemtoReg=1 ALUSrcA = 0 ALUSrcB = 11 ALUOp = 00 MemRead ALUSrcA = 0 IorD = 0 IRWrite ALUSrcB = 01 ALUOp = 00 PCWrite PCSource = 00 Instruction fetch Instruction decode/ register fetch Jump completion Branch completionExecution Memory address computation Memory access Memory access R-type completion Write-back step ( O p = ' L W ' ) o r ( O p = ' S W ' ) ( O p = R - t y p e ) ( O p = ' B E Q ' ) ( O p = ' J ' ) ( O p = ' S W ' ) ( O p = ' L W ' ) 4 0 1 9862 753 Start

45 HC TD510245 Implementation: Finite State Machine for Control

46 HC TD510246 PLA Implemen- tation n If I picked a horizontal or vertical line could you explain it ? n What type of FSM is used? (see fig C.14) next state current state datapath control opcode

47 HC TD510247 n ROM = "Read Only Memory" u values of memory locations are fixed ahead of time n A ROM can be used to implement a truth table u if the address is m-bits, we can address 2 m entries in the ROM u our outputs are the bits of data that the address points to ROM Implementation 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 m is the "heigth", and n is the "width" m bits n bits ROM addressdata

48 HC TD510248 n How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2 10 = 1024 different addresses) n How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs n ROM is 2 10 x 20 = 20K bits (very large and a rather unusual size) n Rather wasteful, since for lots of the entries, the outputs are the same — i.e., opcode is often ignored ROM Implementation

49 HC TD510249 ROM Implementation Cheaper implementation: n Exploit the fact that the FSM is a Moore machine ==> u Control outputs only depend on current state and not on other incoming control signals ! u Next state depends on all inputs n Break up the table into two parts — 4 state bits tell you the 16 outputs, 2 4 x 16 bits of ROM — 10 bits tell you the 4 next state bits, 2 10 x 4 bits of ROM — Total number of bits: 4.3K bits of ROM

50 HC TD510250 n PLA is much smaller u can share product terms (ROM has an entry (=address) for every product term u only need entries that produce an active output u can take into account don't cares Size of PLA: (#inputs  #product-terms) + (#outputs  #product-terms) u For this example: (10x17)+(20x17) = 460 PLA cells n PLA cells usually slightly bigger than the size of a ROM cell ROM vs PLA

51 HC TD510251 Exceptions n Unexpected events n External: interrupt u e.g. I/O request n Internal: exception u e.g. Overflow, Undefined instruction opcode, Software trap, Page fault n How to handle exception? u Jump to general entry point (record exception type in status register) u Jump to vectored entry point u Address of faulting instruction has to be recorded !

52 HC TD510252 Exceptions Changes needed: see fig. 5.48 / 5.49 / 5.50 n Extend PC input mux with extra entry with fixed address: “C000000hex” n Add EPC register containing old PC (we’ll use the ALU to decrement PC with 4) u extra input ALU src2 needed with fixed value 4 n Cause register (one bit in our case) containing: u 0: undefined instruction u 1: ALU overflow n Add 2 states to FSM u undefined instr. state #10 u overflow state #11

53 HC TD510253 Exceptions 2 New states: Legend: IntCause =0/1type of exception CauseWritewrite Cause register ALUSrcA = 0select PC ALUSrcB = 01select constant 4 ALUOp = 01subtract operation EPCWritewrite EPC register with current PC PCWritewrite PC with exception address PCSource =11select exception address: C000000hex Legend: IntCause =0/1type of exception CauseWritewrite Cause register ALUSrcA = 0select PC ALUSrcB = 01select constant 4 ALUOp = 01subtract operation EPCWritewrite EPC register with current PC PCWritewrite PC with exception address PCSource =11select exception address: C000000hex IntCause =0 CauseWrite ALUSrcA = 0 ALUSrcB = 01 ALUOp = 01 EPCWrite PCWrite PCSource =11 IntCause =1 CauseWrite ALUSrcA = 0 ALUSrcB = 01 ALUOp = 01 EPCWrite PCWrite PCSource =11 #10 undefined instruction#11 overflow To state 0 (begin of next instruction)

Download ppt "Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal Technical University."

Similar presentations

Ads by Google