CS Computer Architecture Week 10: Single Cycle Implementation


1 CS35101 - Computer Architecture Week 10: Single Cycle Implementation
Paul Durand [Adapted from M. Irwin] [Adapted from COD, Patterson & Hennessy, © 2005, UCB]

3 Head's Up
This week's material
  Building a MIPS datapath
  Single cycle datapath implementation & control
  Reading assignment – PH
Next week's material
  Multiple cycle datapath implementation & control
  Reading assignment – PH

4 Review: Design Principles
Simplicity favors regularity
  fixed size instructions – 32-bits
  only three instruction formats
Good design demands good compromises
  three instruction formats
Smaller is faster
  limited instruction set
  limited number of registers in register file
  limited number of addressing modes
Make the common case fast
  arithmetic operands from the register file (load-store machine)
  allow instructions to contain immediate operands

5 The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS
Simplified to contain only:
  memory-reference instructions: lw, sw
  arithmetic-logical instructions: add, sub, and, or, slt
  control flow instructions: beq, j
Generic implementation:
  use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
  decode the instruction (and read registers)
  execute the instruction
[Figure: Fetch (PC = PC+4), Decode, Exec loop]
All instructions (except j) use the ALU after reading the registers. Why?
  memory reference: use the ALU to compute addresses
  arithmetic: use the ALU to do the required arithmetic
  control flow: use the ALU to compute branch conditions

6 Abstract Implementation View
Two types of functional units:
  elements that operate on data values (combinational)
  elements that contain state (sequential)
Single cycle operation
Split memory (Harvard) model - one memory for instructions and one for data
[Figure: PC, Instruction Memory (Address, Instruction), Register File (Reg Addr, Read Data, Write Data), ALU, Data Memory (Address, Read Data, Write Data)]
Have the class tell which elements in the picture are combinational and which are sequential

7 Single Cycle Implementation – where we are headed

8 Clocking Methodologies
Clocking methodology defines when signals can be read and when they can be written
[Figure: clock waveform showing the falling (negative) edge, the rising (positive) edge, and the cycle time]
clock rate = 1/(cycle time)
  e.g., 10 nsec cycle time = 100 MHz clock rate
  1 nsec cycle time = 1 GHz clock rate
State element design choices
  level sensitive latch
  master-slave and edge-triggered flipflops
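
The cycle-time arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the helper name clock_rate_hz is made up for illustration and does not come from the slides.

```python
def clock_rate_hz(cycle_time_s: float) -> float:
    """Clock rate is the reciprocal of the cycle time (hypothetical helper)."""
    if cycle_time_s <= 0:
        raise ValueError("cycle time must be positive")
    return 1.0 / cycle_time_s


NS = 1e-9  # one nanosecond, in seconds

# The two cycle times quoted on the slide: 10 nsec and 1 nsec.
for cycle_ns in (10, 1):
    rate_mhz = clock_rate_hz(cycle_ns * NS) / 1e6
    print(f"{cycle_ns} ns cycle time -> {rate_mhz:.0f} MHz clock rate")
```

A 1 nsec cycle time comes out as 1000 MHz, which is the 1 GHz figure given on the slide.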

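9 Review: State Elements
Set-reset latch
Level sensitive D latch
  latch is transparent when clock is high (copies input to output)
[Figure: set-reset latch (inputs R, S; outputs Q, !Q) with its truth table (columns R, S, Q(t+1), !Q(t+1)); D latch (inputs D, clock; outputs Q, !Q) with waveforms for clock, D, Q]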

10 Review: State Elements, con't
Race problem with latch based design …
[Figure: D-latch0 and D-latch1 driven by a common clock]
Consider the case when D-latch0 holds a 0 and D-latch1 holds a 1 and you want to transfer the contents of D-latch0 to D-latch1 and vice versa
  must have the clock high long enough for the transfer to take place
  must not leave the clock high so long that the transferred data is copied back into the original latch
Two-sided clock constraint

11 Review: State Elements, con't
Solution is to use flipflops that change state (Q) only on clock edge (master-slave)
  master (first D-latch) copies the input when the clock is high (the slave (second D-latch) is locked in its memory state and the output does not change)
  slave copies the master when the clock goes low (the master is now locked in its memory state so changes at the input are not loaded into the master D-latch)
[Figure: two D-latches in series forming a master-slave flipflop, with waveforms for clock, D, Q]
One-sided clock constraint
  must have the clock cycle time long enough to accommodate the worst case delay path
Also have set-up and hold-times to deal with
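
The difference between a transparent latch and a master-slave flipflop can be seen in a small behavioral model. This is a sketch under simplifying assumptions: it ignores gate delays and set-up/hold times, and the class names DLatch and MasterSlaveFlipFlop are invented for illustration.

```python
class DLatch:
    """Level-sensitive D latch: transparent while the clock input is high."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def update(self, clock: int, d: int) -> int:
        if clock:            # transparent: copy input to output
            self.q = d
        return self.q        # otherwise hold the stored value


class MasterSlaveFlipFlop:
    """Two D latches in series; the slave is clocked with the inverted clock."""

    def __init__(self, q: int = 0) -> None:
        self.master = DLatch(q)
        self.slave = DLatch(q)

    def update(self, clock: int, d: int) -> int:
        self.master.update(clock, d)                    # master follows D while clock is high
        self.slave.update(int(not clock), self.master.q)  # slave copies master while clock is low
        return self.slave.q


ff = MasterSlaveFlipFlop(0)
trace = [ff.update(clock, d) for clock, d in [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0)]]
print(trace)
```

While the clock stays high the output Q does not move even though D toggles; Q only takes the last value the master saw once the clock goes low, which is the one-sided constraint the slide describes.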

12 Our Implementation
An edge-triggered methodology
Typical execution
  read contents of some state elements
  send values through some combinational logic
  write results to one or more state elements
[Figure: State element 1, Combinational logic, State element 2, sharing a clock; one clock cycle]
Assumes state elements are written on every clock cycle; if not, need explicit write control signal
  write occurs only when both the write control is asserted and the clock edge occurs

13 Fetching Instructions
Fetching instructions involves
  reading the instruction from the Instruction Memory
  updating the PC to hold the address of the next instruction
[Figure: PC, Instruction Memory (Read Address, Instruction), and an adder (Add 4)]
PC is updated every cycle, so it does not need an explicit write control signal
Instruction Memory is read every cycle, so it doesn't need an explicit read control signal

14 Decoding Instructions
Decoding instructions involves
  sending the fetched instruction's opcode and function field bits to the control unit
  reading two values from the Register File
    Register File addresses are contained in the instruction
[Figure: Instruction feeding the Control Unit and the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; Read Data 1, Read Data 2)]

15 Executing R Format Operations
R format operations (add, sub, slt, and, or)
  perform the indicated (by op and funct) operation on values in rs and rt
  store the result back into the Register File (into location rd)
R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
[Figure: Instruction, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; Read Data 1, Read Data 2), ALU (overflow, zero); control signals RegWrite and ALU control]
Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File

16 Executing Load and Store Operations
Load and store operations have to
  compute a memory address by adding the base register (in rs) to the 16-bit signed offset field in the instruction
    base register was read from the Register File during decode
    offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value
  store value, read from the Register File during decode, must be written to the Data Memory
  load value, read from the Data Memory, must be stored in the Register File
I-Type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
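
The address computation for lw and sw can be sketched in Python as follows. The function names sign_extend16 and effective_address are illustrative, not taken from the slides, and registers are modeled as plain 32-bit unsigned integers.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:          # sign bit set: fill the upper 16 bits with ones
        value |= 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Base register (rs) plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & MASK32


print(hex(effective_address(0x1000, 8)))        # positive offset
print(hex(effective_address(0x1000, 0xFFFC)))   # offset field encoding -4
```

The wrap to 32 bits stands in for the fixed width of the ALU adder; a negative offset simply shows up as a large unsigned addend.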

17 Executing Load and Store Operations, con't
[Figure: Instruction, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; Read Data 1, Read Data 2), Sign Extend (16 to 32 bits), ALU (overflow, zero), Data Memory (Address, Write Data, Read Data); control signals RegWrite, ALU control, MemWrite, MemRead]

19 Executing Branch Operations
Branch operations have to
  compare the operands read from the Register File during decode (rs and rt values) for equality (zero ALU output)
  compute the branch target address by adding the updated PC to the sign extended 16-bit signed offset field in the instruction
    “base register” is the updated PC
    offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value and then shifted left 2 bits to turn the word offset into a byte offset
I-Type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
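
A minimal Python sketch of the beq computation described on this slide follows. The function names are hypothetical, and the equality test is modeled the way the slide describes it, as a subtraction whose result is checked for zero.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Updated PC (PC+4) plus the sign-extended offset shifted left by 2."""
    pc_plus_4 = (pc + 4) & MASK32
    byte_offset = (sign_extend16(offset16) << 2) & MASK32
    return (pc_plus_4 + byte_offset) & MASK32


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Take the branch only when the ALU subtraction yields zero."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


print(hex(next_pc_beq(0x00400000, 5, 5, 3)))   # equal operands: branch taken
print(hex(next_pc_beq(0x00400000, 5, 6, 3)))   # unequal operands: fall through
```

Note that the offset counts instructions relative to PC+4, so an offset field of -1 branches back to the branch instruction itself.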

20 Executing Branch Operations, con't
[Figure: PC, adder (Add 4), Shift left 2, second adder producing the Branch target address, Register File, Sign Extend (16 to 32 bits), ALU with zero output (to branch control logic); control signal ALU control]

22 Executing Jump Operations
Jump operations have to
  replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits
J-Type: op [31-26] | jump target address [25-0]
[Figure: PC, Instruction Memory (Read Address, Instruction), adder (Add 4), Shift left 2 (26 to 28 bits), upper 4 bits of the updated PC joined with the shifted field to form the Jump address]
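
The jump address formation can be sketched as below. The function name jump_target is made up for illustration; the upper four bits are taken from the updated PC (PC+4), matching the PC+4[31-28] label that appears on the later jump datapath slide.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc: int, instruction: int) -> int:
    """Keep the top 4 bits of PC+4 and replace the lower 28 bits."""
    pc_plus_4 = (pc + 4) & MASK32
    target26 = instruction & 0x03FFFFFF      # low-order 26 bits of the instruction
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


# Example encoding: opcode 2 (j) in bits 31-26, target field 0x100000.
instr = (2 << 26) | 0x0100000
print(hex(jump_target(0x00400008, instr)))
```

Because only 28 bits are replaced, a jump stays within the 256 MB region selected by the upper four bits of the updated PC.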

23 Our Simple Control Structure
We wait for everything to settle down
  ALU might not produce “right answer” right away
  we use write signals along with the clock edge to determine when to write (to the Register File and the Data Memory)
Cycle time determined by length of the longest path
We are ignoring some details like register setup and hold times

24 Creating a Single Datapath from the Parts
Assemble the datapath segments discussed earlier, add control lines as needed, and design the control path
Fetch, decode and execute each instruction in one clock cycle – single cycle design
  no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory)
  to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection
Cycle time is determined by length of the longest path

25 Fetch, R, and Memory Access Portions
[Figure: PC, adder (Add 4), Instruction Memory, Register File, Sign Extend (16 to 32 bits), ALU (ovf, zero), Data Memory; control signals RegWrite, ALU control, MemWrite, MemRead]

26 Multiplexor Insertion
[Figure: the datapath of slide 25 with multiplexors added; new control signals ALUSrc and MemtoReg]

27 Clock Distribution
[Figure: the datapath of slide 26 with the System Clock shown; label: clock cycle]

28 Adding the Branch Portion
[Figure: the datapath of slide 26 with Shift left 2, a second adder, and a multiplexor controlled by PCSrc added in front of the PC]

29 Adding the Control
Selecting the operations to perform (ALU, Register File and Memory read/write)
Controlling the flow of data (multiplexor inputs)
Information comes from the 32 bits of the instruction
R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
I-Type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
Observations
  op field always in bits 31-26
  addr of two registers to be read are always specified by the rs and rt fields (bits 25-21 and 20-16)
  addr of register to be written is in one of two places – in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions
  base register for lw and sw always in rs (bits 25-21)
  offset for beq, lw, and sw always in bits 15-0
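
The fixed field positions listed in these observations make decoding a matter of shifts and masks. The sketch below is illustrative only; the Fields tuple and decode function are names chosen here, not part of the slides.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    op: int      # bits 31-26
    rs: int      # bits 25-21
    rt: int      # bits 20-16
    rd: int      # bits 15-11 (R-type only)
    shamt: int   # bits 10-6  (R-type only)
    funct: int   # bits 5-0   (R-type only)
    imm: int     # bits 15-0  (I-type offset, not yet sign extended)


def decode(instr: int) -> Fields:
    """Slice a 32-bit instruction word into every field at once."""
    return Fields(
        op=(instr >> 26) & 0x3F,
        rs=(instr >> 21) & 0x1F,
        rt=(instr >> 16) & 0x1F,
        rd=(instr >> 11) & 0x1F,
        shamt=(instr >> 6) & 0x1F,
        funct=instr & 0x3F,
        imm=instr & 0xFFFF,
    )


print(decode(0x012A4020))   # add $t0, $t1, $t2
print(decode(0x8FA80004))   # lw  $t0, 4($sp)
```

All fields are extracted unconditionally, as in the hardware, where every slice of the instruction is wired out and the control signals decide which ones are actually used.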

30 (Almost) Complete Single Cycle Datapath
[Figure: PC, Instruction Memory (Instr[31-0]), Register File (Read Addr 1 from Instr[25-21], Read Addr 2 from Instr[20-16]), Sign Extend (Instr[15-0], 16 to 32 bits), ALU (ovf, zero), ALU control (Instr[5-0], ALUOp), Data Memory, two adders, Shift left 2, and multiplexors; control signals PCSrc, RegDst, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg]

31 ALU Control
ALU's operation based on instruction type and function code
  ALU control input   Function
  000                 and
  001                 or
  010                 add
  110                 subtract
  111                 set on less than
Why is the code for subtract 110 and not 011?
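
One way to read the encoding is sketched below. It assumes the textbook-style ALU in which the top control bit inverts the B input (and supplies the carry-in) while the low two bits select among AND, OR, the adder, and set-on-less-than. Under that assumption 110 is simply "add with B negated". The structure and names here are illustrative, not a transcription of the slide's ALU.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the five control codes on the slide."""
    if control not in (0b000, 0b001, 0b010, 0b110, 0b111):
        raise ValueError(f"unsupported ALU control input: {control:03b}")
    bnegate = (control >> 2) & 1       # assumed meaning of the top bit
    select = control & 0b11            # assumed meaning of the low two bits
    a &= MASK32
    b_in = (~b if bnegate else b) & MASK32
    total = (a + b_in + bnegate) & MASK32   # a + b, or a - b when bnegate is set
    if select == 0b00:
        result = a & b_in
    elif select == 0b01:
        result = a | b_in
    elif select == 0b10:
        result = total
    else:
        # Hardware would use the sign of the difference; comparing directly
        # here sidesteps the overflow corner case.
        result = 1 if to_signed(a) < to_signed(b) else 0
    return result, result == 0


print(alu(0b010, 2, 3))
print(alu(0b110, 7, 7))
```

The zero flag returned alongside the result is what beq relies on: subtracting equal operands gives zero.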

32 ALU Control, Con't
Controlling the ALU makes use of multiple levels of decoding
  main control unit generates the ALUOp bits
  ALU control unit generates ALU control inputs
  Instr  funct   ALUOp  desired action    ALU control input
  lw     xxxxxx  00     add               010
  sw     xxxxxx  00     add               010
  beq    xxxxxx  01     subtract          110
  add    100000  10     add               010
  sub    100010  10     subtract          110
  and    100100  10     and               000
  or     100101  10     or                001
  slt    101010  10     set on less than  111
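
The second level of decoding can be written as a small lookup. This is a behavioral sketch of the table, with illustrative names (alu_control, FUNCT_TO_CONTROL), not gate-level logic.

```python
# funct field -> ALU control input, used only when ALUOp is 10 (R-type)
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and 6-bit funct field to the 3-bit ALU control input."""
    if alu_op == 0b00:      # lw / sw: always add to form the address
        return 0b010
    if alu_op == 0b01:      # beq: always subtract to compare
        return 0b110
    if alu_op == 0b10:      # R-type: operation chosen by the funct field
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("ALUOp encoding 11 is not used")


for name, alu_op, funct in [("lw", 0b00, 0), ("beq", 0b01, 0), ("slt", 0b10, 0b101010)]:
    print(f"{name}: {alu_control(alu_op, funct):03b}")
```

Keeping the funct decoding out of the main control unit is the design choice the slide highlights: the main control only has to look at the opcode.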

34 ALU Control Truth Table
[Table: truth table with inputs F5, F4, F3, F2, F1, F0, ALUOp1, ALUOp0 and outputs Op2, Op1, Op0]
Can make use of more don't cares
  since ALUOp does not use the encoding 11
  since F5 and F4 are always 10
Logic comes from the K-maps …

36 ALU Control Combinational Logic
From the truth table we can design the ALU Control logic
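
One possible minimization is sketched below and checked against the instruction table from the earlier ALU Control slide. The three equations are an assumption (a commonly used textbook minimization that exploits the don't cares); the slide's own gate diagram is not recoverable from the transcript, so this may differ from it.

```python
def alu_control_gates(alu_op1: int, alu_op0: int, funct: int) -> int:
    """Gate-level sketch: three output bits from ALUOp and the low funct bits."""
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1
    op2 = alu_op0 | (alu_op1 & f1)       # Op2 = ALUOp0 + ALUOp1.F1
    op1 = (alu_op1 ^ 1) | (f2 ^ 1)       # Op1 = !ALUOp1 + !F2
    op0 = alu_op1 & (f3 | f0)            # Op0 = ALUOp1.(F3 + F0)
    return (op2 << 2) | (op1 << 1) | op0


# (ALUOp1, ALUOp0, funct, expected ALU control input)
EXPECTED = [
    (0, 0, 0b000000, 0b010),  # lw / sw (funct is a don't care)
    (0, 1, 0b000000, 0b110),  # beq     (funct is a don't care)
    (1, 0, 0b100000, 0b010),  # add
    (1, 0, 0b100010, 0b110),  # sub
    (1, 0, 0b100100, 0b000),  # and
    (1, 0, 0b100101, 0b001),  # or
    (1, 0, 0b101010, 0b111),  # slt
]

for alu_op1, alu_op0, funct, expected in EXPECTED:
    assert alu_control_gates(alu_op1, alu_op0, funct) == expected
print("gate equations agree with the table")
```

F5 and F4 never appear in the equations, which is the saving the truth-table slide attributes to the don't cares.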

37 (Almost) Complete Datapath with Control Unit
[Figure: the datapath of slide 30 with a Control Unit driven by Instr[31-26]; control outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp; PCSrc selects the next PC]
Note mux control inputs have been swapped (for three of the muxes) from the last picture to be consistent with the book.

38 Main Control Unit
  Instr   opcode  RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp1 ALUOp0
  R-type  000000
  lw      100011
  sw      101011
  beq     000100
(For class handout)
Completely determined by the instruction opcode field
Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation

39 R-type Instruction Data/Control Flow
[Figure: complete datapath with Control Unit, as on slide 37]
For class handout - have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

41 Store Word Instruction Data/Control Flow
[Figure: complete datapath with Control Unit, as on slide 37]
For class handout - have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

43 Load Word Instruction Data/Control Flow
[Figure: complete datapath with Control Unit, as on slide 37]
For class handout - have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

45 Branch Instruction Data/Control Flow
[Figure: complete datapath with Control Unit, as on slide 37]
For class handout - have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

47 Main Control Unit
  Instr   opcode  RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp1 ALUOp0
  R-type  000000
  lw      100011
  sw      101011
  beq     000100
(For lecture)
Completely determined by the instruction opcode field
Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation

48 Control Unit Logic
From the truth table we can design the Main Control logic
[Figure: decoding logic with inputs Instr[31] through Instr[26]; decoded terms R-type, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
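
A table-driven sketch of the main control unit follows. The signal values are an assumption: they are the ones from the textbook's single-cycle control table (the slides say their mux polarities follow the book), not values read off the slide, whose table entries did not survive in the transcript. None marks a don't care, and all Python names are illustrative.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]     # 1: write register is rd, 0: rt, None: don't care
    alu_src: int               # 1: second ALU operand is the sign-extended offset
    mem_to_reg: Optional[int]  # 1: register write data comes from Data Memory
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int                # two bits: ALUOp1, ALUOp0


# Keyed by opcode (Instr[31-26]); values assumed from the textbook table.
MAIN_CONTROL = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-type
    0b100011: Control(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: Control(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: Control(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(opcode: int) -> Control:
    """Control signals are completely determined by the opcode field."""
    try:
        return MAIN_CONTROL[opcode]
    except KeyError:
        raise ValueError(f"unsupported opcode: {opcode:06b}") from None


def pc_src(control: Control, zero: bool) -> int:
    """Assumed branch logic: take the branch target only if Branch AND zero."""
    return control.branch & int(zero)


print(main_control(0b100011))
print(pc_src(main_control(0b000100), zero=True))
```

Only sw and beq leave RegDst and MemtoReg as don't cares, since neither writes the Register File.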

49 Review: Handling Jump Operations
Jump operations have to
  replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits
J-Type: op [31-26] | jump target address [25-0]
[Figure: PC, Instruction Memory (Read Address, Instruction), adder (Add 4), Shift left 2 (26 to 28 bits), upper 4 bits of the updated PC joined with the shifted field to form the Jump address]

50 Adding the Jump Operation
[Figure: the datapath of slide 37 with the jump path added: Instr[25-0] through Shift left 2 (26 to 28 bits), combined with PC+4[31-28] into a 32-bit jump address, and an additional multiplexor controlled by a new Jump control signal]

Good exam questions
  Add jalr rs,rd (fields: 0 | rs | 0 | rd | 0 | 9): jump to the instruction whose address is in rs and save the address of the next instruction (PC+4) in rd
  Add the PowerPC addressing modes of update addressing and indexed addressing (will have to expand the RegFile to three read ports and two write ports)
  Add andi, ori, addi - have to have both a sign extend and a zero extend and choose between the two; will have to augment the ALUOp encoding (since can't get the op information out of the funct bits as with R-type)
  Add mult rs, rt with the result being left in hi|lo - so also include the mfhi and mflo instructions (will have to add a multiplier, the hi and lo registers and then a couple of muxes and their control)
  Add barrel shifter

52 Single Cycle Implementation Cycle Time
Unfortunately, though simple, the single cycle approach is not used because it is inefficient
Clock cycle must have the same length for every instruction
What is the longest path (slowest instruction)?

53 Instruction Critical Paths
Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except:
  Instruction and Data Memory (2ns)
  ALU and adders (2ns)
  Register File access (reads or writes) (1ns)
  Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total
  R-type   2      1       2              1       6
  load     2      1       2       2      1       8
  store    2      1       2       2              7
  beq      2      1       2                      5
  jump     2                                     2
Note that PC is updated during I Mem read
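
The totals follow directly from the three delays given on the slide. The short Python sketch below recomputes them; the dictionary names and the per-instruction lists of functional units are written out here for illustration.

```python
# Delays from the slide, in nanoseconds; everything else is treated as negligible.
DELAY_NS = {"I Mem": 2, "Reg Rd": 1, "ALU Op": 2, "D Mem": 2, "Reg Wr": 1}

# Functional units on each instruction class's critical path.
PATHS = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}

totals = {instr: sum(DELAY_NS[unit] for unit in units) for instr, units in PATHS.items()}
cycle_time_ns = max(totals.values())   # the clock must fit the slowest instruction

for instr, total in totals.items():
    print(f"{instr:7s} {total} ns")
print(f"single cycle time: {cycle_time_ns} ns")
```

Every instruction pays the load's latency, even a jump that needs only the instruction fetch; this is the inefficiency the single cycle design is criticized for.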

55 Where We are Headed
Problems with single cycle datapath design
  uses clock cycle inefficiently
    and what if we had a more complicated instruction like floating point multiply?
  wasteful of area
Another approach
  use a “smaller” cycle time
  have different instructions take different numbers of cycles
  a “multicycle” datapath:
[Figure: multicycle datapath with PC, a single Memory (Address, Read Data (Instr. or Data), Write Data), IR, MDR, Register File, A, B, ALU, ALUout]

