Yu-Lun Kuo Computer Sciences and Information Engineering


1 Computer Organization and Architecture Chapter 5: The Processor: Datapath and Control
Yu-Lun Kuo Computer Sciences and Information Engineering University of Tunghai, Taiwan CS252 S05

2 5.1 Introduction
The performance of a machine depends on three factors: instruction count, clock cycle time, and clock cycles per instruction (CPI). The compiler and the instruction set architecture determine the instruction count required for a given program.

3 5.1 Introduction
Both the clock cycle time and the number of clock cycles per instruction (CPI) are determined by the implementation of the processor. We construct the datapath and control unit for two different implementations of the MIPS instruction set: a single-cycle implementation and a multi-cycle implementation.

4 5.1 Introduction
We are going to see how the processor is implemented, starting with a very simple processor and then adding some more complexity.

5 Basic MIPS Implementation
We implement a subset of the MIPS instruction set:
Memory-reference instructions: lw and sw
ALU instructions: add, sub, and, or, slt
Control-flow instructions: beq and j
Generic implementation:
Use the program counter (PC) to supply the instruction address
Fetch the instruction from memory
Read one or two registers
Use the instruction to decide exactly what to do

6 Basic MIPS Implementation
All instructions except jump use the ALU after reading the registers:
Memory-reference instructions use the ALU for address calculation
Arithmetic-logical instructions use it to execute the operation
Branches use it for comparison

7 Our Processor, sort of…
What’s missing: how to combine inputs that are “joined” together, and how to tell each component what to do.

8

9 Multiplexers and Controllers
In the previous figure, two or more “wires” go into the same input of a component, because different inputs must be provided depending on the instruction being executed. So, based on the instruction, we need to decide which input is selected. This is done with a multiplexer (MUX): it has n inputs (input 1 ... input n), one selected output, and a control input of ceil(log2(n)) bits.
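A minimal Python sketch of the multiplexer described above. The function names `select_bits` and `mux` are illustrative and do not come from the slides; the sketch only models the selection behavior, not hardware timing.

```python
def select_bits(n_inputs: int) -> int:
    """Number of control bits needed to choose among n_inputs: ceil(log2(n))."""
    if n_inputs < 1:
        raise ValueError("a multiplexer needs at least one input")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floating point.
    return (n_inputs - 1).bit_length()


def mux(inputs, select: int):
    """Return the input chosen by the control value `select`."""
    if not 0 <= select < len(inputs):
        raise ValueError("select value out of range")
    return inputs[select]


if __name__ == "__main__":
    print(select_bits(2), select_bits(4), select_bits(5))
    print(mux(["PC + 4", "branch target"], 1))
```

A 2-input multiplexer therefore needs a single control bit, which is the case for every multiplexer in the simple datapath.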

10 What about the Control?
So great, now we can control multiplexers. We need a controller that sends the appropriate control bits to all the multiplexers and the other components. Besides the multiplexers, there are other things to control. Example: an ALU has a set of control bits that tell it what to do. With a 2-bit control: 00 = ADD, 01 = SUB, 10 = MUL, 11 = SHIFT.

11 Control Unit (Simplified)
[Figure: simplified control unit. Labels: PC, Add 4, offset, a MUX with input 0 and input 1, a 0-or-1 select bit, instruction register.]

12 A More Complete Picture

13

14 5.2 Logic Design Conventions
The functional units in the MIPS implementation consist of two different types of logic elements.
Elements that operate on data values (combinational):
Outputs depend only on the current inputs
The same inputs always produce the same output
No internal storage
Elements that contain state (sequential):
Have at least two inputs and one output
Inputs: the data value to be written into the element, and the clock, which determines when the data value is written
Output: the value that was written in a previous clock cycle

15 Clocking Methodology
A clocking methodology defines when signals can be read and when they can be written. If a signal is written at the same time it is read, the value read could be the old value, the new value, or a mix of the two. Computer designs cannot tolerate such unpredictability.
The clock cycle (period) is divided into two portions, a high portion and a low portion, separated by the rising edge and the falling edge.

16 Edge-triggered Clocking
Edge-triggered clocking means that state changes (in state elements) occur only at a clock edge, using either the rising edge or the falling edge.
Typical execution within one clock cycle:
Read the contents of some state elements
Send the values through some combinational logic
Write the results to one or more state elements
[Figure: state element 1 -> combinational logic -> state element 2, within one clock cycle]

17 The Clock
[Figure: state element #1 (stable) -> combinatorial circuit (stable by the edge) -> state element #2 (updated on the edge), over one clock cycle]
In the figure, we want to use the value in state element #1 to modify the value in state element #2. It takes one cycle, and all signals must be stable before the clock edge.

18

19 Read/Write in a Clock Cycle
A great implication of edge-triggered clocking: a state element can be read and written in the same clock cycle. We will say things like: “reads happen in the first half of the clock cycle, writes happen in the second half.”
[Figure: state element #1 (stable) -> combinatorial circuit (stable by the edge) -> state element #2 (updated on the edge)]

20

21 Write Control Signal (p.291)
Both the clock signal and the write control signal are inputs to a state element. The state element is changed only when the write control signal is asserted and a clock edge occurs.
Assuming a rising-edge update: while the control bit stays at 0, nothing happens. If we set the control bit to 1, the state element is updated at the next rising edge.
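The write-enable behavior can be sketched as a small Python model. The class `Register` and its method names are hypothetical, chosen only for illustration; the sketch assumes a rising-edge update, as on the slide.

```python
class Register:
    """Edge-triggered state element with a write control signal (illustrative model)."""

    def __init__(self, value: int = 0):
        self.value = value        # output: stable for the whole clock cycle
        self._data_in = value     # value currently on the data input
        self._write = False       # write control signal

    def set_inputs(self, data_in: int, write: bool) -> None:
        # Inputs may change at any time during the cycle; the output is unaffected.
        self._data_in = data_in
        self._write = write

    def rising_edge(self) -> None:
        # The state changes only when the edge occurs AND write is asserted.
        if self._write:
            self.value = self._data_in


if __name__ == "__main__":
    r = Register(5)
    r.set_inputs(9, write=False)
    r.rising_edge()
    print(r.value)   # still 5: write control was 0
    r.set_inputs(9, write=True)
    r.rising_edge()
    print(r.value)   # now 9: updated on the edge
```

The PC can be modeled the same way with the write signal permanently asserted, since it is written on every clock cycle.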

22 Busses and bus width
Many of the state elements and combinational elements take multi-bit inputs (often 32-bit inputs). The term “bus” refers to a wire that carries more than one bit (really multiple 1-bit wires). We simply indicate the width of a bus by labeling it with its number of bits.
[Figure: wires labeled with widths 16 and 8, and a control signal]

23 Building a Datapath
A datapath element is a unit in the processor that operates on or holds data: instruction memory, data memory, register file, ALU, adders. Let’s re-examine the datapath elements we only barely introduced earlier.

24 Building a Datapath
Start by looking at which datapath elements each instruction needs, and show their control signals.
Instruction memory: a memory unit that stores the instructions of a program and supplies an instruction given an address
Program counter (PC): a 32-bit register that is written at the end of every clock cycle (so it does not need a write control signal)
Adder: increments the PC to the address of the next instruction; combinational, built from an ALU

25 The Three Elements
Two state elements are needed to store and access instructions: the instruction memory and the program counter. The instruction memory only provides read access, so its output at any time reflects the contents of the location specified by the address input.
An adder is needed to compute the next instruction address (+4 bytes): an ALU wired to always perform an add.

26 Fetching Instructions
[Figure: the PC (32 bits) supplies the read address to the instruction memory, which outputs the 32-bit instruction; an adder computes PC + 4, which is latched into the PC]
The PC gets updated in 1 clock cycle because we use edge-triggered clocking.

27 Register File
The processor’s 32 general-purpose registers are stored in a structure called the register file: a collection of registers in which any register can be read or written by specifying the number of the register in the file.
[Figure: register file with three 5-bit register-number inputs, a 32-bit write data input, two 32-bit read data outputs, a clock, and a write control signal]

28 Datapath: Instruction Store/Fetch & PC Increment
Three elements used to store and fetch instructions and increment the PC Datapath

29 Animating the Datapath
Instruction <- MEM[PC] PC <- PC + 4

30 What about R-type instructions
These instructions take 3 registers as arguments: 1 output register and 2 input registers. Example: add $t1, $t2, $t3, which reads $t2 and $t3 and writes $t1.
We need an input that carries the data to be written into the output register; it typically comes from the ALU.
We need a write signal to trigger the register write on the next clock edge. A write at any time during the clock cycle could lead to race conditions if that register is also read.

31 Datapath: R-Type Instruction
Two elements used to implement R-type instructions Datapath

32 Register File and ALU
[Figure: register file and ALU. The three 5-bit register numbers (Read register 1, Read register 2, Write register) are extracted from the 32-bit instruction code. Read data 1 and Read data 2 (32 bits each) feed the ALU, which has a 4-bit Operation input and produces a 32-bit result and a zero output. The register file also has a 32-bit Write data input and a RegWrite control.]

33 add t1, t1, t2 (sketch)
[Figure: the instruction supplies t1 as Read register 1, t2 as Read register 2, and t1 as Write register. The ALU result goes to Write data, and RegWrite must be set only at the next edge.]

34 Animating the Datapath (R-type)
add rd, rs, rt R[rd] <- R[rs] + R[rt];

35 What about the Load/Store
Example: lw t1, offset(t2)
The memory address is computed by adding the 16-bit signed offset to the input register. The offset is 16 bits wide, but memory addresses are 32 bits wide, so the offset must be sign-extended to a 32-bit value before being added to the input register.
The data memory has both read and write controls: the MemRead and MemWrite control signals.
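As a rough illustration of the address calculation for lw and sw, the following Python sketch sign-extends a 16-bit offset and adds it to a base register value. The helper names `sign_extend16` and `effective_address` are assumptions made for this example, and 32-bit wraparound is modeled with a mask.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit value (returned as an unsigned int)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:                 # sign bit set: fill the upper 16 bits with ones
        return imm16 | 0xFFFF0000
    return imm16


def effective_address(base: int, offset16: int) -> int:
    """Address used by lw/sw: base register + sign-extended offset, modulo 2**32."""
    return (base + sign_extend16(offset16)) & MASK32


if __name__ == "__main__":
    print(hex(sign_extend16(0xFFFC)))              # offset -4
    print(hex(effective_address(0x1000, 0xFFFC)))  # 0x1000 - 4
    print(hex(effective_address(0x1000, 8)))       # 0x1000 + 8
```

Without the sign extension, a negative offset such as -4 would be treated as 65532 and the load would touch the wrong address.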

36 Datapath: Load/Store Instruction
Two additional elements used To implement load/stores Datapath

37 Implementing Load/Store
[Figure: data memory unit with 32-bit Address and Write data inputs, a 32-bit Read data output, and MemRead and MemWrite controls; sign-extension unit with a 16-bit input and a 32-bit output]

38

39 Implementing lw s1,offset(s2)
[Figure: the instruction supplies s2 as a read register and s1 as the write register. The 16-bit offset is sign-extended to 32 bits and added to the value of s2 to form the data memory address. Read data from the data memory goes to the register file's Write data input. MemRead is set, MemWrite is not set, and RegWrite is set on the next edge.]

40 Animating the Datapath (Load)
lw rt, offset(rs) R[rt] <- MEM[R[rs]+s_extend(offset)];

41 Animating the Datapath (Store)
sw rt, offset(rs) MEM[R[rs]+sign_extend(offset)] <- R[rt]

42 What about the Branch (beq)
A beq compares 2 registers. To do a branch we must:
Compute the branch’s target address based on its offset
Decide whether the branch is taken or not taken
Taken (the operands are equal): the branch target address becomes the new PC, PC = (PC+4) + 4*(offset field)
Not taken (the operands are not equal): PC = PC+4 as usual

43 Branch Datapath
No shift hardware is required for the shift left by 2: simply connect the wires from input to output, each shifted left by 2 bit positions.

44

45 Animating the Datapath (branch)
beq rs, rt, offset if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2)
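The next-PC selection for beq can be sketched in Python as follows. The function `next_pc` and its parameter names are illustrative only; the sketch sign-extends the 16-bit offset first and then shifts it left by 2, and it models 32-bit wraparound with a mask.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to 32 bits (unsigned representation)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def next_pc(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """PC after a beq: the branch target if the operands are equal, else PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return target if rs_val == rt_val else pc_plus_4


if __name__ == "__main__":
    print(hex(next_pc(0x00400000, 1, 1, 3)))       # taken, 3 instructions ahead of PC + 4
    print(hex(next_pc(0x00400000, 1, 2, 3)))       # not taken
    print(hex(next_pc(0x00400010, 0, 0, 0xFFFF)))  # taken, offset -1 (one instruction back)
```

Note that the offset counts instructions relative to PC + 4, not relative to the address of the branch itself.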

46 Putting it all together
The simplest design is one in which all instructions are executed in a single clock cycle. In this case, every element of the datapath is used only once per clock cycle. Little hardware needs to be duplicated: only a few adders here and there, plus separate data and instruction memories.
Let’s first put together the pieces for the R-type (ALU) instructions and the memory instructions, as they are quite similar.

47 All together (not quite)
Combining the datapaths for R-type instructions and load/stores using two multiplexors: we “simply” add multiplexers for choosing between the datapath for the ALU instructions and the datapath for the memory instructions.

48

49 Animating the Datapath: R-type Instruction
add rd,rs,rt

50 Animating the Datapath: Load Instruction
lw rt,offset(rs)

51 Animating the Datapath: Store Instruction
sw rt,offset(rs)

52 Adding instruction fetch
A separate adder is needed because the ALU operation and the PC increment occur in the same clock cycle. A separate instruction memory is needed because the instruction read and the data read occur in the same clock cycle.

53

54 Complete, all together
Adding branch capability and another multiplexor:
The new multiplexor selects the instruction address, which is either PC+4 or the branch target address
An extra adder is needed because both adders operate in each cycle
Important note: in a single-cycle implementation, data cannot be stored during an instruction; it only moves through combinational logic.
Question: is the MemRead signal really needed? Think of RegWrite!

55 5.4 What now?
At this point we’ve identified most of the components for an almost full datapath for a very simple implementation of the MIPS ISA. Let us now design the logic that makes it all work, i.e., how we set the control signals.

56 Datapath Executing add
add rd, rs, rt

57 Datapath Executing lw lw rt,offset(rs)

58 Datapath Executing sw sw rt,offset(rs)

59 Datapath Executing beq
beq r1,r2,offset

60 Control Unit
Let’s go through the types of control signals that need to be generated. An important set of signals is for the ALU. Our ALU has four control inputs:
ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set on less than
1100          NOR

61 Controlling the ALU
Depending on the instruction, the ALU needs to perform one of these functions:
For load/store: the ALU needs to add
For R-type instructions: the operation depends on the 6-bit function field in the low-order bits of the instruction (remember Chapter 2)
For branch: the ALU needs to subtract

62 Controlling the ALU
We can generate the 4-bit ALU control using a small control unit that takes two inputs:
2 control bits called ALUOp: add (00), subtract (01), depends on the function field (10)
The instruction’s 6-bit function field

63

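64 Determining ALU Control Bits
Inst. opcode | ALUOp | Inst. operation | Func. field | Desired ALU action | ALU control input
lw | 00 | load | xxxxxx | add | 0010
sw | 00 | store | xxxxxx | add | 0010
beq | 01 | branch | xxxxxx | subtract | 0110
R-type | 10 | add | 100000 | add | 0010
R-type | 10 | subtract | 100010 | subtract | 0110
R-type | 10 | and | 100100 | and | 0000
R-type | 10 | or | 100101 | or | 0001
R-type | 10 | set on < | 101010 | set on < | 0111
(xxxxxx = don’t care)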
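The mapping from ALUOp and function field to the 4-bit ALU control input can be written as a small lookup. This is a sketch in Python rather than gates; the name `alu_control` and the dictionary `FUNCT_TO_ALU` are made up for this example, while the bit patterns follow the table on this slide.

```python
# funct field -> 4-bit ALU control input, for R-type instructions (ALUOp = 10)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control input for a 2-bit ALUOp and 6-bit funct field."""
    if alu_op == 0b00:        # lw / sw: always add, funct is a don't care
        return 0b0010
    if alu_op == 0b01:        # beq: always subtract, funct is a don't care
        return 0b0110
    if alu_op == 0b10:        # R-type: decided by the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unsupported ALUOp value")


if __name__ == "__main__":
    print(format(alu_control(0b00), "04b"))            # load/store
    print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```

In hardware the same table would be minimized into a few gates, exploiting the don't-care entries.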

65 Design ALU Control Unit
Designing logic Useful to create a truth table for the interesting combinations of the function code field and the ALUOp bits It can be optimized and then turned into gates CS252 S05

66 The Three Instruction Classes
R-type, load and store, and branch formats:
R-type: op = 0 [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0]
Load & store: op = 35 or 43 [31:26], rs [25:21], rt [20:16], address [15:0]
Branch: op = 4 or 5 [31:26], rs [25:21], rt [20:16], address [15:0]
We need to add a multiplexor to select which field of the instruction indicates the destination register: bits 20:16 (rt) for a load, bits 15:11 (rd) for an R-type instruction.

67 New Control Signals
RegDst: destination register comes from rt vs. rd
RegWrite: register should be written
ALUSrc: ALU operand from register vs. instruction
PCSrc: PC from adder (PC+4) vs. branch target
MemRead: for lw
MemWrite: for sw
MemtoReg: register write data from ALU vs. memory

68

75 The Seven Control Signals
Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number comes from the rt field ([20:16]) | The register destination number comes from the rd field ([15:11])
RegWrite | None | The write register is written with the value on the write data input
ALUSrc | The second ALU operand comes from Read data 2 | The second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc | The PC is replaced by the output of the adder that computes PC+4 | The PC is replaced by the output of the adder that computes the branch target
MemRead | None | Data memory contents designated by the address are put on the Read data output
MemWrite | None | Data memory contents designated by the address are replaced by the value on the write data input
MemtoReg (Mem2Reg) | The register write data input comes from the ALU | The register write data input comes from the data memory

76

77 PCSrc cannot be set directly from the opcode: zero test outcome is required Determining control signals for the MIPS datapath based on instruction opcode

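78 Control Signals: R-Type Instruction
[Figure: datapath annotated with control signal values, shown in blue. The ALU operation value depends on the funct field.]

79 Control Signals: lw Instruction
[Figure: datapath annotated with control signal values, shown in blue. ALU operation = 010.]

80 Control Signals: sw Instruction
[Figure: datapath annotated with control signal values, shown in blue. ALU operation = 010; signals that do not matter for sw are marked X.]

81 Control Signals: beq Instruction
[Figure: datapath annotated with control signal values, shown in blue. ALU operation = 110; PCSrc = 1 if Zero = 1; signals that do not matter for beq are marked X.]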
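Since the per-instruction values are hard to read from the figures, here is a hedged Python sketch of a main control table for the four instruction classes. The values are the usual textbook settings for this single-cycle datapath and are an assumption, not a transcription of the slides; `None` stands for a don't-care (X). The opcodes 0, 35, 43, and 4 are the ones listed on the instruction-format slide, and `Branch` is the signal that is ANDed with the ALU Zero output to form PCSrc.

```python
# opcode -> control signal settings (None = don't care)
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),        # R-type
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),        # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),        # sw
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),        # beq
}


def pc_src(opcode: int, zero: bool) -> int:
    """PCSrc cannot come from the opcode alone: it also needs the ALU Zero output."""
    return int(CONTROL[opcode]["Branch"] == 1 and bool(zero))


if __name__ == "__main__":
    print(CONTROL[35])                            # settings for lw
    print(pc_src(4, zero=True), pc_src(4, zero=False))
```

Only instructions that change architectural state assert a write signal: RegWrite for R-type and lw, MemWrite for sw.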

82 Single-Cycle Design Problems (p.314)
Assuming a fixed-period clock, every instruction uses one clock cycle, which implies CPI = 1.
The cycle time is determined by the length of the longest instruction path (load), but several instructions could run in a shorter clock cycle: a waste of time.
Resources used more than once in the same cycle need to be duplicated: a waste of hardware and chip area.

83 Performance of Single-Cycle
Assume functional unit delays as follows:
Memory units: 200 ps
ALU and adders: 100 ps
Register file (read or write): 50 ps
Multiplexors, control unit, PC accesses, sign extension, wires: no delay
Assume the instruction mix as follows:
All loads take the same time and comprise 25%
All stores take the same time and comprise 10%
R-format instructions comprise 45%
Branches comprise 15%
Jumps comprise 5%
Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock, where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it’s possible!).

84 Solution (1/3)
CPU time = Instruction_count x CPI x clock_cycle
With CPI = 1: CPU time = Instruction_count x clock_cycle
We need only find the clock cycle time, since the instruction count and CPI are the same for both implementations.
Instruction class | Functional units used by the instruction class
R-type | Inst. fetch, Reg. access, ALU, Reg. access
Load word | Inst. fetch, Reg. access, ALU, Memory access, Reg. access
Store word | Inst. fetch, Reg. access, ALU, Memory access
Branch | Inst. fetch, Reg. access, ALU
Jump | Inst. fetch

85 Solution (2/3)
Instruction class | Inst. memory | Reg. read | ALU operation | Data memory | Reg. write | Total
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps
Store word | 200 | 50 | 100 | 200 | 0 | 550 ps
Branch | 200 | 50 | 100 | 0 | 0 | 350 ps
Jump | 200 | 0 | 0 | 0 | 0 | 200 ps
Machine with a single clock for all instructions: the clock cycle is determined by the longest instruction, so it is 600 ps.
Machine with a variable clock: the average clock cycle length is 400 x 45% + 600 x 25% + 550 x 10% + 350 x 15% + 200 x 5% = 447.5 ps.
The variable-clock machine is clearly faster.
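The comparison can be checked with a few lines of Python. The dictionary names below are arbitrary; the per-class times and the instruction mix are the ones stated in this example.

```python
# Total time per instruction class in picoseconds, and the instruction mix.
TOTAL_PS = {"R-type": 400, "load": 600, "store": 550, "branch": 350, "jump": 200}
MIX = {"R-type": 0.45, "load": 0.25, "store": 0.10, "branch": 0.15, "jump": 0.05}

# Fixed-period clock: every instruction takes as long as the slowest one.
fixed_cycle = max(TOTAL_PS.values())

# Variable-period clock: weighted average of the per-class times.
variable_cycle = sum(TOTAL_PS[c] * MIX[c] for c in MIX)

# Instruction count and CPI are equal for both machines, so the
# performance ratio is just the ratio of clock cycle times.
speedup = fixed_cycle / variable_cycle

if __name__ == "__main__":
    print(fixed_cycle, variable_cycle, round(speedup, 2))
```

With these numbers the variable-clock machine comes out roughly 1.34 times faster than the fixed-clock one.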

86 Solution (3/3)
Unfortunately, implementing a variable-speed clock for each instruction class is extremely difficult. The overhead for such an approach could be larger than any advantage gained.

87 Example: Practice
Consider a machine with an additional floating point unit. Assume functional unit delays as follows:
Memory: 2 ns
ALU and adders: 2 ns
FPU add: 8 ns
FPU multiply: 16 ns
Register file access (read or write): 1 ns
Multiplexors, control unit, PC accesses, sign extension, wires: no delay
Assume the instruction mix as follows:
All loads take the same time and comprise 31%
All stores take the same time and comprise 21%
R-format instructions comprise 27%
Branches comprise 5%
Jumps comprise 2%
FP adds and subtracts take the same time and together comprise 7%
FP multiplies and divides take the same time and together comprise 7%
Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock, where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it’s possible!).

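88 Solution
Instruction class | Instr. mem. | Register read | ALU oper. | Data mem. | Register write | FPU add/sub | FPU mul/div | Total time (ns)
Load word | 2 | 1 | 2 | 2 | 1 | - | - | 8
Store word | 2 | 1 | 2 | 2 | - | - | - | 7
R-format | 2 | 1 | 2 | - | 1 | - | - | 6
Branch | 2 | 1 | 2 | - | - | - | - | 5
Jump | 2 | - | - | - | - | - | - | 2
FP mul/div | 2 | 1 | - | - | 1 | - | 16 | 20
FP add/sub | 2 | 1 | - | - | 1 | 8 | - | 12
Clock period for the fixed-period clock = longest instruction time = 20 ns.
Average clock period for the variable-period clock = 8 x 31% + 7 x 21% + 6 x 27% + 5 x 5% + 2 x 2% + 20 x 7% + 12 x 7% = 8.1 ns.
Therefore, performance(variable-period) / performance(fixed-period) = 20 / 8.1, which is about 2.5.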
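The weighted average is easy to get wrong by hand, so it is worth recomputing from the unit delays. The Python sketch below rebuilds each class total from the functional units it uses; the unit sequences per class are an assumption based on the usual single-cycle datapath, and all names are illustrative. With the stated delays and mix it evaluates to about 8.1 ns.

```python
# Functional unit delays in nanoseconds (from the problem statement).
DELAY = {"mem": 2, "alu": 2, "fpu_add": 8, "fpu_mul": 16, "reg": 1}

# Units used by each instruction class, in order (assumed datapath usage).
PATH = {
    "load":     ["mem", "reg", "alu", "mem", "reg"],
    "store":    ["mem", "reg", "alu", "mem"],
    "r_format": ["mem", "reg", "alu", "reg"],
    "branch":   ["mem", "reg", "alu"],
    "jump":     ["mem"],
    "fp_add":   ["mem", "reg", "fpu_add", "reg"],
    "fp_mul":   ["mem", "reg", "fpu_mul", "reg"],
}

MIX = {"load": 0.31, "store": 0.21, "r_format": 0.27, "branch": 0.05,
       "jump": 0.02, "fp_add": 0.07, "fp_mul": 0.07}

total_ns = {cls: sum(DELAY[u] for u in units) for cls, units in PATH.items()}
fixed_period = max(total_ns.values())
average_period = sum(total_ns[cls] * MIX[cls] for cls in MIX)
ratio = fixed_period / average_period

if __name__ == "__main__":
    print(total_ns)
    print(fixed_period, round(average_period, 2), round(ratio, 2))
```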

89 5.5 Multi-Cycle Implementation
The idea of a multi-cycle implementation is to have the functional units plus a set of additional registers that hold important values between the cycles of a single instruction. This way a functional unit can be shared between cycles of the same instruction, provided some multiplexers are added to decide where its input should come from. This sharing can help reduce the amount of hardware required.

90 Multi-cycle Design
Major advantages:
Instructions can take different numbers of clock cycles
Functional units can be shared within the execution of a single instruction
Compared with the single-cycle version:
A single memory unit is used for both instructions and data
A single ALU is used (instead of an ALU and two adders)
One or more registers are added after every functional unit to hold its output until the value is used in a subsequent clock cycle

91 Multi-cycle Design
The clock cycle can accommodate at most one of the following operations:
Memory access
Register file access (two reads or one write)
ALU operation
So data produced by one of these three functional units must be saved into a temporary register for use in a later cycle.

92 Temporary Registers
Instruction register (IR): saves the output of the memory for an instruction read
Memory data register (MDR): saves the output of the memory for a data read
A and B registers: hold the register operand values read from the register file
ALUOut register: holds the output of the ALU

93 Multi-cycle vs. single-cycle
single memory for data and instructions single ALU, no extra adders extra registers to hold data between clock cycles Single-cycle datapath Multicycle datapath (high-level view)

94 Multicycle Datapath Basic multicycle MIPS datapath handles R-type instructions and load/stores: new internal register in red ovals, new multiplexors in blue ovals

95 Breaking Instructions into Steps
Our goal is to break up the instructions into steps so that:
Each step takes one clock cycle
The amount of work done in each step/cycle is about equal
Each cycle uses each major functional unit at most once, so that such units do not have to be replicated
Functional units can be shared between different cycles within one instruction
Data produced at the end of one cycle and used in a later cycle must be stored.

96 Breaking Instructions into Steps
For MIPS, we can think of an instruction as running in five 1-cycle stages:
Instruction fetch and PC increment (IF)
Instruction decode and register fetch (ID)
Execution, memory address computation, or branch completion (EX)
Memory access or R-type instruction completion (MEM)
Memory read completion (WB)
Each MIPS instruction takes 3 to 5 cycles (steps).

98 Step 1: Instruction Fetch & PC Increment (IF)
IR = Memory[PC]; PC = PC + 4;
Use the PC to get the instruction from memory and write it into the instruction register (IR). Increment the PC by 4 and put the result back in the PC; the new value of the PC is not visible until the next clock cycle (it is also stored into ALUOut). In this step we do not yet know what the instruction does.

100 Step 2: Instruction Decode and Register Fetch (ID)
Read registers rs and rt from the register file in case we need them, and store the values into the temporary registers A and B. Compute the branch target address with the ALU and save it in the temporary register ALUOut.
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);

102 Step 3: Execution, Address Computation or Branch Completion (EX)
The action taken depends on the instruction class:
Memory reference (lw and sw, address = [rs] + offset): ALUOut = A + sign-extend(IR[15-0]);
Arithmetic-logical instruction (R-type): ALUOut = A op B;
Branch (the ALU computes A - B and tests for zero): if (A == B) PC = ALUOut;
Jump: PC = PC[31-28] || (IR[25-0] << 2);
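The jump case is the only one that builds an address by concatenation rather than addition. A minimal Python sketch, with the hypothetical helper name `jump_target`, shows the bit manipulation: the upper 4 bits of the PC are kept and the 26-bit field of the instruction is shifted left by 2.

```python
def jump_target(pc: int, ir: int) -> int:
    """PC[31-28] || (IR[25-0] << 2): jump address for the MIPS j instruction."""
    upper = pc & 0xF0000000            # keep PC bits 31..28
    field = ir & 0x03FFFFFF            # extract IR bits 25..0
    return upper | (field << 2)        # 26-bit field becomes a 28-bit byte address


if __name__ == "__main__":
    # opcode 2 (j) in bits 31..26, target field = 3
    print(hex(jump_target(0x40000004, 0x08000003)))
```

Because only 28 bits come from the instruction, a jump can reach any word inside the 256 MB region selected by the upper 4 bits of the PC.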

104 Step 4: Memory access or R-type Instruction Completion (MEM)
A load or store instruction accesses memory, and an arithmetic-logical instruction writes its result.
Load: the value is retrieved from memory and stored into the memory data register (MDR)
Store: the data is written to memory
R-type: the result held in the temporary register ALUOut is written to register rd

106 Step 5: Memory Read Completion (WB)
Loads complete by writing back the value read from memory: the load data, which was stored into the MDR, is written into register rt.
Reg[IR[20-16]] = MDR;

107 Summary of Instruction Execution
Step 1: IF, Step 2: ID, Step 3: EX, Step 4: MEM, Step 5: WB

108 The schematic view
IF: uses the memory
ID: uses the register file
EX: uses the ALU
MEM: uses the memory
WB: uses the register file
Very important to remember the content of this slide.

109 Multicycle Execution Step (1): Instruction Fetch
IR = Memory[PC]; PC = PC + 4;
[Figure labels: 4, PC + 4]

110 Multicycle Execution Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure labels: PC + 4, Branch Target Address, Reg[rs], Reg[rt]]

111 Multicycle Execution Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure labels: Reg[rs], Reg[rt], PC + 4, Mem. Address]

112 Multicycle Execution Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure labels: Reg[rs], Reg[rt], PC + 4, R-Type Result]

113 Multicycle Execution Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure labels: Reg[rs], Reg[rt], Branch Target Address]

114 Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure labels: Reg[rs], Reg[rt], Branch Target Address, Jump Address]

115 Multicycle Execution Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure labels: PC + 4, Reg[rs], Reg[rt], Mem. Address, Mem. Data]

116 Multicycle Execution Step (4): Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure labels: PC + 4, Reg[rs], Reg[rt]]

117 Multicycle Execution Step (4): ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut;
[Figure labels: R-Type Result, Reg[rs], Reg[rt], PC + 4]

118 Multicycle Execution Step (5): Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure labels: PC + 4, Reg[rs], Reg[rt], Mem. Data, Mem. Address]

119 Multicycle Datapath with Control I
… with control lines and the ALU control block added – not all control lines are shown CS252 S05

120 Multicycle Datapath with Control II
New gates For the jump address New multiplexor Complete multicycle MIPS datapath (with branch and jump capability) and showing the main control block and all control lines CS252 S05

121 Action of the Control Signals
Action of the 1-bit control signals: RegDst, RegWrite, ALUSrcA, MemRead, MemWrite, MemtoReg, IorD, IRWrite, PCWrite, PCWriteCond
Action of the 2-bit control signals: ALUOp, ALUSrcB, PCSource

122 Multicycle Control Step (1): Fetch
IR = Memory[PC]; PC = PC + 4;
[Figure: control line values as extracted: 1 1 X 010 X 1 1]

123 Multicycle Control Step (2): Instruction Decode & Register Fetch
A = Reg[IR[25-21]]; (A = Reg[rs])
B = Reg[IR[20-16]]; (B = Reg[rt])
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
[Figure: control line values as extracted: X X 010 X X 3]

124 Multicycle Control Step (3): Memory Reference Instructions
ALUOut = A + sign-extend(IR[15-0]);
[Figure: control line values as extracted: X 1 X 010 X X 2]

125 Multicycle Control Step (3): ALU Instruction (R-Type)
ALUOut = A op B;
[Figure: control line values as extracted: X 1 X ??? X X, where ??? depends on the funct field]

126 Multicycle Control Step (3): Branch Instructions
if (A == B) PC = ALUOut;
[Figure: control line values as extracted: 1 if Zero=1, X 1 X 011 1 X]

127 Multicycle Execution Step (3): Jump Instruction
PC = PC[31-28] concat (IR[25-0] << 2);
[Figure: control line values as extracted: 1 X X X XXX 2 X X]

128 Multicycle Control Step (4): Memory Access - Read (lw)
MDR = Memory[ALUOut];
[Figure: control line values as extracted: 1 X X XXX X X 1 X]

129 Multicycle Execution Steps (4) Memory Access - Write (sw)
Memory[ALUOut] = B;
[Figure: control line values as extracted: 1 X 1 X XXX X X X]

130 Multicycle Control Step (4): ALU Instruction (R-Type)
Reg[IR[15-11]] = ALUOut; (Reg[rd] = ALUOut)
[Figure: complete multicycle datapath annotated with the control line values for this step]

131 Multicycle Execution Steps (5) Memory Read Completion (lw)
Reg[IR[20-16]] = MDR;
[Figure: complete multicycle datapath annotated with the control line values for this step]

132 CPI in a Multicycle CPU
What is the CPI, assuming each step requires 1 clock cycle, for an instruction mix of 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU instructions?
Solution: the number of clock cycles for each instruction class is: loads 5, stores 4, ALU 4, branches 3, jumps 3.
CPI = CPU clock cycles / instruction count
= sum over classes i of (instruction count_i x CPI_i) / instruction count
= sum over classes i of (instruction count_i / instruction count) x CPI_i
= 0.25 x 5 + 0.10 x 4 + 0.52 x 4 + 0.11 x 3 + 0.02 x 3 = 4.12
This is better than the worst-case CPI of 5.0.
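The same weighted sum can be written as a short Python check. The dictionary names are arbitrary; the mix and cycle counts are those given in the example.

```python
# Fraction of executed instructions per class, and cycles each class needs.
MIX = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# CPI is the mix-weighted average of the per-class cycle counts.
cpi = sum(MIX[c] * CYCLES[c] for c in MIX)

if __name__ == "__main__":
    print(round(cpi, 2))
```

Changing the mix, for example toward more loads, moves the CPI toward the worst case of 5.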

133 Conclusion If instructions take different amounts of time, multi-cycle is better We haven’t dived into the gory details of implementing a multi-cycle processor What we’ve talked about covers Sections 5.1, 5.2, 5.3, 5.4, and a small subset of Section 5.5 This is all you need to read in the book Don’t worry about most of the stuff in Section 5.5 We are now ready to talk about our “big” topic: Pipelining

134 Q & A

135 Chapter 5: Datapath and Control
Single-Cycle Implementation vs. Multi-Cycle Implementation
MIPS instruction types and formats
What is a datapath? What are the datapath elements of MIPS? What are the five steps of the MIPS datapath?
Control unit design: What are the two kinds of control unit design? Describe their implementations and compare them.
Exception and Interrupt: Definitions, Operations

136 Example A[2] = | A[0] + A[1] |.
Assume the base address of word array A is stored in the register $s0. The following code is used for the calculation: A[2] = | A[0] + A[1] |. Highlight the running path of the following instructions in blue in the simple datapath and mark the control signal. Assume the first instruction is stored in the address of hex .
        lw   $t0, 0($s0)
        lw   $t1, 4($s0)
        add  $t1, $t1, $t0
        slt  $t0, $t1, $zero
        beq  $t0, $zero, Label
        sub  $t1, $zero, $t1
        sw   $t1, 8($s0)
        j    Exit
Label:  sw   $t1, 8($s0)
Exit:
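To see what the instruction sequence computes, it can be traced line by line in Python. This is a behavioral sketch only, not a datapath model: the helper `to_signed32` and the function `run_example` are names invented for the illustration, and the array is passed in as a three-element list.

```python
def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def run_example(a: list) -> list:
    """Trace the slide's code on a word array A = [A[0], A[1], A[2]]."""
    t0 = a[0]                        # lw  $t0, 0($s0)
    t1 = a[1]                        # lw  $t1, 4($s0)
    t1 = to_signed32(t1 + t0)        # add $t1, $t1, $t0
    t0 = 1 if t1 < 0 else 0          # slt $t0, $t1, $zero
    if t0 == 0:                      # beq $t0, $zero, Label (taken when sum >= 0)
        a[2] = t1                    # Label: sw $t1, 8($s0)
    else:
        t1 = to_signed32(0 - t1)     # sub $t1, $zero, $t1
        a[2] = t1                    # sw  $t1, 8($s0)
                                     # j   Exit
    return a
```

For instance, `run_example([3, -10, 0])` takes the fall-through path and stores 7 in A[2], while `run_example([3, 4, 0])` takes the branch and also stores 7.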

137 The Simple Datapath with Controls
[Figure: single-cycle datapath with control unit: PC, instruction memory, register file, sign-extend, ALU and ALU control, data memory, shift left 2, and multiplexers; control signals RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch, ALUOp]

138 lw $t0, 0($s0) / lw $t1, 4($s0)
[Figure: the simple datapath with controls, showing the path used by lw]

139 The Setting of Control Lines
Instructions  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  Branch  ALUOp1  ALUOp0
lw  1

140 add $t1, $t1, $t0 / slt $t0, $t1, $zero / sub $t1, $zero, $t1
[Figure: the simple datapath with controls, showing the path used by R-type instructions (add, slt, sub)]

141 The Setting of Control Lines
Instructions  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  Branch  ALUOp1  ALUOp0
lw  1
R-type

142 beq $t0, $zero, Label (the case $t0 = zero)
[Figure: the simple datapath with controls, showing the path used by beq when $t0 equals $zero]

143 beq $t0, $zero, Label (the case $t0 != zero)
[Figure: the simple datapath with controls, showing the path used by beq when $t0 differs from $zero]
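The two beq cases differ only in which input the PC multiplexer selects. A hedged Python sketch of that selection, assuming the usual single-cycle arrangement in which the branch target is PC + 4 plus the sign-extended offset shifted left by 2 and the multiplexer is driven by Branch AND Zero; the function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to 32 bits."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value

def next_pc(pc: int, imm16: int, branch: int, zero: int) -> int:
    """Select between PC + 4 and the branch target."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    # The multiplexer takes the target only when Branch and Zero are both 1.
    return target if (branch and zero) else pc_plus_4
```

With `pc = 0x00400010` and an offset field of 3, the result is 0x00400020 when both Branch and Zero are 1 and 0x00400014 otherwise.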

144 The Setting of Control Lines
Instructions  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  Branch  ALUOp1  ALUOp0
lw  1
R-type
beq  x

145 sw $t1, 8($s0)
[Figure: the simple datapath with controls, showing the path used by sw]

146 The Setting of Control Lines
Instructions  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  Branch  ALUOp1  ALUOp0
lw  1
R-type
beq  x
sw
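Most cell values of this table did not survive in the transcript. The Python sketch below therefore does not reproduce the slide; it encodes the settings commonly given for this single-cycle design in the standard textbook treatment, which is an assumption on our part. `X` marks a don't-care, MemWrite is included even though the slide header omits it, and the two ALUOp bits are stored as one 2-bit value.

```python
X = None  # don't-care value

# Assumed textbook settings; not recovered from the slide itself.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, Mem2Reg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":     dict(RegDst=0, ALUSrc=1, Mem2Reg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":     dict(RegDst=X, ALUSrc=1, Mem2Reg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":    dict(RegDst=X, ALUSrc=0, Mem2Reg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def writes_register(instr: str) -> bool:
    """True when the instruction class asserts RegWrite."""
    return CONTROL[instr]["RegWrite"] == 1
```

Only lw and R-type instructions write the register file, which is why RegDst and Mem2Reg can be don't-cares for sw and beq.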

