5. The Processor: Datapath and Control

1 5. The Processor: Datapath and Control
Soonchunhyang University, Division of Information Technology Engineering, Lee Sang-Jeong

2 The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS, simplified to contain only:
- memory-reference instructions: lw, sw
- arithmetic-logical instructions: add, sub, and, or, slt
- control flow instructions: beq, j
Generic implementation:
- use the program counter (PC) to supply the instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers. Why? Consider memory-reference, arithmetic, and control flow instructions in turn.

3 More Implementation Details
Abstract / simplified view. Two types of functional units:
- elements that operate on data values (combinational)
- elements that contain state (sequential)

4 State Elements
Unclocked vs. clocked
Clocks are used in synchronous logic: when should an element that contains state be updated?
(Figure: clock waveform with the cycle time marked.)

5 An unclocked state element
The set-reset latch: output depends on present inputs and also on past inputs

6 Latches and Flip-flops
- Output is equal to the stored value inside the element (no need to ask for permission to look at the value)
- Change of state (value) is based on the clock
- Latches: state changes whenever the inputs change and the clock is asserted
- Flip-flops: state changes only on a clock edge (edge-triggered methodology)
Note: "asserted" means logically true, which could mean electrically low.
A clocking methodology defines when signals can be read and written: we wouldn't want to read a signal at the same time it was being written.

7 D-latch
Two inputs:
- the data value to be stored (D)
- the clock signal (C) indicating when to read and store D
Two outputs:
- the value of the internal state (Q) and its complement

8 D flip-flop
Output changes only on the clock edge
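
The difference between the level-sensitive latch and the edge-triggered flip-flop can be seen in a small behavioral model. This is only a sketch: the class names and the sample waveform are made up for illustration, and the flip-flop is assumed to trigger on the rising edge (the slides only say "the clock edge").

```python
class DLatch:
    """Level-sensitive: Q follows D for as long as the clock C is asserted."""

    def __init__(self):
        self.q = 0

    def update(self, c, d):
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q takes D only on a 0 -> 1 transition of C (assumed rising edge)."""

    def __init__(self):
        self.q = 0
        self._prev_c = 0

    def update(self, c, d):
        if c and not self._prev_c:
            self.q = d
        self._prev_c = c
        return self.q


# Hypothetical (C, D) samples; D changes while C is still high in the third sample.
waveform = [(0, 1), (1, 1), (1, 0), (0, 1), (1, 1)]
latch, ff = DLatch(), DFlipFlop()
latch_trace = [latch.update(c, d) for c, d in waveform]
ff_trace = [ff.update(c, d) for c, d in waveform]
```

In the third sample the latch output drops with D because the clock is still asserted, while the flip-flop holds the value it captured on the edge.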

9 Our Implementation
An edge-triggered methodology. Typical execution:
- read contents of some state elements,
- send values through some combinational logic,
- write results to one or more state elements

10 Register File
Built using D flip-flops
Do you understand? What is the “Mux” above?

11 Abstraction
Make sure you understand the abstractions!
Sometimes it is easy to think you do, when you don’t

12 Register File
Note: we still use the real clock to determine when to write
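
A behavioral sketch of the register file may help here: reads are combinational, while a write takes effect only on the clock edge and only when the write enable is asserted. The class and method names are hypothetical, and the treatment of register 0 as always zero is the usual MIPS convention rather than something stated on the slide.

```python
class RegisterFile:
    """32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs, rt):
        # Reads need no clock: the outputs simply reflect the stored values.
        return self.regs[rs], self.regs[rt]

    def clock_edge(self, reg_write, rd, data):
        # State changes only on the clock edge, and only if RegWrite is asserted.
        if reg_write and rd != 0:  # assumption: $0 always reads as zero
            self.regs[rd] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(True, 8, 42)   # RegWrite asserted: $8 is updated
rf.clock_edge(False, 9, 7)   # RegWrite deasserted: $9 keeps its old value
```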

13 Elements for Instruction Fetch
Basic Instruction Fetch Unit

14 Elements for R-format Instructions / Load/Store Instructions
Two elements for R-format: R[rd] <- R[rs] op R[rt]
Additional elements for Load/Store: R[rt] <- Mem[R[rs] + SignExt[imm16]]

15 Datapath for a Branch
If (R[rs] == R[rt]) PC <- PC + 4 + ( SignExt(imm16) x 4 )
Else PC <- PC + 4
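
The branch rule can be sketched as a next-PC function. The function names are hypothetical; the target is computed as PC + 4 plus the sign-extended 16-bit immediate shifted left by two, matching the BEQ register transfers given later in the deck.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Next PC for beq: taken if the two register values are equal."""
    if rs_val == rt_val:
        return (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF
```

For example, a taken branch at 0x1000 with an immediate of 3 lands at 0x1010, and an immediate of 0xFFFF (that is, -1) branches back to the branch instruction itself.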

16 Datapath for Lw/Sw and R-type

17 Putting It All Together

18 Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
Example: add $8, $17, $
Instruction format: op | rs | rt | rd | shamt | funct
The ALU's operation is based on instruction type and function code

19 Three Instruction Classes
- op (6 bits) and funct (6 bits) can be used to generate control signals
- I-type and J-type instructions only have the 6-bit op field
- R-type instructions have both the 6-bit op field and the 6-bit funct field
R: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
I: op(6) | rs(5) | rt(5) | 16-bit address
J: op(6) | 26-bit address
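
The three formats share bit positions, so a single decoder can slice every field and let control pick the ones that apply. A minimal sketch with a hypothetical `decode` helper; the example word encodes lw $1, 100($2), assuming the standard MIPS opcode 35 for lw.

```python
def decode(word):
    """Slice a 32-bit MIPS instruction word into all R/I/J fields."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25..21, R and I
        "rt": (word >> 16) & 0x1F,      # bits 20..16, R and I
        "rd": (word >> 11) & 0x1F,      # bits 15..11, R only
        "shamt": (word >> 6) & 0x1F,    # bits 10..6,  R only
        "funct": word & 0x3F,           # bits 5..0,   R only
        "imm16": word & 0xFFFF,         # bits 15..0,  I only
        "addr26": word & 0x3FFFFFF,     # bits 25..0,  J only
    }


# lw $1, 100($2): op = 35, rs = 2 (base), rt = 1 (destination), offset = 100
lw_word = (35 << 26) | (2 << 21) | (1 << 16) | 100
lw_fields = decode(lw_word)
```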

20 ALU Control
e.g., what should the ALU do with this instruction?
Example: lw $1, 100($2)
Instruction format: op | rs | rt | 16-bit offset
The ALU control input selects one of: AND, OR, add, subtract, set-on-less-than, NOR

21 Control with Local Decoding
A single big control logic block becomes slow, so decoding is split into two small (fast) blocks:
- Control 1 (Main Control): input op (6 bits); outputs ALUop (2 bits), MemRead, RegWrite, MemtoReg, ALUSrc, MemWrite, RegDst, Branch
- Control 2 (ALU Control): inputs func and ALUop; output ALUctr (4 bits)

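22 ALU Control (ALUctr)
Instruction opcode | ALUOp | Instruction operation | Function field | Desired ALU action | ALU control input
LW | 00 | Load word | xxxxxx | Add | 0010
SW | 00 | Store word | xxxxxx | Add | 0010
Branch equal | 01 | Branch equal | xxxxxx | Subtract | 0110
R-type | 10 | Add | 100000 | Add | 0010
R-type | 10 | Subtract | 100010 | Subtract | 0110
R-type | 10 | AND | 100100 | And | 0000
R-type | 10 | OR | 100101 | Or | 0001
R-type | 10 | Set on less than | 101010 | Set on less than | 0111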
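
The second-level decoding can be written as a small function of ALUOp and the function field. This is a sketch with a hypothetical function name; the bit patterns are the ones listed on this slide.

```python
def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct field to the 4-bit ALU control input."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b0110
    # ALUOp = 10: R-type, the function field selects the operation
    r_type = {
        0b100000: 0b0010,       # add
        0b100010: 0b0110,       # subtract
        0b100100: 0b0000,       # AND
        0b100101: 0b0001,       # OR
        0b101010: 0b0111,       # set on less than
    }
    return r_type[funct]
```

For lw, sw, and beq the function field is a don't-care, which is why the first two branches never look at `funct`.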

23 Truth Table for ALU Control
Can turn into gates:

24 Control for R-type Instruction
Data path and active control signals are highlighted

25 Control for “load” instruction

26 Control for “branch equal”

28 Truth Table for Main Control
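Input or output | Signal name | R-format | lw | sw | beq
Inputs | Op5 | 0 | 1 | 1 | 0
Inputs | Op4 | 0 | 0 | 0 | 0
Inputs | Op3 | 0 | 0 | 1 | 0
Inputs | Op2 | 0 | 0 | 0 | 1
Inputs | Op1 | 0 | 1 | 1 | 0
Inputs | Op0 | 0 | 1 | 1 | 0
Outputs | RegDst | 1 | 0 | X | X
Outputs | ALUSrc | 0 | 1 | 1 | 0
Outputs | MemtoReg | 0 | 1 | X | X
Outputs | RegWrite | 1 | 1 | 0 | 0
Outputs | MemRead | 0 | 1 | 0 | 0
Outputs | MemWrite | 0 | 0 | 1 | 0
Outputs | Branch | 0 | 0 | 0 | 1
Outputs | ALUOp1 | 1 | 0 | 0 | 0
Outputs | ALUOp0 | 0 | 0 | 0 | 1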
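
The main control truth table can be modelled as a lookup keyed by opcode. This is a sketch, not the slide's own implementation: the table name is hypothetical, the opcodes are the standard MIPS values (R-format 0, lw 35, sw 43, beq 4), the signal settings follow the usual textbook table, and `None` stands for a don't-care (X).

```python
# Assumed signal settings per opcode; None marks a don't-care output.
MAIN_CONTROL = {
    0b000000: dict(name="R-format", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(name="sw", RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(name="beq", RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    return MAIN_CONTROL[opcode]
```

Because every output depends only on the opcode bits, the same table can be realized directly as combinational logic.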

29 Control
Simple combinational logic (truth tables)

30 Implementing Jump
j Exit
Instruction: op | Exit (26-bit address)
$pc: xxxx | jump addr. | 00
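
The jump target is a concatenation of three pieces. In this sketch the function name is hypothetical, and the upper four bits (the "xxxx" on the slide) are assumed to come from PC + 4, as in the usual MIPS definition.

```python
def jump_target(pc, addr26):
    """Form the 32-bit jump target: upper 4 PC bits | 26-bit address | 00."""
    upper4 = (pc + 4) & 0xF0000000          # assumed source of the xxxx bits
    return upper4 | ((addr26 & 0x3FFFFFF) << 2)
```

Shifting the 26-bit field left by two supplies the two low zero bits, since instructions are word-aligned.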

31 Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - the ALU might not produce the “right answer” right away
  - we use write signals along with the clock to determine when to write
- Cycle time is determined by the length of the longest path
We are ignoring some details like setup and hold times

32 Performance of Single Cycle Datapath
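Calculate cycle time assuming negligible delays except: memory (200ps), ALU and adders (100ps), register file access (50ps)
Instruction class | Functional units used by the instruction class | Required time
R-type | Instruction fetch, Register access, ALU, Register access | 400ps
lw | Instruction fetch, Register access, ALU, Memory access, Register access | 600ps
sw | Instruction fetch, Register access, ALU, Memory access | 550ps
beq | Instruction fetch, Register access, ALU | 350ps
Minimum cycle time: at least 600ps!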
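
The required times follow from summing the unit delays along each instruction class's path, and the single-cycle clock must cover the slowest class. A short sketch; the dictionary names are hypothetical and the per-class unit sequences are the ones consistent with the stated totals.

```python
# Unit delays from the slide, in picoseconds.
DELAY_PS = {"mem": 200, "alu": 100, "reg": 50}

# Functional units used by each class, in order (instruction fetch is a memory access).
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "lw": ["mem", "reg", "alu", "mem", "reg"],
    "sw": ["mem", "reg", "alu", "mem"],
    "beq": ["mem", "reg", "alu"],
}

required_ps = {cls: sum(DELAY_PS[u] for u in units) for cls, units in PATHS.items()}
cycle_time_ps = max(required_ps.values())  # every instruction takes one full cycle
```

Since every instruction pays for the lw path, an R-type instruction wastes 200ps of each cycle and beq wastes 250ps.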

33 Where we are headed
Single cycle problems:
- what if we had a more complicated instruction like floating point?
- wasteful of area
One solution:
- use a “smaller” cycle time
- have different instructions take different numbers of cycles
- a “multicycle” datapath

34 What’s wrong with our CPI=1 processor?
Path through the datapath for each instruction class:
- Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
- Load (critical path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
- Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
- Branch: PC, Inst Memory, Reg File, cmp, mux
Problems:
- Long cycle time
- All instructions take as much time as the slowest
- Real memory is not so nice as our idealized memory: it cannot always get the job done in one (short) cycle

35 Reducing Cycle Time
- Cut the combinational dependency graph and insert a register / latch
- Do the same work in two fast cycles, rather than one slow one
(Figure: storage element -> Combinational Logic -> storage element becomes storage element -> Combinational Logic (A) -> storage element -> Combinational Logic (B) -> storage element.)

36 Partitioning the CPI=1 Datapath
Add registers between smallest steps
(Figure: the datapath divided into Instruction Fetch, Operand Fetch, Exec, Mem Access, and Next PC steps around the PC and Reg. File; control signals MemRd, MemWr, RegDst, RegWr, nPC_sel, ExtOp, ALUSrc, ALUctr.)

37 Example Multicycle Datapath
(Figure: multicycle datapath with registers PC, IR, A, B, S, M separating the steps Instruction Fetch, Operand Fetch, Exec (ALU, Ext), Mem Access, Result Store, and Next PC; control signals MemToReg, RegDst, RegWr, MemRd, MemWr, nPC_sel, ALUSrc, ALUctr, ExtOp.)

38 R-type (add, sub, ...)
Logical register transfer:
ADD: R[rd] <- R[rs] + R[rt]; PC <- PC + 4
Physical register transfers:
IR <- MEM[pc]
A <- R[rs]; B <- R[rt]
S <- A + B
R[rd] <- S; PC <- PC + 4
(Figure: the multicycle datapath with registers PC, IR, A, B, S, M.)

39 Load
Logical register transfer:
LW: R[rt] <- MEM(R[rs] + sx(Im16)); PC <- PC + 4
Physical register transfers:
IR <- MEM[pc]
A <- R[rs]; B <- R[rt]
S <- A + SignEx(Im16)
M <- MEM[S]
R[rt] <- M; PC <- PC + 4
(Figure: the multicycle datapath with registers PC, IR, A, B, S, M.)
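
The physical register transfers for a load can be traced one step at a time. This sketch uses hypothetical names, models both memories as dictionaries keyed by byte address, and assumes the standard lw opcode 35 when building the example word.

```python
def sx16(imm16):
    """Sign-extend a 16-bit immediate."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lw_multicycle(pc, regs, imem, dmem):
    """Run one lw through its physical register transfers; returns the new PC."""
    ir = imem[pc]                      # IR <- MEM[pc]
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]          # A <- R[rs]; B <- R[rt] (B is unused by lw)
    s = a + sx16(ir & 0xFFFF)          # S <- A + SignEx(Im16)
    m = dmem[s]                        # M <- MEM[S]
    regs[rt] = m                       # R[rt] <- M
    return pc + 4                      # PC <- PC + 4


# lw $1, 100($2) with $2 = 0x100 and the word 99 stored at 0x100 + 100
regs = [0] * 32
regs[2] = 0x100
imem = {0: (35 << 26) | (2 << 21) | (1 << 16) | 100}
dmem = {0x100 + 100: 99}
new_pc = lw_multicycle(0, regs, imem, dmem)
```

Each commented line corresponds to one clock in the multicycle design, with IR, A, B, S, and M holding the intermediate values between clocks.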

40 Store
Logical register transfer:
SW: MEM(R[rs] + sx(Im16)) <- R[rt]; PC <- PC + 4
Physical register transfers:
IR <- MEM[pc]
A <- R[rs]; B <- R[rt]
S <- A + SignEx(Im16)
MEM[S] <- B; PC <- PC + 4
(Figure: the multicycle datapath with registers PC, IR, A, B, S, M.)

41 Branch
Logical register transfer:
BEQ: if R[rs] == R[rt] then PC <- PC + 4 + sx(Im16) || 00 else PC <- PC + 4
Physical register transfers:
IR <- MEM[pc]
A <- R[rs]; B <- R[rt]
S <- A - B
Eq: PC <- PC + 4 + sx(Im16) || 00; ~Eq: PC <- PC + 4
(Figure: the multicycle datapath with registers PC, IR, A, B, S, M and the Equal signal.)

42 Summary:

43 Multiple Cycle Datapath
Minimizes hardware: 1 memory, 1 adder
(Figure: datapath built around one ideal memory (RAdr, WrAdr, Din, Dout), the instruction register, the register file (Ra, Rb, Rw, busA, busB, busW), one ALU with a Zero output, the Extend and << 2 units, and the registers PC, A, B, S, M; control signals PCWr, PCWrCond, PCSrc, IorD, MemWr, IRWr, RegDst, RegWr, MemtoReg, ExtOp, ALUSelA, ALUSelB, ALUOp.)

44 Multi-Cycle Datapath & Control Signals
Figure 5.28 on page 323

45 High-Level Control Flow
- Common 2-clock sequence to fetch/decode any instruction
- Separate sequences of 1 to 3 clocks to execute specific types of instruction
- Two techniques for control: finite state machine, microprogramming
(Figure: start -> instruction fetch / decode and register fetch -> one of: memory access instructions, R-type instructions, branch instructions, jump instruction.)

46 Review of Finite State Machine (FSM)
A finite state machine consists of:
- a set of states
- a next-state function (determined by current state and input)
- an output function (determined by current state and input)
We will use a Moore machine: output based only on current state.
cf. Mealy machine: output based on both current state and input.
(Figure: inputs and the current state feed the next-state function and the output function; the current state is updated on the clock.)
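
A Moore-style control FSM can be sketched with one table for the outputs (indexed by state only) and one function for the next state (state plus input). The state names, the subset of instruction classes, and the choice of asserted signals are all illustrative assumptions, not the deck's actual state diagram.

```python
# Moore machine: outputs depend on the current state only.
OUTPUT = {
    "FETCH": {"MemRd", "IRWr", "PCWr"},
    "DECODE": set(),
    "EXEC_R": set(),
    "WB_R": {"RegWr"},
    "BRANCH": {"PCWrCond"},
}


def next_state(state, instr_class):
    """Next-state function: depends on the current state and the input."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {"R": "EXEC_R", "beq": "BRANCH"}[instr_class]
    if state == "EXEC_R":
        return "WB_R"
    return "FETCH"  # WB_R and BRANCH both return to fetch


def run(instr_class, cycles):
    """Trace (state, asserted outputs) for a number of clock cycles."""
    state, trace = "FETCH", []
    for _ in range(cycles):
        trace.append((state, sorted(OUTPUT[state])))
        state = next_state(state, instr_class)
    return trace
```

A Mealy version would instead index `OUTPUT` by the pair (state, input), which is the distinction the slide draws.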

47 Control Finite State Machine (FSM) Diagram

48 Control Finite State Machine Structure
- Outputs to datapath control are determined by the current state
- Next state is determined by the current state and input from the instruction register
(Figure: combinational control logic (random logic or PLA) plus a state register; its inputs are the current state and the opcode field from the instruction register; its outputs are the datapath control outputs and the next state.)

49 PLA Implementation
(Figure: PLA with an AND plane driven by the op code and the current state, followed by an OR plane.)

50 Microprogramming
Microprogramming is an alternate structure for implementing the control finite state machine:
- represent outputs and next state selection as microinstructions in a memory (the control store) addressed by the current state (often called the microprogram counter)
- the control store can be implemented in ROM or RAM (writable control store)
- provides a level of abstraction between software and datapath hardware: firmware

51 “Macroprogram” Interpretation
(Figure: main memory holds the user program plus data (e.g., ADD, SUB, AND, DATA); the CPU contains the execution unit and the control memory. Each instruction in main memory, such as AND, is mapped into a microsequence in control memory, e.g., Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s).)

52 Exceptions
- Normal control flow: sequential, jumps, branches, calls, returns
- Exception = unprogrammed control transfer
  - system takes action to handle the exception
    - must record the address of the offending instruction
  - returns control to user
    - must save and restore user state
(Figure: user program -> exception -> System Exception Handler -> return from exception.)

53 Exceptions vs. Interrupts
MIPS convention:
- exception means any unexpected change in control flow, without distinguishing internal or external
- use the term interrupt only when the event is externally caused
Type of event | From where? | MIPS terminology
I/O device request | External | Interrupt
Invoke OS from user program | Internal | Exception
Arithmetic overflow | Internal | Exception
Using an undefined instruction | Internal | Exception
Hardware malfunctions | Either | Exception or Interrupt

54 Exception Handling in MIPS
- Save PC in the dedicated register EPC
  - like $ra, but can't assume that $ra is unused
- Record the reason for the exception in the Cause register
  - used by the operating system to figure out what went wrong
- Load a special value into PC, the exception handler's address
  - transfers execution to the exception handler
- The instruction RFE restores the PC from EPC
  - like jr $ra for subroutines, but with a special register just for exceptions
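
The sequence above amounts to three register updates on entry and one on return. A minimal sketch following the slide's description; the handler address and the cause codes are illustrative assumptions, as are the function names and the dictionary used to hold processor state.

```python
HANDLER_ADDR = 0x80000180          # hypothetical exception handler address
CAUSE_UNDEFINED_INSTRUCTION = 0    # hypothetical cause codes
CAUSE_OVERFLOW = 1


def take_exception(cpu, cause):
    """Enter the handler: save PC, record the reason, redirect PC."""
    cpu["EPC"] = cpu["PC"]         # address of the offending instruction
    cpu["Cause"] = cause           # lets the OS figure out what went wrong
    cpu["PC"] = HANDLER_ADDR       # transfer execution to the handler


def rfe(cpu):
    """Return from exception as described on the slide: restore PC from EPC."""
    cpu["PC"] = cpu["EPC"]


cpu = {"PC": 0x00400020, "EPC": 0, "Cause": 0}
take_exception(cpu, CAUSE_OVERFLOW)
pc_in_handler = cpu["PC"]
rfe(cpu)
```

Using a dedicated EPC rather than $ra means the handler never clobbers a register the interrupted program may be relying on.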

55 Updated Datapath to Implement Exceptions

56 Updated Finite State Machine

57 Pentium 4
- Pipelining is important (last IA-32 without it was in 1985)
- Pipelining is used for the simple instructions favored by compilers
“Simply put, a high performance implementation needs to ensure that the simple instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions”
(Figure labels: Chapter 6, Chapter 7.)

58 Pentium 4
- Somewhere in all that “control” we must handle complex instructions
- Processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for integer datapath (400 for floating point)
- If an instruction requires more than 4 microinstructions to implement, control comes from microcode ROM (8000 microinstructions)
- It's complicated!

