1 The single cycle CPU. 2 Performance of Single-Cycle Machines Memory Unit 2 ns ALU and Adders 2 ns Register file (Read or Write) 1 ns Class Fetch Decode.

Slides:

Advertisements

Similar presentations

Adding the Jump Instruction

Advertisements

1  1998 Morgan Kaufmann Publishers Summary:. 2  1998 Morgan Kaufmann Publishers How many cycles will it take to execute this code? lw $t2, 0($t3) lw.

1 Chapter Five The Processor: Datapath and Control.

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath

1 Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved. The single cycle CPU.

VHDL Development for ELEC7770 VLSI Project Chris Erickson Graduate Student Department of Electrical and Computer Engineering Auburn University, Auburn,

Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.

1 5.5 A Multicycle Implementation A single memory unit is used for both instructions and data. There is a single ALU, rather than an ALU and two adders.

CS 161Computer Architecture Chapter 5 Lecture 12

CSCE 212 Chapter 5 The Processor: Datapath and Control Instructor: Jason D. Bakos.

Fall 2007 MIPS Datapath (Single Cycle and Multi-Cycle)

1 Chapter Five. 2 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical.

1 The single cycle CPU. 2 Performance of Single-Cycle Machines Memory Unit 2 ns ALU and Adders 2 ns Register file (Read or Write) 1 ns Class Fetch Decode.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Advanced Computer Architecture 5MD00 / 5Z033 MIPS Design data path and control Henk Corporaal TUEindhoven 2007.

Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved. Digital Architectures1 Machine instructions execution steps (1) FETCH = Read the instruction.

331 W10.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 10 Building a Multi-Cycle Datapath [Adapted from Dave Patterson’s.

1  1998 Morgan Kaufmann Publishers We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw,

1 The Processor: Datapath and Control We will design a microprocessor that includes a subset of the MIPS instruction set: –Memory access: load/store word.

Class 9.1 Computer Architecture - HUJI Computer Architecture Class 9 Microprogramming.

1 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical instructions:

S. Barua – CPSC 440 CHAPTER 5 THE PROCESSOR: DATAPATH AND CONTROL Goals – Understand how the various.

1 ׃1998 Morgan Kaufmann Publishers פקודת ה- jump 4 bits 26 bits 2 bits 00 : כתובת קפיצה במילים : כתובת קפיצה בבתים … …

Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

Computing Systems The Processor: Datapath and Control.

1 For some program running on machine X, Performance X = 1 / Execution time X "X is n times faster than Y" Performance X / Performance Y = n Problem:

Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

EECS 322: Computer Architecture

1 Computer Organization & Design Microcode for Control Sec. 5.7 (CDROM) Appendix C (CDROM) / / pdf / lec_3a_notes.pdf.

CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides

Exam 2 Review Two’s Complement Arithmetic Ripple carry ALU logic and performance Look-ahead techniques Basic multiplication and division ( non- restoring)

Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]

Gary MarsdenSlide 1University of Cape Town Stages.

Overview of Control Hardware Development

Multicycle Implementation

ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

CDA 3101 Spring 2016 Introduction to Computer Organization Microprogramming and Exceptions 08 March 2016.

Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

1. 2 MIPS Hardware Implementation Full die photograph of the MIPS R2000 RISC Microprocessor. The 1986 MIPS R2000 with five pipeline stages and 450,000.

Design a MIPS Processor (II)

CS161 – Design and Architecture of Computer Systems

CS161 – Design and Architecture of Computer Systems

IT 251 Computer Organization and Architecture

Chapter 5 The Processor: Datapath and Control

Systems Architecture I

Computer Organization & Design Microcode for Control Sec. 5

CS/COE0447 Computer Organization & Assembly Language

Multiple Cycle Implementation of MIPS-Lite CPU

CSCE 212 Chapter 5 The Processor: Datapath and Control

Processor: Finite State Machine & Microprogramming

Computer Architecture

Chapter Five The Processor: Datapath and Control

The Multicycle Implementation

Processor: Datapath and Control (part 2)

Chapter Five The Processor: Datapath and Control

The Multicycle Implementation

Systems Architecture I

Vishwani D. Agrawal James J. Danaher Professor

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Processor (II).

Processor: Multi-Cycle Datapath & Control

Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Review Fig 4.15 page 320 / Fig page 322

Chapter Four The Processor: Datapath and Control

5.5 A Multicycle Implementation

Systems Architecture I

CS161 – Design and Architecture of Computer Systems

Presentation transcript:

1 The single cycle CPU

2 Performance of Single-Cycle Machines Memory Unit 2 ns ALU and Adders 2 ns Register file (Read or Write) 1 ns Class Fetch Decode ALU Memory Write Back Total R-format LW SW ns Branch ns Jump 2 2ns

3 מה היה קורה עם cycle של השעון היה באורך משתנה נשווה לגבי תוכנית עם התערובת הבאה של פקודות: Rtype: 44%, LW: 24%, SW: 12% BRANCH: 18%, JUMP: 2% I - מספר פקודות בתוכנית T - אורך מחזור שעון CPI - מספר מחזורים לפקודה = 1 Execution=I*T*CPI= 8*24%+7*12%+6*44%+5*18%+2*2%=6.3 ns

4 התוצאה EXE Single cycle T single clock * I T single clock 8 EXE Variable T variable clock * I T variable clock 6.3 יחס של היחס יהיה יותר גרוע כאשר נממש פקודות מסובכות כמו פעולות עם floating point הפתרון: אינו שעון בגודל משתנה - מסובך מבחינת הבניה. הפתרון: פקודה לוקחת מספר משתנה של cycles.

5 Multicycle Approach הרעיון מאחורי שיטת ה- Multicycle: חיסכון בזמן: כל פקודה תקח את מספר היחידות השעון הנחוצות לה. חיסכון ברכיבים: שימוש באותו רכיב בשלבים שונים של הפקודה.

6 שיטת הבניה של ארכיטקטורת ה- Multicycle חלק את הפקודה לשלבים. כל שלב cycle: - אזן את כמות העבודה הנדרשת בכל שלב. - הקטן את כמות העבודה הנדרשת בכל שלב - כל שלב יבצע רק פעולה אחת פונקצינאלית. בסיום כל מחזור שעון: - שמור את הערכים עבור השלבים הבאים. - הוסף לביצוע משימה זו רגיסטרים פנימיים נוספים.

7 PC D. Mem data D.Mem adrs 0x Rs, RtALU inputs ALU output (address) Memory output fetch Write backdecode execute Mem data memory I.Mem data PC IR A,B ALUout Mem data MDR fetch Write back decode execute memory Timing of a lw instruction in a single cycle CPU Timing of a lw instruction in a multi-cycle CPU 2ns We want to replace a long single CK cycle with 5 short ones: 1ns2ns 1ns 0x Instruction in IR ALU calculates something 01345=(0)2

8 Therefore we should add registers to the single cycle CPU shown below: 5 [25:21]=Rs 5 [20:16]=Rt Reg File Instruction Memory PCALU Adde r 4 ck 16 [15:0] 5 Sext 16->32 Data Memory Rd Address D.In D. Out

9 Adding registers to “split” the instruction to 5 stages: 5 [25:21]=Rs 5 [20:16]=Rt Reg File Instruction Memory PCALU Adde r 4 ck 16 [15:0] 5 Sext 16->32 Data Memory Rd Address D.In D. Out IR ck A B ALUoutMDR PCWrite

10 Here is the book’s version of the multi-cycle CPU: Only PC and IR have write enable signals All other registers hold data for a single cycle

11 Here is our version of A mult--cycle CPU capable of R-type & lw/sw & branch instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck IR ck ALUout ck A B << 2

12 Let us explain the multi-cycle CPU First we’ll look at a CPU capable of performing only R-type instructions Then, we’ll add the lw instruction And the sw instruction Then, the beq instruction And finally, the j instruction

13 Let us remind ourselves how works a single cycle CPU capable of performing R-type instructions. Here you see the data-path and the timing of an R-type instruction. 5 [25:21]=Rs 5 [20:16]=Rt 5 [15:11]=Rd Reg File Instruction Memory PCALU Adde r 4 ck 6 [31:26] 6 [5:0]= funct

14 A single cycle CPU demo: R-type instruction 5 [25:21]=Rs 5 [20:16]=Rt 5 [15:11]=Rd Reg File Instruction Memory PC ALU ck 4

15 A multi cycle CPU capable of performing R-type instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B

16 A multi cycle CPU capable of R-type & instructions fetch 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B 0 1

17 A multi cycle CPU capable of R-type & instructions decode 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B 1 2

18 A multi cycle CPU capable of R-type & instructions execute 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B ALU 2 3

19 A multi cycle CPU capable of R-type & instructions write back 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B Rd ck 3 4

20 PC GPR input 0x Rs, RtALU inputs ALU output (Data = result of cala.) Memory output = the instruction fetch decode executeWrite Back Inst. Mem data Mem data IR A,B ALUout fetch Write back decode execute Timing of an R-type instruction in a single cycle CPU Timing of an R-type instruction in a multi-cycle CPU 34 (=0)012 PC Previous inst.Current instruction

21 Mem data IR A,B ALUout fetch Write back decode execute GPR outputs ALU output IR=M ( PC ) A= Rs, B= Rt ALUuot= A op B IRWrite At the rising edge of CK: Rd=ALUout R-Type instruction takes 4 CKs PC Previous inst. Current instruction next inst. IR=M(PC) A= Rs, B= Rt ALUout = A op B Rd=ALUout The state diagram:

22 A multi-cycle CPU capable of R-type instructions (PC calc. ) 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B

23 Mem data IR A,B ALUout fetch Write back decode execute GPR outputs ALU output ALUuot = A op B At the rising edge of CK: Rd=ALUout PC = PC+4 PC next PC = current PC+4current PC next inst.Previous inst. current instruction PCWrite

24 A multi cycle CPU capable of R-type & instructions fetch 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU ck 5 5 IR[25:21]=Rs Rd IR ck ALUout ck A B ALU 4

25 Fetch WBR ALU Decode R-type The state diagram of a CPU capable of R-type instructions only IR=M(PC) PC = PC+4 ALUout=A op B A=Rs B=Rt Rd = ALUout

26 Fetch WBR Load ALU AdrCmp Decode WB lw R-type lw The state diagram of a CPU capable of R-type and lw instructions ALUout= A+sext(imm) MDR = M(ALUout) Rt = MDR

27 We added registers to “split” the instruction to 5 stages. Let’s discuss the lw instruction 5 [25:21]=Rs 5 [20:16]=Rt Reg File Instruction Memory PCALU Adde r 4 ck 16 [15:0] 5 Sext 16->32 Data Memory Rd Address D.In D. Out IR ck A B ALUoutMDR PCWrite

28 First we draw a multi-cycle CPU capable of R-type & lw instructions: 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B ALU We just moved the data memoryAll parts related to lw only are blue Data Memory

29 A multi-cycle CPU capable of R-type & lw instructions fetch 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B ALU Data Memory

30 A multi-cycle CPU capable of R-type & lw instructions decode 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDE ck ALUout ck A B << 2 Data Memory

31 A multi-cycle CPU capable of R-type & lw instructions AdrCmp 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B ALU Data Memory

32 A multi-cycle CPU capable of R-type & lw instructions memory 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd Branch Address IR ck MDR ck ALUout ck A B << 2 Data Memory

33 A multi-cycle CPU capable of R-type & lw instructions WB 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B Data Memory ck Rt

34 Can we unite the Instruction & Data memories? (They are not used simultaneously as in the single cycle CPU) 5 IR[20:16]=Rt Reg File Instruction Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B Data Memory ck

35 So here is a multi-cycle CPU capable of R-type & lw instructions using a single memory for instructions & data 5 IR[20:16]=Rt Reg File PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck MDR ck ALUout ck A B Instruction & data Memory

36 PC D. Mem data D.Mem adrs 0x Rs, RtALU inputs ALU output (address) Memory output fetch Write backdecode execute Mem data memory I.Mem data PC IR A,B ALUout Mem data MDR fetch Write back decode execute memory Timing of a lw instruction in a single cycle CPU Timing of a lw instruction in a multi-cycle CPU PC+4 Previous inst. current instruction Data address Data to Rt

37 Mem data IR A,B ALUout Mem data MDR fetch Write back decode execute memory GPR outputs ALU output IR=M ( PC ) PC= PC+4 A= Rs, B= Rt ALUuot= A+sext(imm) MDR=M(ALUout) At the rising edge of CK: Rt=MDR PC Previous inst. current instruction Data address Data to Rt PCWrite, IRWrite

38 Fetch WBR Load ALU AdrCmp Decode WB lw R-type The state diagram of a CPU capable of R-type and lw instructions ALUout= A+sext(imm) MDR = M(ALUout) Rt = MDR IR=M(PC) PC = PC+4 ALUout=A op B A=Rs B=Rt Rd = ALUout

39 A multi-cycle CPU capable of R-type & lw & sw instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd Branch Address IR ck MDR ck ALUout ck A B << 2 lw sw

40 Fetch WBR Load ALU AdrCmp Store Decode WB lw+sw R-type swlw The state diagram of a CPU capable of R-type and lw and sw instructions M(ALUout)=B IR=M(PC) PC = PC+4 ALUout=A op B A=Rs B=Rt Rd = ALUout ALUout= A+sext(imm) MDR = M(ALUout) Rt = MDR

41 A multi-cycle CPU capable of R-type & lw/sw & branch instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd IR ck IR ck ALUout ck A B << 2

42 Calc PC=PC+sext(imm)<<2 Adding the instruction beq to the state diagram: Calc Rs -Rt (just to produce the zero signal) Fetch WBR Load Branch ALU AdrCmp Store Decode WB lw+sw R-type beq zero swlw not zero

43 Adding the instruction beq to the state diagram, a more efficient way: Let’s use the decode state in which the ALU is doing nothing to compute the branch address. We’ll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.) Fetch WBR Load Branch ALU AdrCmp Store Decode WB lw+sw R-type beq swlw Calc ALUout=PC+sext(imm)<<2 Calc Rs - Rt. If zero, load the PC with ALUout data, else do not load the PC

44 A multi-cycle CPU capable of R-type & lw/sw & branch instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd Branch Address IR ck IR ck ALUout ck A B <<2 PC+4

45 Fetch Jump WBR Load Branch ALU AdrCmp Store Decode WB lw+sw R-type beq j swlw Adding the instruction j to the state diagram: PC = PC[31:28] || IR[25:0]<<2

46 A multi-cycle CPU capable of R-type & lw/sw & branch & jump instructions 5 IR[20:16]=Rt Reg File Instruction & data Memory PC ALU 4 ck 16 IR[15:0] 5 Sext 16->32 5 IR[25:21]=Rs Rd Branch Address IR ck IR ck ALUout ck A B <<2 PC+4= next address Jump address IR[25:0] <<2 + PC[31:28]

47 סיכום שלבי הפקודות השונות

48 MultiCycle implementation with Control

Final State Machine

50 Fetch Jump WBR Load Branch ALU AdrCmp Store Decode WB lw+sw R-type beq j swlw The final state diagram:

51

52 MultiCycle implementation with Control

53 Implementation: Finite State Machine for Control (The book’s version)

54 Opcode= IR[31:26] zero, neg, etc. next state current state control signalsnext state calculation Outputs decoder State reg ck The Control Finite State Machine: For 10 states coded 0-9, we need 4 bits, i.e., [S3,S2,S1,S0]

55 The control signals decoder We just implement the table of slide 54: Let’s look at ALUSrcA: it is “0” in states 0 and 1 and it is “1” in states 2, 6 and 8. In all other states we don’t care. let’s look at PCWrite: it is “1” in states 0 and 9. In all other states it must be “0”. And so, we’ll fill the table below and build the decoder.

56 The state machine “next state calc.” logic R-type=000000, lw=100011, sw=101011, beq=000100, bne=000101, lui=001111, j= , jal=000011, addi= Fetch 0 Jump 9 WBR 7 Load 3 Branch 8 ALU 6 AdrCmp 2 Store 5 Decode 1 WB 4 lw+sw R-type beq j swlw IR31IR30IR29IR28IR27IR26 opcode S3S2S1S0 current state S3S2S1S0 next state X0XXXXX X X1 0X XXX XXX X XXXXX R-type lw sw lw+sw

57 Opcode = IR[31:26] next state current state control signalsnext state calculation Outputs decoder State reg ck The Control Finite State Machine: Meally machine PCWrite PCWriteCond zero Moore machine to PC

58 Finite State Machine for Control

59 ROM = "Read Only Memory" –values of memory locations are fixed ahead of time A ROM can be used to implement a truth table –if the address is m-bits, we can address 2 m entries in the ROM. –our outputs are the bits of data that the address points to. m is the "heigth", and n is the "width" ROM Implementation mn

60 How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2 10 = 1024 different addresses) How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs ROM is 2 10 x 20 = 20K bits (and a rather unusual size) Rather wasteful, since for lots of the entries, the outputs are the same — i.e., opcode is often ignored ROM Implementation

61 Break up the table into two parts — 4 state bits tell you the 16 outputs, 2 4 x 16 bits of ROM — 10 bits tell you the 4 next state bits, 2 10 x 4 bits of ROM — Total: 4.3K bits of ROM PLA is much smaller — can share product terms — only need entries that produce an active output — can take into account don't cares Size is (#inputs  #product-terms) + (#outputs  #product-terms) For this example = (10x17)+(20x17) = 460 PLA cells PLA cells usually about the size of a ROM cell (slightly bigger) ROM vs PLA

62 Microprogramming

63 Microprogramming What are the “microinstructions” ?

64 A specification methodology –appropriate if hundreds of opcodes, modes, cycles, etc. –signals specified symbolically using microinstructions Will two implementations of the same architecture have the same microcode? What would a microassembler do? Microprogramming

65 Details

66 Microinstruction format

67 Microcode: Trade-offs Distinction between specification and implementation is sometimes blurred Specification Advantages: –Easy to design and write –Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages –Easy to change since values are in memory –Can emulate other architectures –Can make use of internal registers Implementation Disadvantages, SLOWER now that: –Control is implemented on same chip as processor –ROM is no longer faster than RAM –No need to go back and make changes

68 Interrupt and exception Type of event From Where ? MIPS terminology Interrupt External I/O device request Invoke Operation system Internal Exception From user program Arithmetic Overflow Internal Exception Using an undefined Instruction Internal Exception Hardware malfunctions Either Exception or interrupt

69 Exceptions handling Exception typeException vector address (in hex) Undefined instruction c Arithmetic Overflow c We have 2 ways to handle exceptions: Cause register or Vectored interrupts MIPS – Cause register

70 Handling exceptions

71 Handling exceptions

72

73 Fetch Jump WBR Load Branch ALU AdrCmp Store Decode WB lw+sw R-type be q j swsw lw SavePC 10 IRET 1 JumpInt 11 Handling interrupts:

74 End of multi-cycle implementation