Presentation transcript:

ECE 232 Hardware Organization and Design
Lecture 14: Multi-cycle Processor Design
Maciej Ciesielski, www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html

Speaker notes: Give qualifications of instructors. DAP: teaching computer architecture at Berkeley since 1977; co-author of the textbook used in class; best known for being one of the pioneers of RISC; currently author of an article on the future of microprocessors in SciAm, Sept 1995. RY: took 152 as a student, TAed 152, instructor in 152; undergrad and grad work at Berkeley; joined NextGen to design fast 80x86 microprocessors; one of the architects of UltraSPARC, the fastest SPARC microprocessor shipping this Fall.

Outline
- Review: single-cycle processor design
- VHDL models of datapath
- Why single-cycle is not good enough
- Design of a multi-cycle processor
  - Multi-cycle datapath
  - Multi-cycle control
  - Performance analysis

Speaker notes: Credential: bring a computer die photo and wafer. This can be a hidden slide; I just want to use it for my own planning. I have rearranged Culler's lecture slides slightly and added more slides. This covers everything he covers in his first lecture (and more). We will save the fun part, "Levels of Organization," for the end (so students can stay awake): I will show the internal structure of the SS10/20. Notes to Patterson: you may want to edit the slides in your section or add extra slides to tailor them to your needs.

Recap: Processor Design is a Process
- Bottom-up: assemble components in the target technology to establish critical timing.
- Top-down: specify component behavior from high-level requirements.
- Iterative refinement: establish a partial solution, then expand and improve.
[Figure: design levels from the Instruction Set Architecture, through the processor (datapath and control) and its components (Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer), down to cells and gates]

Recap: A Single Cycle Datapath
Datapath with control signals (underlined).
[Figure: single-cycle datapath. Instruction Fetch Unit (nPC_sel, Clk) supplies Instruction<31:0> with fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>; RegDst mux selects Rd or Rt for Rw; 32 x 32-bit register file (Ra, Rb, Rw, busA, busB, busW, RegWr); Extender (ExtOp) on imm16 feeding the ALUSrc mux; ALU (ALUctr, Zero); Data Memory (Adr, Data In, WrEn, MemWr); MemtoReg mux]

Speaker notes: The result of the last lecture is this single-cycle datapath.

Recap: The "Truth Table" for the Main Control
[Figure: the Main Control decodes op (6 bits) into the datapath control signals and a 3-bit ALUop; the local ALU Control combines ALUop with func to produce ALUctr]

                 R-type    ori       lw        sw        beq       jump
op               00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst           1         0         0         x         x         x
ALUSrc           0         1         1         1         0         x
MemtoReg         0         0         1         x         x         x
RegWrite         1         1         1         0         0         0
MemWrite         0         0         0         1         0         0
Branch           0         0         0         0         1         0
Jump             0         0         0         0         0         1
ExtOp            x         0         1         1         x         x
ALUop (Symbolic) "R-type"  Or        Add       Add       Subtract  xxx
ALUop<2>         1         0         0         0         0         x
ALUop<1>         0         1         0         0         0         x
ALUop<0>         0         0         0         0         1         x

Speaker notes: Now that we have taken care of the Local Control (ALU Control), let's refocus our attention on the Main Controller. The job of the Main Control is to look at the opcode field of the instruction and generate these control signals for the datapath (RegDst, ..., ExtOp), as well as the 3-bit ALUop field for the ALU Control. Here, I have shown you the symbolic value of the ALUop field as well as the actual bit assignment. For example, the R-type ALUop (2nd column) is encoded as 100 and the Add operation (3rd column) is encoded as 000. This is called a "Truth Table" because, if you think about it, it is like a truth table rotated 90 degrees. Let me show you what I mean by that.
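Each column of the table is one instruction's complete set of control values, so the table can be stored directly as a ROM with one control word per instruction. The VHDL sketch below is a hypothetical illustration. The entity name main_control_rom, the instruction-class names, and the choice to fill every x (don't care) with '0' are assumptions, not part of the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: the main-control "truth table" stored as a ROM.
-- Each ROM word is one column of the table; don't cares (x) are stored as '0'.
entity main_control_rom is
  port (op : in std_logic_vector(5 downto 0);
        RegDst, ALUSrc, MemtoReg, RegWrite,
        MemWrite, Branch, Jump, ExtOp : out std_logic;
        ALUop : out std_logic_vector(2 downto 0));
end main_control_rom;

architecture table of main_control_rom is
  type instr_t is (I_RTYPE, I_ORI, I_LW, I_SW, I_BEQ, I_JUMP, I_OTHER);
  -- Bit order: RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUop<2:0>
  subtype word_t is std_logic_vector(10 downto 0);
  type rom_t is array (instr_t) of word_t;
  constant CONTROL_ROM : rom_t := (
    I_RTYPE => "10010000100",
    I_ORI   => "01010000010",
    I_LW    => "01110001000",
    I_SW    => "01001001000",
    I_BEQ   => "00000100001",
    I_JUMP  => "00000010000",
    I_OTHER => "00000000000");  -- unknown opcode: no state change
  signal instr : instr_t;
  signal word  : word_t;
begin
  -- Instruction decode: the opcode selects one ROM entry
  with op select
    instr <= I_RTYPE when "000000",
             I_ORI   when "001101",
             I_LW    when "100011",
             I_SW    when "101011",
             I_BEQ   when "000100",
             I_JUMP  when "000010",
             I_OTHER when others;

  word     <= CONTROL_ROM(instr);
  RegDst   <= word(10);
  ALUSrc   <= word(9);
  MemtoReg <= word(8);
  RegWrite <= word(7);
  MemWrite <= word(6);
  Branch   <= word(5);
  Jump     <= word(4);
  ExtOp    <= word(3);
  ALUop    <= word(2 downto 0);
end table;
```

Reading one ROM word is the same as reading one column of the table top to bottom.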

Recap: PLA Implementation of the Main Control
[Figure: PLA whose AND plane decodes op<5>..op<0> into the product terms R-type, ori, lw, sw, beq, and jump, and whose OR plane generates RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, and ALUop<0>]

Speaker notes: Similarly, for ALUSrc, we need to OR the ori, load, and store terms together, because we need to assert the ALUSrc signal whenever we have an ori, load, or store instruction. The RegDst, MemtoReg, MemWrite, Branch, and Jump signals are very simple. They don't need to OR any product terms together because each is asserted for only one instruction. For example, RegDst is asserted ONLY for R-type instructions and MemtoReg is asserted ONLY for the load instruction. ExtOp, on the other hand, needs to be set to 1 for both the load and store instructions so that the immediate field is sign extended properly. Therefore, we need to OR the load and store terms together to form the signal ExtOp. Finally, we have the ALUop signals. By clever encoding of the ALUop field, we are able to keep them simple so that no OR gates are needed. If you don't already know, this regular structure, with an array of AND gates followed by an array of OR gates, is called a Programmable Logic Array, or PLA for short. It is one of the most common ways to implement logic functions, and many CAD tools are available to simplify them.
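The two planes described in the notes can be written out in VHDL: an AND plane with one product term per opcode, then an OR plane with one OR per output. This is a sketch. The entity name main_control_pla is hypothetical, and the x entries of the truth table are treated as 0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of the main control as a PLA (AND plane + OR plane).
entity main_control_pla is
  port (op : in std_logic_vector(5 downto 0);
        RegDst, ALUSrc, MemtoReg, RegWrite,
        MemWrite, Branch, Jump, ExtOp : out std_logic;
        ALUop : out std_logic_vector(2 downto 0));
end main_control_pla;

architecture pla of main_control_pla is
  signal rtype, ori, lw, sw, beq, jmp : std_logic;
begin
  -- AND plane: each product term matches exactly one 6-bit opcode
  rtype <= not op(5) and not op(4) and not op(3) and not op(2) and not op(1) and not op(0); -- 000000
  ori   <= not op(5) and not op(4) and     op(3) and     op(2) and not op(1) and     op(0); -- 001101
  lw    <=     op(5) and not op(4) and not op(3) and not op(2) and     op(1) and     op(0); -- 100011
  sw    <=     op(5) and not op(4) and     op(3) and not op(2) and     op(1) and     op(0); -- 101011
  beq   <= not op(5) and not op(4) and not op(3) and     op(2) and not op(1) and not op(0); -- 000100
  jmp   <= not op(5) and not op(4) and not op(3) and not op(2) and     op(1) and not op(0); -- 000010

  -- OR plane: only RegWrite, ALUSrc and ExtOp need to OR several terms
  RegWrite <= rtype or ori or lw;
  ALUSrc   <= ori or lw or sw;
  ExtOp    <= lw or sw;
  RegDst   <= rtype;
  MemtoReg <= lw;
  MemWrite <= sw;
  Branch   <= beq;
  Jump     <= jmp;
  -- With this ALUop encoding, each bit is a single product term (no OR gate)
  ALUop    <= rtype & ori & beq;   -- ALUop<2>, ALUop<1>, ALUop<0>
end pla;
```

A synthesis tool would normally minimize these terms further. The unminimized form is kept here because it maps one-to-one onto the figure.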

Recap: Systematic Generation of Control
[Figure: OPcode -> Instruction Decode -> Control Logic / Store (PLA, ROM) -> microinstruction driving the Control Points of the Datapath; Conditions return from the Datapath to the control logic]
- In our single-cycle processor, each instruction is realized by exactly one control command, or "microinstruction".
- In general, the controller is a Finite State Machine.
- The microinstruction can also control sequencing (see later).

The Big Picture: Where Are We Now?
The Five Classic Components of a Computer: Processor (Control and Datapath), Memory, Input, and Output.
Today's topic: designing the datapath for the multiple-clock-cycle processor.

Speaker notes: So where are we in the overall scheme of things? Well, we just finished designing the processor's datapath. Now I am going to show you how to design the control for the datapath.

Behavioral Models of Datapath Components
[Figure: 16-bit adder block with inputs A, B, Cin and outputs DOUT, Cout]
Attention: Altera VHDL simulation software does not support delays.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 16-bit adder with carry in/out; the output delays are generics.
-- CIN/COUT use std_logic (not bit) so all ports share one logic type.
entity adder16 is
  generic (ccOut_delay    : time := 12 ns;
           adderOut_delay : time := 12 ns);
  port (A, B : in  std_logic_vector(15 downto 0);
        CIN  : in  std_logic;
        DOUT : out std_logic_vector(15 downto 0);
        COUT : out std_logic);
end adder16;

architecture behavior of adder16 is
begin
  adder16_process : process (A, B, CIN)
    -- 17 bits: the 16-bit sum plus the carry out
    variable tmp : unsigned(16 downto 0);
  begin
    tmp := resize(unsigned(A), 17) + resize(unsigned(B), 17);
    if CIN = '1' then
      tmp := tmp + 1;
    end if;
    DOUT <= std_logic_vector(tmp(15 downto 0)) after adderOut_delay;
    COUT <= tmp(16) after ccOut_delay;
  end process;
end behavior;
```

Behavioral Specification of Control Logic

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity maincontrol is
  port (opcode     : in  std_logic_vector(5 downto 0);
        equal_cond : in  bit;
        extop      : out bit;
        ALUsrc     : out bit;
        ALUop      : out std_logic_vector(1 downto 0);
        MEMwr      : out bit;
        MemtoReg   : out bit;
        RegWr      : out bit;
        RegDst     : out bit;
        nPC        : out bit);
end maincontrol;
```

- Decode / control-store address is modeled by a case statement.
- Each arm drives the control signals for that operation, just like the microinstruction.
- Either can be symbolic.
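A minimal sketch of the case-statement decoder this slide describes. The maincontrol entity is repeated so the block compiles on its own. It assumes a MIPS-like subset (R-type, ori, lw, sw, beq) and a hypothetical 2-bit ALUop encoding: "00" add, "10" R-type (the ALU control then looks at func), "11" or. It also assumes that nPC = '1' selects the branch target. The entity has no jump output, so jump is not decoded.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity maincontrol is
  port (opcode     : in  std_logic_vector(5 downto 0);
        equal_cond : in  bit;
        extop      : out bit;
        ALUsrc     : out bit;
        ALUop      : out std_logic_vector(1 downto 0);
        MEMwr      : out bit;
        MemtoReg   : out bit;
        RegWr      : out bit;
        RegDst     : out bit;
        nPC        : out bit);
end maincontrol;

architecture behavior of maincontrol is
begin
  decode : process (opcode, equal_cond)
  begin
    -- Defaults: no register or memory write, sequential next PC
    extop <= '0'; ALUsrc <= '0'; ALUop <= "00"; MEMwr <= '0';
    MemtoReg <= '0'; RegWr <= '0'; RegDst <= '0'; nPC <= '0';
    -- One arm per operation, like one microinstruction per instruction
    case opcode is
      when "000000" =>   -- R-type: rd <= rs op rt
        RegDst <= '1'; RegWr <= '1'; ALUop <= "10";
      when "001101" =>   -- ori: zero-extended immediate (extop stays '0')
        ALUsrc <= '1'; RegWr <= '1'; ALUop <= "11";
      when "100011" =>   -- lw: rt <= Mem[rs + SignExt(imm16)]
        extop <= '1'; ALUsrc <= '1'; MemtoReg <= '1'; RegWr <= '1';
      when "101011" =>   -- sw: Mem[rs + SignExt(imm16)] <= rt
        extop <= '1'; ALUsrc <= '1'; MEMwr <= '1';
      when "000100" =>   -- beq: branch only when the operands are equal
        nPC <= equal_cond;
      when others =>     -- unsupported opcode: keep the safe defaults
        null;
    end case;
  end process;
end behavior;
```

Assigning defaults first keeps each arm short and guarantees that an unknown opcode cannot write registers or memory.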

Abstract View of Our Single Cycle Processor
[Figure: PC and Next PC logic, instruction memory, Register Fetch, Ext, ALU, data Mem Access, and Reg. Wrt (Result Store); Main Control (op) and ALU control (fun) drive nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, and MemWr; Equal is returned to control]
Looks like an FSM with the PC as state.

What's Wrong with Our CPI=1 Processor?
[Figure: critical path of each instruction class:
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, setup
Load: PC, Inst Memory, Reg File, mux, ALU, Data Mem, Reg File setup (the critical path)
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux]
- Long cycle time.
- All instructions take as much time as the slowest.
- Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle.
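A worked illustration with hypothetical component delays. These numbers are assumptions chosen for easy arithmetic, not measurements, and mux delays are ignored: instruction memory and data memory 2 ns each, register-file read 1 ns, ALU 2 ns, register-file write setup 1 ns.

$$
\begin{aligned}
T_{\text{R-type}} &= t_{\text{IMem}} + t_{\text{RF}} + t_{\text{ALU}} + t_{\text{setup}} = 2 + 1 + 2 + 1 = 6\ \text{ns}\\
T_{\text{store}} &= t_{\text{IMem}} + t_{\text{RF}} + t_{\text{ALU}} + t_{\text{DMem}} = 2 + 1 + 2 + 2 = 7\ \text{ns}\\
T_{\text{load}} &= t_{\text{IMem}} + t_{\text{RF}} + t_{\text{ALU}} + t_{\text{DMem}} + t_{\text{setup}} = 2 + 1 + 2 + 2 + 1 = 8\ \text{ns}\\
T_{\text{cycle}} &= \max_i T_i = T_{\text{load}} = 8\ \text{ns}
\end{aligned}
$$

With CPI = 1, every instruction occupies one 8 ns cycle. Under these assumptions an R-type instruction therefore spends 2 ns of each cycle idle.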

Memory Access Time
- Physics => fast memories are small (large memories are slow).
  Question: register file vs. memory?
- => Use a hierarchy of memories.
[Figure: memory storage array (address decoder, word lines, bit lines, storage cells, sense amps); hierarchy of Processor, Cache (1 cycle), L2 Cache (2-3 cycles), and memory (20-50 cycles), connected by the proc. bus and mem. bus]

=> Reducing Cycle Time
- Cut the combinational dependency graph and insert a register / latch.
- Do the same work in two fast cycles, rather than one slow one.
[Figure: storage element -> Acyclic Combinational Logic -> storage element becomes storage element -> Logic (A) -> storage element -> Logic (B) -> storage element]
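A small VHDL sketch of the cut, using a hypothetical computation (a + b) + c in which "a + b" plays logic (A) and "+ c" plays logic (B). Version 1 evaluates both between the same pair of registers. Version 2 inserts a register after logic (A), so each cycle contains only one adder delay and the clock can be faster, but the result appears one cycle later. The entity name split_path and the 16-bit width are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: the same work done in one slow cycle or two fast ones.
entity split_path is
  port (clk     : in  std_logic;
        a, b, c : in  unsigned(15 downto 0);
        y_one   : out unsigned(15 downto 0);   -- result after one (slow) cycle
        y_two   : out unsigned(15 downto 0));  -- result after two (fast) cycles
end split_path;

architecture rtl of split_path is
  signal t_reg, c_reg : unsigned(15 downto 0);  -- registers inserted at the cut
begin
  -- Version 1: logic (A) and logic (B) in series between two registers
  one_cycle : process (clk)
  begin
    if rising_edge(clk) then
      y_one <= (a + b) + c;
    end if;
  end process;

  -- Version 2: logic (A) -> register -> logic (B)
  two_cycles : process (clk)
  begin
    if rising_edge(clk) then
      t_reg <= a + b;          -- stage 1: logic (A)
      c_reg <= c;              -- carry c along so stage 2 uses the matching value
      y_two <= t_reg + c_reg;  -- stage 2: logic (B)
    end if;
  end process;
end rtl;
```

Holding c in c_reg keeps version 2 correct even if the inputs change every cycle. In a multicycle processor the inputs are held steady, and the same idea appears as the IR, A, B, R, and M registers.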

Basic Limits on Cycle Time
- Next address logic: PC <= branch ? PC + offset : PC + 4
- Instruction fetch: InstructionReg <= Mem[PC]
- Register access: A <= R[rs]
- ALU operation: R <= A + B
[Figure: single-cycle datapath divided into Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec (ALU), Mem Access (Data), and Result Store, with control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, and MemWr]
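The next-address RTL line translates directly into a clocked VHDL process. This is a minimal sketch that assumes a hypothetical entity next_pc, a synchronous reset, and an offset input that already holds the full byte displacement. In MIPS the beq target is PC + 4 plus the sign-extended immediate shifted left by 2, so real hardware would compute that value before this register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of: PC <= branch ? PC + offset : PC + 4
entity next_pc is
  port (clk, reset, branch : in  std_logic;
        offset             : in  signed(31 downto 0);  -- byte offset, already extended
        pc                 : out unsigned(31 downto 0));
end next_pc;

architecture rtl of next_pc is
  signal pc_r : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pc_r <= (others => '0');
      elsif branch = '1' then
        pc_r <= unsigned(signed(pc_r) + offset);  -- PC + offset
      else
        pc_r <= pc_r + 4;                         -- PC + 4
      end if;
    end if;
  end process;
  pc <= pc_r;
end rtl;
```

The adder and mux in front of pc_r form one of the paths that bound the cycle time.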

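Partitioning the CPI=1 Datapath
Add registers between the smallest steps.
[Figure: the same datapath with registers inserted between Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data), and Result Store; control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, and MemWr]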

Example Multicycle Datapath
[Figure: Instruction Fetch (PC, Next PC, IR) -> Operand Fetch (Reg. File into A and B, Ext) -> Execute / compute memory address (ALU into R) -> Memory access (Data Mem into M) -> Result Store (Reg File); control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, and MemToReg; Equal is returned to control]
Critical path?
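To make the register transfers concrete, here is a hypothetical VHDL sketch of one load word passing through the five steps, one step per clock edge. The entity name mc_lw, the step names, the 64-word memory shared by instructions and data, and the test values are all assumptions for illustration. S stands for the ALU output register labeled R on the slide; it is renamed to avoid confusion with the register file R[...].

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: "lw rt, imm16(rs)" executed in five cycles.
-- Assumptions: only lw is modeled; one small word-addressed memory holds
-- both the instruction and the data; no reset logic.
entity mc_lw is
  port (clk      : in  std_logic;
        done     : out std_logic;
        wb_value : out std_logic_vector(31 downto 0));
end mc_lw;

architecture sketch of mc_lw is
  subtype word_t is std_logic_vector(31 downto 0);
  type mem_t  is array (0 to 63) of word_t;
  type regs_t is array (0 to 31) of word_t;
  type step_t is (IFETCH, OPFETCH, EXEC, MEMACC, WRBACK);
  constant MEM : mem_t := (0 => x"8C220010",   -- lw $2, 16($1)
                           5 => x"0000002A",   -- data word at byte address 20
                           others => (others => '0'));
  signal regs : regs_t := (1 => x"00000004", others => (others => '0'));
  signal pc, ir, a, b, s, m : word_t := (others => '0');
  signal step : step_t := IFETCH;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      case step is
        when IFETCH =>   -- IR <= Mem[PC]; PC <= PC + 4
          ir   <= MEM(to_integer(unsigned(pc(7 downto 2))));
          pc   <= std_logic_vector(unsigned(pc) + 4);
          step <= OPFETCH;
        when OPFETCH =>  -- A <= R[rs]; B <= R[rt] (B is unused by lw)
          a    <= regs(to_integer(unsigned(ir(25 downto 21))));
          b    <= regs(to_integer(unsigned(ir(20 downto 16))));
          step <= EXEC;
        when EXEC =>     -- S <= A + SignExt(imm16): the memory address
          s    <= std_logic_vector(signed(a) + resize(signed(ir(15 downto 0)), 32));
          step <= MEMACC;
        when MEMACC =>   -- M <= Mem[S]
          m    <= MEM(to_integer(unsigned(s(7 downto 2))));
          step <= WRBACK;
        when WRBACK =>   -- R[rt] <= M
          regs(to_integer(unsigned(ir(20 downto 16)))) <= m;
          wb_value <= m;
          done     <= '1';
          step     <= IFETCH;
      end case;
    end if;
  end process;
end sketch;
```

Each step reads only values latched at the end of the previous step. That is what allows the clock period to shrink to the longest single step rather than the whole instruction.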

Summary
Disadvantages of the single cycle processor:
- Long cycle time.
- The cycle time is too long for all instructions except the Load.
Multiple cycle processor:
- Divide the instructions into smaller steps.
- Execute each step (instead of the entire instruction) in one cycle.
- Partition the datapath into equal-size chunks to minimize cycle time (~10 levels of logic between latches).
- Follow the same 5-step method for designing a "real" processor.
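One way to state the trade-off in the summary, using generic notation introduced here rather than taken from the slides. Let T_i be the total delay of instruction class i, t_k the delay of datapath step k, t_reg the clock-to-Q plus setup overhead of each inserted register, N the instruction count, and CPI-bar the average number of steps per instruction over the program's mix:

$$
T^{\text{single}}_{\text{cycle}} = \max_i T_i, \qquad
T^{\text{multi}}_{\text{cycle}} = \max_k t_k + t_{\text{reg}}
$$

$$
\text{Time}^{\text{single}} = N \cdot 1 \cdot T^{\text{single}}_{\text{cycle}}, \qquad
\text{Time}^{\text{multi}} = N \cdot \overline{\text{CPI}} \cdot T^{\text{multi}}_{\text{cycle}}
$$

The multicycle clock is set by the longest step, so any shorter step wastes the remainder of its cycle; this is why the datapath should be cut into equal-size chunks. The multicycle design is faster only when the product of the average CPI and the multicycle cycle time is less than the single-cycle cycle time.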