CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath

Slides:

Advertisements

Similar presentations

Adding the Jump Instruction

Advertisements

Multicycle Datapath & Control Andreas Klappenecker CPSC321 Computer Architecture.

1  1998 Morgan Kaufmann Publishers We will be reusing functional units –ALU used to compute address and to increment PC –Memory used for instruction and.

1 Chapter Five The Processor: Datapath and Control.

L14 – Control & Execution 1 Comp 411 – Fall /04/09 Control & Execution Finite State Machines for Control MIPS Execution.

Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.

10/26/2004Comp 120 Fall October Only 11 to go! Questions? Today Exam Review and Instruction Execution.

Fall 2007 MIPS Datapath (Single Cycle and Multi-Cycle)

1 Chapter Five. 2 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical.

331 Lec18.1Fall :332:331 Computer Architecture and Assembly Language Fall 2003 Lecture 18 Introduction to Pipelined Datapath [Adapted from Dave.

Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

10/18/2005Comp 120 Fall October Questions? Instruction Execution.

The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.

Lecture 16: Basic CPU Design

L15 – Control & Execution 1 Comp 411 – Spring /25/08 Control & Execution Finite State Machines for Control MIPS Execution.

331 W10.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 10 Building a Multi-Cycle Datapath [Adapted from Dave Patterson’s.

1  1998 Morgan Kaufmann Publishers We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw,

CPU Architecture Why not single cycle? Why not single cycle? Hardware complexity Hardware complexity Why not pipelined? Why not pipelined? Time constraints.

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

Datapath and Control Andreas Klappenecker CPSC321 Computer Architecture.

Class 9.1 Computer Architecture - HUJI Computer Architecture Class 9 Microprogramming.

1 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical instructions:

Spring W :332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s.

The Multicycle Processor CPSC 321 Andreas Klappenecker.

CSE431 L05 Basic MIPS Architecture.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 05: Basic MIPS Architecture Review Mary Jane Irwin.

Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.

Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

Computing Systems The Processor: Datapath and Control.

Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

EECS 322: Computer Architecture

CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides

1 CS/COE0447 Computer Organization & Assembly Language Multi-Cycle Execution.

1 Processor: Datapath and Control Single cycle processor –Datapath and Control Multicycle processor –Datapath and Control Microprogramming –Vertical and.

1  2004 Morgan Kaufmann Publishers Chapter Five.

ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Multicycle datapath.

1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.

Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

1. 2 MIPS Hardware Implementation Full die photograph of the MIPS R2000 RISC Microprocessor. The 1986 MIPS R2000 with five pipeline stages and 450,000.

1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.

1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3 In-Class Exercises.

CS161 – Design and Architecture of Computer Systems

Control & Execution Finite State Machines for Control MIPS Execution.

Systems Architecture I

Five Execution Steps Instruction Fetch

MIPS Instructions.

CS/COE0447 Computer Organization & Assembly Language

Multiple Cycle Implementation of MIPS-Lite CPU

Chapter Five The Processor: Datapath and Control

Multicycle Approach Break up the instructions into steps

The Multicycle Implementation

CS/COE0447 Computer Organization & Assembly Language

Chapter Five The Processor: Datapath and Control

The Multicycle Implementation

Systems Architecture I

Multicycle Approach We will be reusing functional units

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Processor (II).

Processor: Multi-Cycle Datapath & Control

Multicycle Design.

Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Systems Architecture I

CS161 – Design and Architecture of Computer Systems

Presentation transcript:

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath September 24, 2008 Nael Abu-Ghazaleh naelag@cmu.edu www.qatar.cmu.edu/~msakr/15447-f08/ Greet class

Implementation vs. Performance Performance of a processor is determined by Instruction count of a program CPI Clock cycle time (clock rate) The compiler & the ISA determine the instruction count. The implementation of the processor determines the CPI and the clock cycle time.

Possible Execution Steps of Any Instructions Instruction Fetch Instruction Decode and Register Fetch Execution of the Memory Reference Instruction Execution of Arithmetic-Logical operations Branch Instruction Jump Instruction

Instruction Processing Five steps: Instruction fetch (IF) Instruction decode and operand fetch (ID) ALU/execute (EX) Memory (not required) (MEM) Write-back (WB) WB IF EX ID MEM

Single Cycle Implementation

Multiple ALUs and Memory Units

Single Cycle Datapath

What’s Wrong with Single Cycle? All instructions run at the speed of the slowest instruction. Adding a long instruction can hurt performance What if you wanted to include multiply? You cannot reuse any parts of the processor We have 3 different adders to calculate PC+4, PC+4+offset and the ALU No profit in making the common case fast Since every instruction runs at the slowest instruction speed This is particularly important for loads as we will see later

What’s Wrong with Single Cycle? 1 ns – Register read/write time 2 ns – ALU/adder 2 ns – memory access 0 ns – MUX, PC access, sign extend, ROM add: 2ns + 1ns + 2ns + 1ns = 6 ns beq: 2ns + 1ns + 2ns = 5 ns sw: 2ns + 1ns + 2ns + 2ns = 7 ns lw: 2ns + 1ns + 2ns + 2ns + 1ns = 8 ns Get read ALU mem write Instr reg operation reg

Computing Execution Time Assume: 100 instructions executed 25% of instructions are loads, 10% of instructions are stores, 45% of instructions are adds, and 20% of instructions are branches. Single-cycle execution: 100 * 8ns = 800 ns Optimal execution: 25*8ns + 10*7ns + 45*6ns + 20*5ns = 640 ns

Single Cycle Problems A sequence of instructions: LW (IF, ID, EX, MEM, WB) SW (IF, ID, EX, MEM) etc Single Cycle Implementation: Cycle 1 Cycle 2 Clk Load Store Waste what if we had a more complicated instruction like floating point? wasteful of area

Multiple Cycle Solution use a “smaller” cycle time have different instructions take different numbers of cycles a “multicycle” datapath:

Multicycle Approach We will be reusing functional units ALU used to compute address and to increment PC Memory used for instruction and data We’ll use a finite state machine for control

The Five Stages of an Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 IF ID Ex Mem WB IF: Instruction Fetch and Update PC ID: Instruction Decode and Registers Fetch Ex: Execute R-type; calculate memory address Mem: Read/write the data from/to the Data Memory WB: Write the result data into the register file As shown here, each of these five steps will take one clock cycle to complete.

Multicycle Implementation Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional “internal” registers

The Five Stages of Load Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 lw IF ID Ex Mem WB IF: Instruction Fetch and Update PC ID: Instruction Decode and Registers Fetch Ex: Execute R-type; calculate memory address Mem: Read/write the data from/to the Data Memory WB: Write the result data into the register file As shown here, each of these five steps will take one clock cycle to complete.

Multiple Cycle Implementation Break the instruction execution into Clock Cycles Different instructions require a different number of clock cycles Clock cycle is limited by the slowest stage Instruction latency is not reduced (time from the start of an instruction to its completion) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 lw IFetch Dec Exec Mem WB Latency = execution time (delay or response time) – the total time from start to finish of ONE instruction For processors one important measure is THROUGHPUT (or the execution bandwidth) – the total amount of work done in a given amount of time For memories one important measure is BANDWIDTH – the amount of information communicated across an interconnect (e.g., bus) per unit time; the number of operations performed per second (the WIDTH of the operation and the RATE of the operation) IFetch Dec Exec Mem WB sw

Single Cycle vs. Multiple Cycle Single Cycle Implementation: Cycle 1 Cycle 2 Clk Load Store Waste Multiple Cycle Implementation: Here are the timing diagrams showing the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk lw sw R-type IFetch Dec Exec Mem WB IFetch Dec Exec Mem IFetch

Multicycle Implementation Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional “internal” registers

Instructions from ISA perspective Consider each instruction from perspective of ISA. Example: The add instruction changes a register. Register specified by bits 15:11 of instruction. Instruction specified by the PC. New value is the sum (“op”) of two registers. Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

Breaking down an instruction ISA definition of arithmetic: Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] Could break down to: IR <= Memory[PC] A <= Reg[IR[25:21]] B <= Reg[IR[20:16]] ALUOut <= A op B Reg[IR[20:16]] <= ALUOut We forgot an important part of the definition of arithmetic! PC <= PC + 4

Idea behind multicycle approach We define each instruction from the ISA perspective (do this!) Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution)

Five Execution Steps Instruction Fetch Instruction Decode and Register Fetch Execution, Memory Address Computation, or Branch Completion Memory Access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Step 1: Instruction Fetch Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register-Transfer Language" IR <= Memory[PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent) ALU is performing one of three functions, based on instruction type Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut;

Step 4 (R-type or memory-access) Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; R-type instructions finish Reg[IR[15:11]] <= ALUOut;

Write-back step Reg[IR[20:16]] <= MDR; Which instruction needs this?

Summary:

Multiple Cycle Implementation

Review: finite state machines a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) We’ll use a Moore machine (output based only on current state)

Implementing the Control Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we’ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification

Graphical Specification of FSM

Finite State Machine for Control Implementation: