15-447 Computer Architecture, Fall 2007 © October 3rd, 2007 Majd F. Sakr. CS-447 – Computer Architecture.

Presentation transcript:

Computer Architecture, Fall 2007 © October 3rd, 2007 Majd F. Sakr. CS-447 – Computer Architecture, M,W 10-11:20am. Lecture 11: Single Cycle Datapath

Lecture Objectives
° Learn what a datapath is and how it provides the required functions.
° Appreciate why different implementation strategies affect the clock rate and CPI of a machine.
° Understand how the ISA determines many aspects of the hardware implementation.

Implementation vs. Performance
Performance of a processor is determined by:
° Instruction count of a program
° CPI
° Clock cycle time (clock rate)
The compiler & the ISA determine the instruction count. The implementation of the processor determines the CPI and the clock cycle time.
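
The three factors above combine multiplicatively into execution time. A minimal sketch, assuming time is measured in nanoseconds; the function name and the sample numbers are hypothetical and chosen only for illustration:

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_ns


# Hypothetical program: same instruction count and CPI, two clock cycle times.
baseline = cpu_time_ns(instruction_count=1_000_000, cpi=1.0, clock_cycle_ns=8.0)
faster_clock = cpu_time_ns(instruction_count=1_000_000, cpi=1.0, clock_cycle_ns=4.0)
print(baseline, faster_clock)
```

Halving the clock cycle time halves the execution time only if the instruction count and CPI stay the same, which is why the implementation choices discussed in this lecture matter.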

Possible Execution Steps of Any Instruction
° Instruction Fetch
° Instruction Decode and Register Fetch
° Execution of the Memory Reference Instruction
° Execution of Arithmetic-Logical operations
° Branch Instruction
° Jump Instruction

Instruction Processing
° Five steps:
    - Instruction fetch (IF)
    - Instruction decode and operand fetch (ID)
    - ALU/execute (EX)
    - Memory access, not required by every instruction (MEM)
    - Write-back (WB)
IF → ID → EX → MEM → WB

Datapath & Control

Datapath Elements
The datapath contains 2 types of logic elements:
° Combinational (e.g. ALU): elements that operate on data values. Their outputs depend only on their current inputs.
° State (e.g. Registers & Memory): elements with internal storage. Their state is defined by the values they contain.
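
The difference between the two kinds of element can be sketched in software. This is only an illustration, assuming 32-bit values; `alu_add`, `Register`, and `clock` are hypothetical names, not part of the lecture:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit datapath


def alu_add(a, b):
    """Combinational element: the output is a pure function of the inputs."""
    return (a + b) & MASK32


class Register:
    """State element: keeps its value until it is written on a clock tick."""

    def __init__(self, value=0):
        self.value = value

    def clock(self, data_in, write_enable):
        # The stored value changes only when the write is enabled.
        if write_enable:
            self.value = data_in & MASK32


r = Register(5)
r.clock(alu_add(r.value, 4), write_enable=False)  # value stays 5
held = r.value
r.clock(alu_add(r.value, 4), write_enable=True)   # value becomes 9
```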

State Elements

Pentium Processor Die
° State: Registers, Memory
° Control: ROM
° Combinational logic (Compute)

Abstract View of the Datapath

Single Cycle Implementation
° This simple processor can execute an ALU instruction, access memory, or compute the next instruction's address in a single cycle.

Program Counter
If each instruction occupies 4 memory locations (bytes), then: Next PC <= PC + 4

PC Datapath – Branch Offset
PC <= PC + Branch Offset
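
The two PC update rules (sequential and branch) can be sketched as one function. This follows the simplified form on the slides and assumes the branch offset is already a signed byte offset; `next_pc` and its parameters are hypothetical names:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit program counter


def next_pc(pc, take_branch=False, branch_offset=0):
    """Return the next PC: PC + Branch Offset if branching, else PC + 4."""
    if take_branch:
        return (pc + branch_offset) & MASK32
    return (pc + 4) & MASK32


sequential = next_pc(0x100)                                    # 0x104
backward = next_pc(0x100, take_branch=True, branch_offset=-8)  # 0xF8
```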

Abstract View After PC Basic Implementation

The Register File
° Arithmetic & Logical instructions (R-type) read the contents of 2 registers, perform an ALU operation, and write the result back to a register.
° Registers are stored in the register file. The register file has inputs to specify the registers, outputs for the data read, an input for the data to be written, and 1 control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
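
A register file with the ports described above might be modeled as follows. This is a sketch, not the lecture's design: the class, the port names, and the `reg_write` signal name are assumptions, and register 0 is kept at zero as in MIPS.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


class RegisterFile:
    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, read_reg1, read_reg2):
        """Two read ports: register numbers in, register contents out."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        """One write port, gated by the single control signal."""
        if reg_write and write_reg != 0:  # MIPS register 0 always reads as 0
            self.regs[write_reg] = write_data & MASK32


# R-type flow: read two registers, run the ALU, write the result back.
rf = RegisterFile()
rf.write(8, 5, reg_write=True)
rf.write(9, 7, reg_write=True)
a, b = rf.read(8, 9)
rf.write(10, (a + b) & MASK32, reg_write=True)
rf.write(11, 123, reg_write=False)  # control signal off: nothing is written
```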

The Register File

R-Type Instructions
° Assembly (e.g., register-register signed addition): ADD rd_reg rs_reg rt_reg
° Machine encoding
° Semantics:
    if MEM[PC] == ADD rd rs rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4

ADD rd rs rt

Datapath for Add

I-Type ALU Instructions
° Assembly (e.g., register-immediate signed addition): ADDI rt_reg rs_reg immediate_16
° Machine encoding
° Semantics:
    if MEM[PC] == ADDI rt rs immediate
        GPR[rt] ← GPR[rs] + sign-extend(immediate)
        PC ← PC + 4
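
The ADDI semantics above hinge on sign extension of the 16-bit immediate. A minimal sketch, assuming 32-bit registers held in a Python list; `sign_extend16` and `addi` are hypothetical helper names, and the sum simply wraps around (the overflow trap of the real MIPS ADDI is not modeled):

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def addi(gpr, rt, rs, imm16, pc):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); returns PC + 4."""
    gpr[rt] = (gpr[rs] + sign_extend16(imm16)) & MASK32
    return pc + 4


gpr = [0] * 32
gpr[1] = 10
pc = addi(gpr, rt=2, rs=1, imm16=0xFFFF, pc=0)  # 0xFFFF sign-extends to -1
```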

ADDI rt_reg rs_reg immediate_16

Datapath for R and I-Type ALU Instructions

Data Memory
° The element needed to implement load and store instructions is the data memory. In addition, we use the existing ALU to compute the address to access.
° The data memory has 2 x-bit inputs (the address and the write data) and 1 x-bit output (the read data). In addition, it has 2 control lines: MemWrite and MemRead.
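
The data memory interface described above can be sketched as a small class. The storage layout (a dictionary keyed by address), the 32-bit width, and the default of 0 for unwritten locations are assumptions made for illustration; only the MemRead and MemWrite control lines come from the slide:

```python
MASK32 = 0xFFFFFFFF  # assumption: x = 32 bits


class DataMemory:
    def __init__(self):
        self.words = {}  # address -> stored word

    def access(self, address, write_data, mem_read, mem_write):
        """One access: write if MemWrite is set, return data if MemRead is set."""
        if mem_write:
            self.words[address] = write_data & MASK32
        if mem_read:
            return self.words.get(address, 0)
        return None  # read-data output is unused when MemRead is off


mem = DataMemory()
mem.access(16, 7, mem_read=False, mem_write=True)        # store
loaded = mem.access(16, 0, mem_read=True, mem_write=False)  # load
```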

Data Memory

Load Instruction
° Assembly (e.g., load 4-byte word): LW rt_reg offset_16 (base_reg)
° Machine encoding
° Semantics:
    if MEM[PC] == LW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        GPR[rt] ← MEM[ translate(EA) ]
        PC ← PC + 4
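
The load semantics can be traced step by step in a short sketch. It assumes that address translation is the identity and that memory is a dictionary of words keyed by byte address; `lw` and `sign_extend16` are hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and addresses


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lw(gpr, memory, rt, base, offset16, pc):
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[EA]; returns PC + 4."""
    ea = (sign_extend16(offset16) + gpr[base]) & MASK32
    gpr[rt] = memory.get(ea, 0)  # translate(EA) is treated as identity here
    return pc + 4


gpr = [0] * 32
gpr[1] = 0x1000
memory = {0x0FFC: 0xDEADBEEF}
pc = lw(gpr, memory, rt=2, base=1, offset16=0xFFFC, pc=0)  # offset is -4
```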

LW Datapath

Branch Equal
° The beq (branch if equal) instruction has 3 operands: two registers that are compared for equality, and an n-bit offset used to compute the branch target address relative to the PC.
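
A sketch of the beq decision follows. The slide only says the target is relative to the PC; the version below assumes the usual MIPS convention (n = 16, offset counted in words and added to PC + 4), which is not spelled out in the lecture. `beq` and `sign_extend16` are hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit program counter


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq(gpr, rs, rt, offset16, pc):
    """Return the next PC: branch target if GPR[rs] == GPR[rt], else PC + 4."""
    if gpr[rs] == gpr[rt]:
        # Assumed MIPS convention: word offset, relative to PC + 4.
        return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32
    return (pc + 4) & MASK32


gpr = [0] * 32
gpr[1] = gpr[2] = 5
gpr[3] = 6
taken = beq(gpr, 1, 2, 3, pc=0x100)      # registers equal: 0x104 + 12
not_taken = beq(gpr, 1, 3, 3, pc=0x100)  # registers differ: fall through
```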

Branch Equal

Unconditional Jump
° Assembly: J immediate_26
° Machine encoding
° Semantics:
    if MEM[PC] == J immediate26
        target = { PC[31:28], immediate26, 2’b00 }
        PC ← target
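
The concatenation in the jump semantics maps directly onto masks and shifts. A minimal sketch that follows the slide's formula (upper 4 bits of the PC, the 26-bit immediate, then two zero bits); `jump_target` is a hypothetical name:

```python
def jump_target(pc, immediate26):
    """target = { PC[31:28], immediate26, 2'b00 }"""
    upper = pc & 0xF0000000                     # PC[31:28]
    middle = (immediate26 & 0x03FFFFFF) << 2    # immediate26 followed by 2'b00
    return upper | middle


near = jump_target(0x10000008, 0x0000010)
far = jump_target(0xABCD0000, 0x3FFFFFF)
```

Because only 28 bits come from the instruction, a jump can reach any word in the current 256 MB region but cannot change the top 4 bits of the PC.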

Unconditional Jump Datapath

Combining ALU and Memory Instructions
° The ALU datapath and the Memory datapath are similar. The differences are:
    - The second input to the ALU is a register (R-type) or the offset (I-type).
    - The value stored into the destination register comes from the ALU (R-type) or from memory (I-type).
° Using 2 multiplexers (Mux) we can combine both datapaths.
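
The two multiplexers can be sketched as selections around a shared ALU. The select signal names `alu_src` and `mem_to_reg` are assumptions (the slide does not name them), the ALU is fixed to addition, and memory is a plain dictionary for illustration:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit datapath


def mux2(select, in0, in1):
    """2-to-1 multiplexer: pass in1 when select is set, otherwise in0."""
    return in1 if select else in0


def execute(reg_data1, reg_data2, offset, memory, alu_src, mem_to_reg):
    """Return the value that would be written to the destination register."""
    # Mux 1: second ALU input is a register (R-type) or the offset (I-type).
    alu_in2 = mux2(alu_src, reg_data2, offset)
    alu_result = (reg_data1 + alu_in2) & MASK32
    # In this sketch memory is always read; the second mux discards the
    # value when the instruction is not a load.
    mem_data = memory.get(alu_result, 0)
    # Mux 2: write-back value comes from the ALU or from memory.
    return mux2(mem_to_reg, alu_result, mem_data)


memory = {104: 99}
r_type_result = execute(100, 7, 4, memory, alu_src=0, mem_to_reg=0)
load_result = execute(100, 7, 4, memory, alu_src=1, mem_to_reg=1)
```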

Combining ALU and Memory Instructions

The Complete Datapath

Complete Datapath

What’s Wrong with Single Cycle?
° All instructions run at the speed of the slowest instruction.
° Adding a long instruction can hurt performance.
    - What if you wanted to include multiply?
° You cannot reuse any parts of the processor.
    - We have 3 different adders: one to calculate PC+4, one to calculate PC+4+offset, and the ALU.
° There is no profit in making the common case fast.
    - Every instruction runs at the speed of the slowest instruction.
    - This is particularly important for loads, as we will see later.

What’s Wrong with Single Cycle?
Component delays:
    1 ns – Register read/write time
    2 ns – ALU/adder
    2 ns – memory access
    0 ns – MUX, PC access, sign extend, ROM
Instruction latencies:
    Instr   Get instr   Read reg   ALU operation   Mem    Write reg   Total
    add     2 ns        1 ns       2 ns            -      1 ns        6 ns
    beq     2 ns        1 ns       2 ns            -      -           5 ns
    sw      2 ns        1 ns       2 ns            2 ns   -           7 ns
    lw      2 ns        1 ns       2 ns            2 ns   1 ns        8 ns
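
The latencies above are sums of the component delays along each instruction's path. A short sketch that recomputes them, using the slide's delay values; the dictionary and step names are hypothetical:

```python
# Component delays in ns, taken from the slide.
DELAY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Steps each instruction passes through.
STEPS = {
    "add": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
    "sw": ["fetch", "reg_read", "alu", "mem"],
    "lw": ["fetch", "reg_read", "alu", "mem", "reg_write"],
}


def latency_ns(instr):
    """Sum the delays of the steps an instruction uses."""
    return sum(DELAY_NS[step] for step in STEPS[instr])


latencies = {instr: latency_ns(instr) for instr in STEPS}
# A single-cycle clock must be long enough for the slowest instruction.
cycle_time_ns = max(latencies.values())
```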

Computing Execution Time
Assume 100 instructions executed:
° 25% of instructions are loads,
° 10% of instructions are stores,
° 45% of instructions are adds, and
° 20% of instructions are branches.
Single-cycle execution: 100 * 8 ns = 800 ns
Optimal execution: 25 * 8 ns + 10 * 7 ns + 45 * 6 ns + 20 * 5 ns = 640 ns
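
The same arithmetic can be checked in a few lines. The instruction mix and latencies are the ones given above; the variable names are hypothetical, and the ratio at the end is an added illustration of how much time the single-cycle design gives up on this mix:

```python
LATENCY_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}  # per-instruction latency
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}     # counts out of 100

total_instructions = sum(MIX.values())

# Single cycle: every instruction takes as long as the slowest one.
single_cycle_ns = total_instructions * max(LATENCY_NS.values())

# Optimal: every instruction takes only its own latency.
optimal_ns = sum(MIX[i] * LATENCY_NS[i] for i in MIX)

ratio = single_cycle_ns / optimal_ns
print(single_cycle_ns, optimal_ns, ratio)
```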