Multicycle Implementation


The Processor: Multicycle Implementation

Reminder: Single-Cycle Datapath

Multicycle Approach
- Break up the instructions into steps; each step takes one cycle
  - balance the amount of work done in each cycle
  - use only one major functional unit in each cycle
- At the end of each cycle, store values for use in later cycles
  - introduce additional “internal” registers
- We will be reusing functional units
  - the ALU is also used to compute the branch address and to increment the PC
  - the same memory is used for instructions and data
- We will use a finite state machine for control

Multicycle Implementation
Buffer registers sit between the functional units and are updated every clock cycle (IR is the exception: it is written only when IRWrite is asserted):
- IR: instruction register
- MDR: memory data register
- A and B: hold the register operand values read from the register file
- ALUOut: holds the output of the ALU

Datapath with Control Signals

Handling PC
- A multiplexor selects the source for the PC:
  - the ALU output after computing PC+4
  - the branch target address computed by the ALU (held in ALUOut)
  - the new address for jump instructions (built from the low 26 bits of IR)
- Control signals for writing the PC:
  - PCWrite: written unconditionally after a normal instruction and after a jump
  - PCWriteCond: written conditionally, only if a branch is taken
- When PCWriteCond and zero are both asserted, the branch address at the output of the ALU is written into the PC.
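The PC write rule above can be sketched in a few lines of Python. This is only an illustration: the function name `next_pc` and its parameter names are made up here, and the PCSource encodings (00, 01, 10) are the ones that appear later in the state diagrams.

```python
def next_pc(pc, alu_result, alu_out, jump_target,
            pc_source, pc_write, pc_write_cond, zero):
    """Return the PC value after one clock edge (illustrative model)."""
    # PCSource selects among the three candidate values.
    sources = {
        0b00: alu_result,   # PC + 4, straight from the ALU
        0b01: alu_out,      # branch target held in ALUOut
        0b10: jump_target,  # jump address built from IR[25:0]
    }
    # The PC is written unconditionally (PCWrite) or when a branch is taken.
    write_enable = pc_write or (pc_write_cond and zero)
    return sources[pc_source] if write_enable else pc
```

For a beq that is not taken, `pc_write_cond` is asserted but `zero` is not, so the function returns the old PC unchanged.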

Handling PC
[Figure: PC write logic. A multiplexor selects PC+4, the jump address, or the branch address as the next PC; PCWrite, PCWriteCond, and zero control the write. A second multiplexor, controlled by IorD, selects either the PC or ALUOut as the memory address.]

Complete Datapath & Control

Five Execution Steps
1. Instruction fetch
2. Instruction decode and register fetch
3. Execution, memory address computation, or branch completion
4. Memory access or R-type instruction completion
5. Write-back

Instructions take 3 to 5 cycles!

Step 1: Instruction Fetch
- Use the PC to get the instruction and put it in IR.
- Increment the PC by 4 and put the result back in the PC.
- This can be described succinctly using RTL:
    IR <= Memory[PC];
    PC <= PC + 4;
- The ALU is used for incrementing the PC.
- Control signals:
    MemRead = ?   IRWrite = ?   PCWrite = ?   IorD = ?
    ALUSrcA = ?   ALUSrcB = ??  ALUOp = ??    PCSource = ??

Step 2: Instruction Decode & Register Fetch
- Read registers rs and rt in case we need them.
- Compute the branch address in case the instruction is a branch.
    A <= Reg[IR[25-21]];
    B <= Reg[IR[20-16]];
    ALUOut <= PC + (sign-extend(IR[15-0]) << 2);
- We aren't setting any control lines based on the instruction type yet.
    ALUSrcA = ? (PC)
    ALUSrcB = ?? (sign-extended & shifted offset)
    ALUOp = ?? (add)
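As a rough sketch of this step, the Python below extracts the rs, rt, and immediate fields from a 32-bit instruction word and computes the speculative branch target. The helper names (`sign_extend16`, `decode_step`) and the list-based register file are assumptions made for illustration; `pc` is taken to be the already incremented PC from step 1.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode_step(ir, pc, regs):
    """Model of step 2: return the new contents of A, B, and ALUOut."""
    rs = (ir >> 21) & 0x1F          # IR[25-21]
    rt = (ir >> 16) & 0x1F          # IR[20-16]
    imm = ir & 0xFFFF               # IR[15-0]
    a = regs[rs]
    b = regs[rt]
    # Branch target is computed for every instruction, needed or not.
    alu_out = (pc + (sign_extend16(imm) << 2)) & 0xFFFFFFFF
    return a, b, alu_out

# beq $10, $11, +3 words, with the PC already advanced to 0x00400004
ir = (4 << 26) | (10 << 21) | (11 << 16) | 3
a, b, alu_out = decode_step(ir, 0x00400004, list(range(32)))
```

Here `alu_out` is 0x00400010, that is, three words past the incremented PC.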

Step 3 (Instruction Dependent)
The ALU performs one of four functions, depending on the instruction type:
- Memory reference:
    ALUOut <= A + sign-extend(IR[15-0]);
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??
- R-type:
    ALUOut <= A op B;
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??
- Branch:
    if (A == B) PC <= ALUOut;
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??, PCWriteCond = ?, PCSource = ??
- Jump:
    PC <= PC[31:28] || (IR[25:0] << 2);
    PCWrite = ?, PCSource = ??
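The jump case concatenates bit fields rather than adding them, which is easy to get wrong. A minimal Python sketch of that one RTL line follows; the function name `jump_target` is hypothetical.

```python
def jump_target(pc, ir):
    """PC[31:28] || (IR[25:0] << 2): keep the top 4 PC bits, append the shifted field."""
    upper = pc & 0xF0000000              # PC[31:28]
    lower = (ir & 0x03FFFFFF) << 2       # IR[25:0] shifted left by 2 (28 bits)
    return upper | lower

# j with a 26-bit field of 0x100000, fetched from the 0x0... region
target = jump_target(0x00400004, (2 << 26) | 0x100000)
```

With these inputs `target` is 0x00400000: the 26-bit field is shifted into a word address and the upper four bits come from the current PC.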

Step 4 (R-type or Memory-Access)
- Load and store instructions access memory:
    lw: MDR <= Memory[ALUOut];    MemRead = ?, IorD = ?
    sw: Memory[ALUOut] <= B;      MemWrite = ?, IorD = ?
- R-type instructions finish:
    Reg[IR[15-11]] <= ALUOut;     RegDst = ?, RegWrite = ?, MemtoReg = ?

Write-Back Step
- Memory read completion step:
    Reg[IR[20-16]] <= MDR;        RegDst = ?, RegWrite = ?, MemtoReg = ?
- What about all the other instructions?

Summary of the Steps

Step 1, instruction fetch (all instructions):
    IR <= Memory[PC];
    PC <= PC + 4;

Step 2, instruction decode / register fetch (all instructions):
    A <= Reg[IR[25-21]];
    B <= Reg[IR[20-16]];
    ALUOut <= PC + (sign-extend(IR[15-0]) << 2);

Step 3, execution, address calculation, branch / jump completion:
    R-type:         ALUOut <= A op B;
    Memory-access:  ALUOut <= A + sign-extend(IR[15-0]);
    Branch:         if (A == B) PC <= ALUOut;
    Jump:           PC <= PC[31:28] || (IR[25:0] << 2);

Step 4, memory access / R-type completion:
    R-type:         Reg[IR[15-11]] <= ALUOut;
    Load:           MDR <= Memory[ALUOut];
    Store:          Memory[ALUOut] <= B;

Step 5, memory read completion:
    Load:           Reg[IR[20-16]] <= MDR;

Instructions in Multicycle Implementation
Cycles per instruction class: loads 5, stores 4, ALU instructions 4, branches 3, jumps 3.

Example: instruction mix from SPECINT2000
- 25% loads (1% load byte + 24% load word)
- 10% stores (1% store byte + 9% store word)
- 11% branches (6% beq + 5% bne)
- 2% jumps (1% jal + 1% jr)
- 52% ALU operations

CPI = 0.25 × 5 + 0.10 × 4 + 0.11 × 3 + 0.02 × 3 + 0.52 × 4 = 4.12
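The CPI figure is a weighted average of the per-class cycle counts. The short Python check below reproduces it from the numbers on the slide; the dictionary keys are just labels chosen for this sketch.

```python
# Cycles per instruction class in the multicycle implementation.
cycles = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# SPECINT2000 instruction mix quoted above, as fractions of all instructions.
mix = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}

# CPI is the sum over classes of (fraction of instructions) * (cycles per instruction).
cpi = sum(mix[name] * cycles[name] for name in cycles)
print(f"CPI = {cpi:.2f}")
```

The per-class contributions are 1.25, 0.40, 2.08, 0.33, and 0.06, which add up to 4.12.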

Example
How many cycles does it take to execute this code?

           lw  $t2, 0($t3)
           lw  $t3, 4($t3)
           beq $t2, $t3, Label    # assume not taken
           add $t5, $t2, $t3
           sw  $t5, 8($t3)
    Label: ...

- What exactly is happening during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
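One way to work through the example is to lay the instructions end to end using the cycle counts from the multicycle implementation (lw 5, sw 4, R-type 4, beq 3). The Python sketch below does that bookkeeping; `schedule` and `locate` are helper names invented for this illustration.

```python
# Cycle counts per instruction in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}

def schedule(program):
    """Return a list of (op, first_cycle, last_cycle), cycles numbered from 1."""
    spans = []
    cycle = 1
    for op in program:
        last = cycle + CYCLES[op] - 1
        spans.append((op, cycle, last))
        cycle = last + 1
    return spans

def locate(spans, cycle):
    """Return (instruction index, op, step number) active in the given cycle."""
    for index, (op, first, last) in enumerate(spans):
        if first <= cycle <= last:
            return index, op, cycle - first + 1
    raise ValueError("cycle is outside the program")

program = ["lw", "lw", "beq", "add", "sw"]
spans = schedule(program)
total_cycles = spans[-1][2]           # last cycle of the last instruction
eighth = locate(spans, 8)             # what runs in cycle 8
add_execute_cycle = spans[3][1] + 2   # step 3 of the add instruction
```

Under these assumptions the code takes 5 + 5 + 3 + 4 + 4 = 21 cycles. Cycle 8 is step 3 of the second lw, the memory address computation, and the addition of $t2 and $t3 happens in step 3 of the add, which is cycle 16.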

Implementing the Control
- The control unit must
  - assert the control signals at the right time
  - determine the next step
- The value of the control signals depends on
  - which instruction is being executed
  - which step is being performed (this implies sequential logic)
- Two specification methods:
  - state diagrams
  - microprogramming
- The implementation can be derived from this specification.

Review: Finite State Machines
- A finite set of states, together with
  - a next-state function (determined by the current state and the input)
  - an output function (determined by the current state and possibly the input)
- We'll use a Moore machine (output based only on the current state).

Finite State Machines
[Figure: Mealy and Moore machines. In both, a combinational next-state function takes the inputs and the current state and feeds the clocked state elements. The Mealy output function depends on the state and the inputs; the Moore output function depends on the state only.]

State Diagram for Fetch & Decode
- S0 (entered at start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- S1: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- Transitions out of S1:
  - op = lw or sw: memory reference FSM
  - op = R-type: R-type FSM
  - op = beq: branch FSM
  - op = j: jump FSM

State Diagram for Memory-Access
- S2, memory address calculation (from S1 when op = lw or sw): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3, memory access (from S2 on lw): MemRead, IorD = 1
- S4, memory read completion (from S3): RegDst = 0, RegWrite, MemtoReg = 1
- S5, memory access (from S2 on sw): MemWrite, IorD = 1
- S4 and S5 return to state S0.

State Diagram for R-type
- S6, execution (from S1 when op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7, R-type completion (from S6): RegDst = 1, RegWrite, MemtoReg = 0
- S7 returns to state S0.

State Diagram for Branch
- S8, branch completion (from S1 when op = beq): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- S8 returns to state S0.

State Diagram for Jump
- S9, jump completion (from S1 when op = j): PCWrite, PCSource = 10
- S9 returns to state S0.

Complete State Diagram
- S0 (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- S1: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2 (op = lw or sw): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3 (lw): MemRead, IorD = 1
- S4: RegDst = 0, RegWrite, MemtoReg = 1
- S5 (sw): MemWrite, IorD = 1
- S6 (op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: RegDst = 1, RegWrite, MemtoReg = 0
- S8 (op = beq): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- S9 (op = j): PCWrite, PCSource = 10

How many state bits are needed?
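The next-state function of this diagram is small enough to write out directly. The Python sketch below encodes the transitions among S0 to S9 and traces the state sequence for one instruction; the function names are illustrative, and the opcode values are the standard MIPS ones (lw 35, sw 43, R-type 0, beq 4, j 2) that also appear in the dispatch tables later in the deck.

```python
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02
NUM_STATES = 10

def next_state(state, opcode):
    """Next-state function for states S0..S9 of the multicycle control FSM."""
    if state == 0:
        return 1                                   # fetch -> decode
    if state == 1:
        # Raises KeyError for opcodes outside the supported subset.
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5            # lw reads, sw writes
    if state == 3:
        return 4                                   # memory read completion
    if state == 6:
        return 7                                   # R-type completion
    return 0                                       # S4, S5, S7, S8, S9 -> fetch

def trace(opcode):
    """List the states visited while executing one instruction."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)

# Smallest number of bits that can encode state numbers 0..9.
STATE_BITS = (NUM_STATES - 1).bit_length()
```

`trace(LW)` visits S0, S1, S2, S3, S4, matching the five cycles of a load, and ten states need 4 state bits.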

Implementing the Control
[Figure: combinational control logic. Its inputs are the opcode bits from the instruction register and the current state held in flip-flops; its outputs are the datapath control signals and the next state.]

Moore or Mealy machine?

Complete Datapath & Control

Exceptions or Interrupts
- Both are an unintended change in the normal flow of instructions.
- An exception is an unexpected event from within the processor, e.g., arithmetic overflow.
- An interrupt may come from outside the processor.
- The two terms are often used interchangeably (IA32 uses "interrupt" for both).
- Our simplified MIPS architecture can generate two exceptions:
  - arithmetic overflow
  - undefined instruction

Interrupt or Exception?

Type of event                          | From where? | MIPS terminology
I/O device request                     | External    | Interrupt
Invoke the OS from the user program    | Internal    | Exception
Arithmetic overflow                    | Internal    | Exception
Undefined instruction                  | Internal    | Exception
Hardware malfunctions                  | Either      | Exception or interrupt

When an Exception Occurs 1/2
- Save the PC of the current instruction (i.e., PC-4) in the exception program counter (EPC).
- Transfer control to the OS at some specified address.
- The OS does what needs to be done; it either
  - stops the execution of the program and reports an error, or
  - provides a service to the user program (e.g., a predefined action for an overflow) and returns to the point of origin of the exception using EPC.

When an Exception Occurs 2/2
- The OS must know the reason for the exception.
- Two approaches are used to communicate it:
  - Cause register: contains the reason for the exception (MIPS).
  - Vectored interrupts: each interrupt type has its own vector, which points to the appropriate piece of handler code.

Vectored Interrupts
For each cause of interrupt, control goes to a different location in memory to handle the exception.

Exception type          | Exception vector address (in hex)
Undefined instruction   | 0xC0000000
Arithmetic overflow     | 0xC0000020
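With vectored interrupts the cause is implied by the address control jumps to, so selecting the handler is a table lookup. A minimal Python sketch using the two vector addresses listed above; the dictionary keys and the function name are made up for illustration.

```python
# Vector addresses from the table above.
VECTORS = {
    "undefined_instruction": 0xC0000000,
    "arithmetic_overflow": 0xC0000020,
}

def vector_address(exception_type):
    """Return the handler entry point for the given exception type."""
    return VECTORS[exception_type]

# Distance between the two vectors, in bytes and in 4-byte MIPS instructions.
spacing_bytes = VECTORS["arithmetic_overflow"] - VECTORS["undefined_instruction"]
spacing_instructions = spacing_bytes // 4
```

The two vectors are 0x20 bytes apart, which leaves room for 8 instructions at each entry point.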

How MIPS Handles Exceptions
- Registers:
  - EPC
  - Cause register: individual bits encode the cause of the exception (e.g., the LSB is 0 for an undefined instruction and 1 for overflow)
- Control signals:
  - CauseWrite
  - EPCWrite
  - IntCause: 1-bit signal that sets the appropriate bit of the Cause register
- Exception address: 0x8000 0180 (SPIM uses 0x8000 0080)
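The register updates performed when an exception is taken can be sketched as follows. This is a simplified model, not a description of real MIPS hardware: the function name and the returned dictionary are illustrative, and `pc` is assumed to have already been incremented by the fetch step, which is why 4 is subtracted.

```python
EXCEPTION_ADDRESS = 0x80000180   # single entry point used with a Cause register

def take_exception(pc, overflow):
    """Return the register values written when an exception is taken.

    overflow=False models an undefined instruction (IntCause = 0),
    overflow=True models arithmetic overflow (IntCause = 1).
    """
    return {
        "Cause": 1 if overflow else 0,       # CauseWrite with IntCause
        "EPC": (pc - 4) & 0xFFFFFFFF,        # EPCWrite: address of the faulting instruction
        "PC": EXCEPTION_ADDRESS,             # PCWrite with the exception address selected
    }

state = take_exception(0x00400028, overflow=True)
```

In this example the faulting instruction sits at 0x00400024, so that address goes into EPC while the PC is redirected to 0x80000180.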

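MIPS Memory Allocation
[Figure: memory map, from high to low addresses]
- 0x8000 0180: exception address
- 7fff fffc (sp): stack, above the dynamic data
- 1000 0000: static data (gp = 1000 8000)
- 0040 0000 (pc): text
- below the text segment: reserved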

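The Multicycle Datapath with Exception Handling
[Figure: multicycle datapath extended with an overflow output from the ALU and the exception address 0x80000180 as a PC source.]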

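What is New?
[Figure: additions to the datapath]
- The PCSource multiplexor gets a fourth input (3): the exception address 0x8000 0180, next to the ALU output (0), ALUOut (1), and the jump address (2).
- The ALU provides an overflow (OF) output in addition to Zero.
- An EPC register, written under EPCWrite, receives PC-4 from the ALU.
- A Cause register, written under CauseWrite, receives 0 or 1 as selected by IntCause.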

Exception States
- S10: IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- S11: IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- Both states return to state S0 to begin the next instruction.

Complete State Diagram with Exceptions
- S0 (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- S1: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S2 (op = lw or sw): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S3 (lw): MemRead, IorD = 1
- S4: RegDst = 0, RegWrite, MemtoReg = 1
- S5 (sw): MemWrite, IorD = 1
- S6 (op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S7: RegDst = 1, RegWrite, MemtoReg = 0
- S8 (op = beq): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- S9 (op = j): PCWrite, PCSource = 10
- S10 (op = other, from S1): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- S11 (overflow, from S7): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11

Hardware/Software Interface
[Figure labels: ISA, Assembly, Microarchitecture]

Control with Microprogramming
- Simpler instructions (microinstructions) are used to implement the (macro) instructions.
- Microinstructions are kept in an addressable ROM.
- They are not visible to programmers.
- Microinstruction format (fields): ALU Control, SRC1, SRC2, Register Control, Memory, PCWrite Control, Sequencing
- Control signals are directly embedded in the microinstructions.

Fields of Microinstructions

Field name        | Function of field
ALU Control       | Specify the operation performed by the ALU during this clock; the result is always written in ALUOut
SRC1              | Specify the source for the first ALU operand
SRC2              | Specify the source for the second ALU operand
Register Control  | Specify read or write for the register file, and the source of the value for a write
Memory            | Specify read or write, and the source for the memory; for a read, specify the destination register
PCWrite Control   | Specify the writing of the PC
Sequencing        | Specify how to choose the next microinstruction

Choosing Next Microinstruction
- Seq: go to the next microinstruction (sequential behavior).
- Fetch: branch to the microinstruction that starts execution of the next MIPS instruction; Fetch is the label of the initial microinstruction.
- Dispatch i: the next microinstruction is chosen based on the input (dispatching, dispatch tables). In MIPS there are two dispatch tables: one from S1 and the other from S2.

Microprogram for the Control Unit

Label | ALU control | SRC1 | SRC2    | Register control | Memory  | PCWrite control | Sequencing
Fetch | Add         | PC   | 4       |                  | Read PC | ALU             | Seq
      | Add         | PC   | Extshft | Read             |         |                 | Dispatch 1

Dispatch Tables

Microcode dispatch table 1
Opcode field | Opcode name | Value
000000 (0)   | R-format    | Rformat1
000010 (2)   | jmp         | JUMP1
000100 (4)   | beq         | BEQ1
100011 (35)  | lw          | Mem1
101011 (43)  | sw          | Mem1

Microcode dispatch table 2
Opcode field | Opcode name | Value
100011 (35)  | lw          | LW2
101011 (43)  | sw          | SW2

Microprogram for the Control Unit

Label    | ALU control | SRC1 | SRC2    | Register control | Memory    | PCWrite control | Sequencing
Fetch    | Add         | PC   | 4       |                  | Read PC   | ALU             | Seq
         | Add         | PC   | Extshft | Read             |           |                 | Dispatch 1
Mem1     | Add         | A    | Extend  |                  |           |                 | Dispatch 2
LW2      |             |      |         |                  | Read ALU  |                 | Seq
         |             |      |         | Write MDR        |           |                 | Fetch
SW2      |             |      |         |                  | Write ALU |                 | Fetch
Rformat1 | Func code   | A    | B       |                  |           |                 | Seq
         |             |      |         | Write ALU        |           |                 | Fetch
BEQ1     | Sub         | A    | B       |                  |           | ALUOut-cond     | Fetch
JUMP1    |             |      |         |                  |           | Jump address    | Fetch
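The sequencing column and the two dispatch tables are enough to simulate how the microprogram counter moves. The Python sketch below keeps only the label and sequencing fields of each microinstruction; the data layout and function names are assumptions made for illustration, not a description of a real microcode ROM.

```python
# (label, sequencing) for each microinstruction, in ROM order.
MICROPROGRAM = [
    ("Fetch",    "Seq"),
    (None,       "Dispatch 1"),
    ("Mem1",     "Dispatch 2"),
    ("LW2",      "Seq"),
    (None,       "Fetch"),
    ("SW2",      "Fetch"),
    ("Rformat1", "Seq"),
    (None,       "Fetch"),
    ("BEQ1",     "Fetch"),
    ("JUMP1",    "Fetch"),
]

# Map each label to its ROM address.
LABELS = {label: addr for addr, (label, _) in enumerate(MICROPROGRAM) if label}

# Dispatch tables indexed by opcode (decimal values from the tables above).
DISPATCH = {
    "Dispatch 1": {0: "Rformat1", 2: "JUMP1", 4: "BEQ1", 35: "Mem1", 43: "Mem1"},
    "Dispatch 2": {35: "LW2", 43: "SW2"},
}

def next_address(upc, opcode):
    """Address-select logic: choose the next microprogram counter value."""
    sequencing = MICROPROGRAM[upc][1]
    if sequencing == "Seq":
        return upc + 1                      # adder: fall through to the next row
    if sequencing == "Fetch":
        return LABELS["Fetch"]              # start the next MIPS instruction
    return LABELS[DISPATCH[sequencing][opcode]]

def run_instruction(opcode):
    """Return the ROM addresses visited while executing one MIPS instruction."""
    addresses = [LABELS["Fetch"]]
    while True:
        nxt = next_address(addresses[-1], opcode)
        if nxt == LABELS["Fetch"]:
            return addresses
        addresses.append(nxt)
```

For lw (opcode 35) `run_instruction` returns addresses 0, 1, 2, 3, 4, and for beq (opcode 4) it returns 0, 1, 8, so the number of microinstructions executed equals the cycle count of each instruction class.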

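Implementing Microprogram
[Figure: microcode storage produces the datapath control outputs and the sequencing control; the sequencing control drives the address select logic, which chooses the next microprogram counter value from the adder (current address + 1) or from the dispatch tables.]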