Morgan Kaufmann Publishers The Processor


Chapter 4: The Processor (11 September, 2018)

ALU Control (§4.4 A Simple Implementation Scheme)
The ALU is used differently by each instruction class:
- Load/store (LDUR/STUR): the ALU computes the memory address by addition.
- R-type instructions: the ALU performs one of four actions (AND, OR, subtract, or add), depending on the value of the 11-bit opcode field in the instruction.
- Compare and branch on zero (CBZ): the ALU just passes the register input value through.
A small control unit generates the ALU control signal:
- Input: the opcode field of the instruction and a 2-bit control field called ALUOp, with the following values: 00 = add (loads and stores); 01 = pass input b (CBZ); 10 = determined by the operation encoded in the opcode field (R-type).
- Output: a 4-bit signal that directly controls the ALU by selecting one of the six combinations below.

ALU control lines | Function
0000              | AND
0001              | OR
0010              | add
0110              | subtract
0111              | pass input b
1100              | NOR
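To make the six-row table concrete, here is a minimal sketch of an ALU selected by the 4-bit control lines. It assumes 64-bit operands (LEGv8 registers are 64 bits) held as unsigned Python integers; the function name `alu` and the returned Zero flag are illustrative, not taken from the slide.

```python
# Sketch of an ALU driven by the 4-bit ALU control lines in the table above.
# Assumption: 64-bit operands represented as unsigned Python ints.
MASK64 = (1 << 64) - 1

def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, zero) for the operation chosen by `control`."""
    ops = {
        0b0000: lambda: a & b,       # AND
        0b0001: lambda: a | b,       # OR
        0b0010: lambda: a + b,       # add
        0b0110: lambda: a - b,       # subtract
        0b0111: lambda: b,           # pass input b (used by CBZ)
        0b1100: lambda: ~(a | b),    # NOR
    }
    if control not in ops:
        raise ValueError(f"unused ALU control value {control:04b}")
    result = ops[control]() & MASK64  # wrap to 64 bits (two's complement)
    return result, result == 0        # Zero output feeds the branch logic
```

For CBZ, the register value arrives on input b, so the Zero output is true exactly when that register holds 0.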

ALU Control (continued)
The ALU control inputs are derived from the 2-bit ALUOp field and the 11-bit opcode. The ALUOp bits are generated by the main control unit. Using multiple levels of decoding is a common implementation technique: it can reduce the size of the main control unit and may reduce the latency of the control unit.

Instruction opcode | ALUOp | Operation                  | Opcode field | ALU function | ALU control
LDUR               | 00    | load register              | XXXXXXXXXXX  | add          | 0010
STUR               | 00    | store register             | XXXXXXXXXXX  | add          | 0010
CBZ                | 01    | compare and branch on zero | XXXXXXXXXXX  | pass input b | 0111
R-type             | 10    | add                        | 10001011000  | add          | 0010
R-type             | 10    | subtract                   | 11001011000  | subtract     | 0110
R-type             | 10    | AND                        | 10001010000  | AND          | 0000
R-type             | 10    | ORR                        | 10101010000  | OR           | 0001
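The second level of decoding can be sketched as a small function of ALUOp and the opcode. The four R-type opcode constants are the 11-bit LEGv8 encodings of ADD, SUB, AND and ORR as given in the LEGv8 reference data; treat them as assumptions to check against your own reference card. `alu_control` is a hypothetical name.

```python
# Sketch of the ALU control unit: (ALUOp, 11-bit opcode) -> 4-bit ALU control.
# Assumption: R-type opcode values below are the LEGv8 ADD/SUB/AND/ORR encodings.
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}

def alu_control(alu_op: int, opcode: int) -> int:
    if alu_op == 0b00:   # LDUR/STUR: opcode is a don't-care, always add
        return 0b0010
    if alu_op == 0b01:   # CBZ: opcode is a don't-care, pass input b
        return 0b0111
    if alu_op == 0b10:   # R-type: the opcode field picks the operation
        try:
            return R_TYPE_ALU_CONTROL[opcode]
        except KeyError:
            raise ValueError(f"unsupported R-type opcode {opcode:011b}") from None
    raise ValueError(f"invalid ALUOp {alu_op:02b}")
```

Splitting the decode this way means the main control unit only has to produce two ALUOp bits rather than the full 4-bit ALU control.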

The Main Control Unit
Control signals are derived from the instruction:
- Opcode field: 6 to 11 bits wide, occupying bit positions 31:26 up to 31:21.
- First register operand: bit positions 9:5 (Rn).
- Other register operand: bit positions 20:16 (Rm) or 4:0 (Rt).
- Another operand: a 19-bit offset (CBZ) or a 9-bit offset (load/store).
- The destination register for R-type instructions (Rd) and for loads (Rt) is in bit positions 4:0.
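A minimal field-extraction sketch follows. The bit positions for Rn, Rm and Rt/Rd are from the slide; the positions of the two offsets (bits 20:12 for the 9-bit load/store offset, bits 23:5 for the 19-bit CBZ offset) are assumptions taken from the LEGv8 D- and CB-formats. The helper names are hypothetical.

```python
# Sketch: extract the fields listed above from a 32-bit LEGv8 instruction word.
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo], inclusive, matching the slide's bit-range notation."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend(value: int, width: int) -> int:
    """Interpret a width-bit field as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def decode_fields(instr: int) -> dict:
    return {
        "opcode11": bits(instr, 31, 21),                     # R-type, LDUR/STUR
        "opcode6": bits(instr, 31, 26),                      # B-format
        "rm": bits(instr, 20, 16),                           # second source (R-type)
        "rn": bits(instr, 9, 5),                             # first source / base
        "rt_or_rd": bits(instr, 4, 0),                       # Rt or Rd
        "d_offset": sign_extend(bits(instr, 20, 12), 9),     # assumed load/store offset
        "cb_offset": sign_extend(bits(instr, 23, 5), 19),    # assumed CBZ offset
    }
```

Not every field is meaningful for every format; the control unit decides which ones to use, which is why the opcode is decoded first.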

Datapath with Multiplexors and Control Lines

Control Signals

Datapath with control unit and control signals

Setting Control Signals
The setting of the control lines depends only on the opcode. The table shows whether each control signal should be 0, 1, or don't care (X) for each opcode value.
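The table itself did not survive in this transcript, so the sketch below encodes the control settings of the standard single-cycle LEGv8 design as an assumption; the signal names (Reg2Loc, ALUSrc, MemtoReg, and so on) come from that design rather than from this slide. `None` stands for a don't-care (X).

```python
# Sketch of the main-control truth table; None marks a don't-care (X).
# Assumption: values follow the standard single-cycle LEGv8 design.
SIGNALS = ("Reg2Loc", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

CONTROL_TABLE = {
    #            Reg2Loc ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp
    "R-format": (0,      0,     0,       1,       0,      0,       0,     0b10),
    "LDUR":     (None,   1,     1,       1,       1,      0,       0,     0b00),
    "STUR":     (1,      1,     None,    0,       0,      1,       0,     0b00),
    "CBZ":      (1,      0,     None,    0,       0,      0,       1,     0b01),
}

def control_signals(instr_class: str) -> dict:
    """Look up the control-line settings for one instruction class."""
    return dict(zip(SIGNALS, CONTROL_TABLE[instr_class]))
```

The don't-cares arise where a signal's mux output is never used: a store writes no register, so MemtoReg cannot matter.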

R-Type Instruction: ADD X1, X2, X3
Four steps to execute the instruction:
1. The instruction is fetched, and the PC is incremented.
2. Two registers, X2 and X3, are read from the register file; the main control unit also computes the settings of the control lines during this step.
3. The ALU operates on the data read from the register file, using portions of the opcode to generate the ALU function.
4. The result from the ALU is written into the destination register (X1) in the register file.
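The four R-type steps can be traced on a toy machine state. This sketch assumes fetch and decode have already happened, so the register numbers are passed in directly; `execute_add` and the list-based register file are hypothetical.

```python
# Sketch of the four steps for ADD Xd, Xn, Xm on a toy register file.
MASK64 = (1 << 64) - 1

def execute_add(regs: list, pc: int, rd: int, rn: int, rm: int) -> int:
    next_pc = pc + 4                 # step 1: instruction fetched, PC incremented
    a, b = regs[rn], regs[rm]        # step 2: read the two source registers
    result = (a + b) & MASK64        # step 3: ALU adds (ALU control 0010)
    regs[rd] = result                # step 4: write the result to Xd
    return next_pc
```

For example, with X2 = 5 and X3 = 7, `execute_add(regs, 100, 1, 2, 3)` leaves 12 in X1 and returns 104.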

Load Instruction: LDUR X1, [X2, offset]
Five steps to execute the instruction:
1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. A register (X2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended 9-bit offset from the instruction.
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file (X1).
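The same step-by-step trace for a load, as a sketch. Data memory is modeled as a Python dict from byte address to a 64-bit value, and the offset is assumed to be already sign-extended; both choices, and the name `execute_ldur`, are illustrative assumptions.

```python
# Sketch of the five LDUR steps; memory is a toy dict {byte address: value}.
MASK64 = (1 << 64) - 1

def execute_ldur(regs: list, memory: dict, pc: int,
                 rt: int, rn: int, offset: int) -> int:
    next_pc = pc + 4                        # step 1: fetch, PC incremented
    base = regs[rn]                         # step 2: read base register Xn
    address = (base + offset) & MASK64      # step 3: ALU adds sign-extended offset
    value = memory[address]                 # step 4: address the data memory
    regs[rt] = value                        # step 5: write loaded data to Xt
    return next_pc
```

This is the longest path through the single-cycle datapath, since it is the only class that uses both the data memory and a register write.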

CBZ Instruction: CBZ X1, offset
Five steps to execute the instruction:
1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. The register X1 is read from the register file, using bits 4:0 of the instruction (Rt).
3. The ALU passes through the data value read from the register file.
4. The 19-bit offset from the instruction is sign-extended, shifted left by two, and added to the value of the PC; the result is the branch target address.
5. The Zero status output of the ALU is used to decide which adder result (PC + 4 or the branch target) to store in the PC.
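A sketch of the CBZ steps, assuming the 19-bit offset has already been sign-extended (so it may be negative). `execute_cbz` is a hypothetical helper; it returns the next PC.

```python
# Sketch of the CBZ steps: choose between PC + 4 and the branch target.
def execute_cbz(regs: list, pc: int, rt: int, offset19: int) -> int:
    value = regs[rt]                        # step 2: read Xt (bits 4:0)
    zero = value == 0                       # step 3: ALU passes b; Zero output
    branch_target = pc + (offset19 << 2)    # step 4: PC + (offset shifted left by 2)
    return branch_target if zero else pc + 4   # step 5: Zero selects next PC
```

The shift by two turns a word offset into a byte offset, which is why a 19-bit field can reach targets about ±1 MiB away.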

Control Function for the Simple Single-Cycle Implementation
The input to the control function is the opcode field, and its outputs are the control lines.
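The first job of the control function is to recognize which instruction class an opcode belongs to. This sketch assumes the LEGv8 reference encodings (11-bit opcodes for R-type, LDUR and STUR; an 8-bit opcode for CBZ; a 6-bit opcode for B); these constants are assumptions, and `instruction_class` is a hypothetical name.

```python
# Sketch: classify a 32-bit instruction by its opcode bits.
# Assumption: opcode constants are the LEGv8 reference encodings.
R_FORMAT_OPCODES = {0b10001011000, 0b11001011000,   # ADD, SUB
                    0b10001010000, 0b10101010000}   # AND, ORR

def instruction_class(instr: int) -> str:
    op11 = (instr >> 21) & 0x7FF   # bits 31:21
    op8 = (instr >> 24) & 0xFF     # bits 31:24 (CB-format)
    op6 = (instr >> 26) & 0x3F     # bits 31:26 (B-format)
    if op11 in R_FORMAT_OPCODES:
        return "R-format"
    if op11 == 0b11111000010:
        return "LDUR"
    if op11 == 0b11111000000:
        return "STUR"
    if op8 == 0b10110100:
        return "CBZ"
    if op6 == 0b000101:
        return "B"
    raise ValueError(f"unsupported opcode in {instr:032b}")
```

In hardware these comparisons become a block of AND/OR gates whose outputs are the control lines.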

Implementing Unconditional Branch
- The unconditional branch (B) uses the B-format: the opcode is in bits 31:26 and a 26-bit branch address (offset) is in bits 25:0.
- The offset is a word offset: it is sign-extended, shifted left by 2, and added to the PC.
- An extra control signal, decoded from the opcode, is needed.

Datapath With B Added
A branch is implemented by storing into the PC the sum of the PC and the sign-extended, shifted 26-bit offset. An additional OR gate combines a new control signal with the conditional-branch decision (Branch AND Zero), so that the branch target is always selected for B.
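The next-PC selection, including the extra OR gate, can be sketched as a single function. The parameter name `uncond_branch` for the new control signal is hypothetical, and `offset` is assumed to be already sign-extended (19-bit for CBZ, 26-bit for B).

```python
# Sketch of next-PC logic with the OR gate that forces B to take the branch.
def next_pc(pc: int, branch: bool, uncond_branch: bool,
            zero: bool, offset: int) -> int:
    target = pc + (offset << 2)                  # sign-extended offset, shifted, plus PC
    take = uncond_branch or (branch and zero)    # OR gate feeding the PC mux
    return target if take else pc + 4
```

Sharing one adder and one mux for CBZ and B keeps the datapath change to a single gate plus a control line.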

Performance Issues
- The longest delay determines the clock period.
- Critical path: the load instruction (instruction memory → register file → ALU → data memory → register file).
- It is not feasible to vary the period for different instructions.
- This violates the design principle of making the common case fast.
- We will improve performance by pipelining.

Pipelining Analogy (§4.5 An Overview of Pipelining)
Pipelined laundry: overlapping execution. Parallelism improves performance.
- Pipelining improves the throughput of the laundry system. When there are many loads of laundry to do, the improvement in throughput decreases the total time to complete the work.
- Each of the four stages takes 0.5 hours, so one load takes 2 hours without pipelining.
- Four loads: Speedup = 8/3.5 ≈ 2.3
- Non-stop (n loads): $\text{Speedup} = \frac{2n}{0.5n + 1.5} \approx 4 = \text{number of stages}$
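The two speedup figures can be reproduced with a few lines. The sketch assumes four equal stages of 0.5 hours each, which is what the 8-hour and 3.5-hour totals imply; `laundry_speedup` is a hypothetical name.

```python
# Sketch: sequential vs pipelined time for n loads with equal-length stages.
def laundry_speedup(n: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    sequential = n * stages * stage_hours                      # 2n hours
    pipelined = stages * stage_hours + (n - 1) * stage_hours   # 0.5n + 1.5 hours
    return sequential / pipelined
```

`laundry_speedup(4)` gives 8/3.5, about 2.29, and the result approaches 4 (the number of stages) as n grows, because the 1.5-hour fill time is amortized over more loads.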

LEGv8 Pipeline
Five stages, one step per stage:
- IF: instruction fetch from memory
- ID: instruction decode and register read
- EX: execute operation or calculate address
- MEM: access memory operand
- WB: write result back to register
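A small sketch that builds the classic pipeline timing chart for the five stages, assuming an ideal pipeline with no stalls or hazards. Each instruction enters IF one cycle after the previous one, so n instructions need 5 + n - 1 cycles. `pipeline_chart` is a hypothetical helper.

```python
# Sketch: ideal five-stage pipeline chart (no hazards or stalls assumed).
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def pipeline_chart(n_instructions: int) -> list:
    """Row i lists what instruction i does in each clock cycle ('' = not in pipeline)."""
    total_cycles = len(STAGES) + n_instructions - 1
    chart = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage      # instruction i is in stage s during cycle i + s
        chart.append(row)
    return chart

if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(3)):
        print(f"instr {i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading down any column shows each stage used by at most one instruction per cycle, which is what lets the stages overlap.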

Pipeline Performance
Assume the time for stages is 100 ps for register read or write, and 200 ps for the other stages. Compare the pipelined datapath with the single-cycle datapath. The single-cycle design must allow for the slowest instruction, which is LDUR, so the time required for every instruction is 800 ps.

Instr                         | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
LDUR                          | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
STUR                          | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format (ADD, SUB, AND, ORR) | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
CBZ                           | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
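The totals in the table, and both clock periods, follow mechanically from the per-stage latencies. This sketch encodes the slide's assumptions (100 ps for register read/write, 200 ps otherwise) and which stages each class uses; the dictionary keys are illustrative names.

```python
# Sketch: derive per-instruction totals and clock periods from stage latencies.
STAGE_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

USES = {  # stages on each instruction class's path
    "LDUR": ("fetch", "reg_read", "alu", "mem", "reg_write"),
    "STUR": ("fetch", "reg_read", "alu", "mem"),
    "R-format": ("fetch", "reg_read", "alu", "reg_write"),
    "CBZ": ("fetch", "reg_read", "alu"),
}

def total_ps(instr_class: str) -> int:
    """Sum of the stage delays on this instruction's path."""
    return sum(STAGE_PS[stage] for stage in USES[instr_class])

# Single-cycle clock must fit the slowest instruction; pipelined clock, the slowest stage.
single_cycle_tc = max(total_ps(c) for c in USES)
pipelined_tc = max(STAGE_PS.values())

if __name__ == "__main__":
    for c in USES:
        print(f"{c:9} {total_ps(c)} ps")
    print("single-cycle Tc:", single_cycle_tc, "ps; pipelined Tc:", pipelined_tc, "ps")
```

Note that the pipelined clock is set by the slowest stage (200 ps), so the 100 ps register stages waste half a cycle each.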

Pipeline Performance (continued)
- Single-cycle: Tc = 800 ps.
- Pipelined: Tc = 200 ps. All pipeline stages take a single clock cycle, so the clock cycle must be long enough to accommodate the slowest stage, giving a worst-case clock cycle of 200 ps.
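Comparing total execution time for a run of n instructions makes the two clock periods tangible. This sketch assumes an ideal five-stage pipeline (no stalls): the first instruction needs five 200 ps cycles, and each later one completes one cycle after its predecessor. The function names are hypothetical.

```python
# Sketch: total time for n instructions, single-cycle vs ideal pipeline.
def single_cycle_time_ps(n: int, tc_ps: int = 800) -> int:
    """Every instruction occupies one full 800 ps cycle."""
    return n * tc_ps

def pipelined_time_ps(n: int, tc_ps: int = 200, stages: int = 5) -> int:
    """Pipeline fill takes `stages` cycles, then one instruction finishes per cycle."""
    return (stages + n - 1) * tc_ps
```

For three instructions this gives 2400 ps versus 1400 ps; for very long runs the ratio approaches 800/200 = 4 rather than 5, because the stages are not balanced.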

Pipeline Speedup
- If all stages are balanced (i.e., all take the same time):
  $$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
- If the stages are not balanced, the speedup is less.
- The speedup comes from increased throughput; latency (the time for each instruction) does not decrease.
Pipelining improves performance by increasing instruction throughput, rather than by decreasing the execution time of an individual instruction. Instruction throughput is the important metric because real programs execute billions of instructions.
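As a worked instance of the balanced-stage formula, assume the figures from the Pipeline Performance slides: 800 ps per instruction in the single-cycle design, five stages, and a 200 ps pipelined clock set by the slowest stage:

```latex
\underbrace{\frac{800\ \text{ps}}{5}}_{\text{ideal, balanced}} = 160\ \text{ps}
\qquad\text{vs.}\qquad
T_c = 200\ \text{ps}
\;\Rightarrow\;
\text{Speedup}_{n \to \infty} = \frac{800\ \text{ps}}{200\ \text{ps}} = 4 < 5
```

The gap between 160 ps and 200 ps is exactly the imbalance cost: the 100 ps register-access stages still occupy a full 200 ps cycle.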

Pipelining and ISA Design
The LEGv8 ISA is designed for pipelining:
- All instructions are 32 bits: easier to fetch and decode in one cycle (cf. x86: 1- to 15-byte instructions).
- Few and regular instruction formats: can decode and read registers in one step.
- Load/store addressing: can calculate the address in the 3rd stage and access memory in the 4th stage.
- Alignment of memory operands: memory access takes only one cycle.