designKilla: The 32-Bit Pipelined Processor
Brought to you by: Victoria Farthing, Dat Huynh, Jerry Felker, Tony Chen
Supervisor: Young Cho

32-Bit RISC Pipelined Processor
– A reduced instruction set allows faster execution of simple, frequently used instructions, which can be combined to achieve the same result as a single, slower CISC instruction.
– Pipelining allows a faster clock cycle and leaves fewer hardware resources idle.
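The minimal Python sketch below makes the clock-cycle argument concrete. It compares ideal execution time for a single-cycle design against a five-stage pipeline. The stage delays are hypothetical numbers, not measurements of this processor. Hazards, stalls, and the double-clocked memory are ignored.

```python
def single_cycle_time(n_instructions, stage_delays):
    """Single-cycle design: the clock must cover every stage, one instruction per cycle."""
    return n_instructions * sum(stage_delays)


def pipelined_time(n_instructions, stage_delays, register_overhead=0):
    """Ideal pipeline: the clock only has to cover the slowest stage (plus register overhead).

    The first instruction needs len(stage_delays) cycles to fill the pipeline;
    after that, one instruction completes per cycle.
    """
    clock = max(stage_delays) + register_overhead
    return (len(stage_delays) + n_instructions - 1) * clock


# Hypothetical delays (ps) for IF, ID, EX, MEM, WB
delays = [200, 100, 200, 200, 100]
print(single_cycle_time(1000, delays))  # 800000
print(pipelined_time(1000, delays))     # 200800
```

With these assumed delays the speedup approaches 4×, not 5×. The limit is the ratio of the total delay (800 ps) to the slowest stage (200 ps), and it falls short of 5× because the stages are unbalanced.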

Datapath Pipeline Stages
Five stages:
– Instruction Fetch
– Instruction Decode
– Execution
– Memory Access (read or write)
– Write Back
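A small sketch that prints a pipeline diagram shows how the five stages overlap. The stage abbreviations are chosen here for illustration. The sketch assumes an ideal pipeline with no stalls or inserted NOPs.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # abbreviations chosen for this sketch


def pipeline_diagram(n_instructions):
    """Return one text row per instruction showing its stage in each clock cycle.

    Assumes an ideal pipeline: instruction i enters IF in cycle i and never stalls.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = ["."] * total_cycles  # "." = instruction not in the pipeline this cycle
        for stage_index, name in enumerate(STAGES):
            cells[i + stage_index] = name
        rows.append(f"i{i}: " + " ".join(f"{cell:>3}" for cell in cells))
    return rows


for row in pipeline_diagram(3):
    print(row)
```

Reading down a column shows the instructions that share a clock cycle, each in a different stage.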

Unique Datapath Features: Next-Instruction Address Calculation
– For ordinary sequential execution, the next address is produced by a counter that simply increments.

Address Jump Calculations
– For address jumps, the counter has a 19-bit parallel-load port.
– The loaded address comes from an adder with multiplexed inputs.
– The load bit is driven by the comparator (for beq) ORed with the absolute-jump control bit.
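The next-address logic can be sketched as a function evaluated once per clock edge. The signal names are hypothetical. Three details are also assumptions not stated on the slides:
- the counter increments by 1 (word addressing);
- the adder computes PC + offset for beq;
- the adder computes 0 + target for an absolute jump.

```python
ADDR_BITS = 19                      # width of the counter's load port
ADDR_MASK = (1 << ADDR_BITS) - 1


def next_pc(pc, *, beq=False, regs_equal=False, jump=False, offset=0, target=0):
    """One clock edge of the PC counter (hypothetical signal names)."""
    # Load bit: comparator result for beq, ORed with the absolute-jump control bit
    load = (beq and regs_equal) or jump
    if not load:
        return (pc + 1) & ADDR_MASK  # plain counter increment (assumed word addressing)
    # Adder with multiplexed inputs (assumed): PC + offset for beq, 0 + target for jumps
    base = 0 if jump else pc
    addend = target if jump else offset
    return (base + addend) & ADDR_MASK  # value presented on the 19-bit load port
```

For example, `next_pc(10, beq=True, regs_equal=True, offset=-4)` loads 6, while the same call with `regs_equal=False` just increments to 11.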

Double-Clocked Memory Interface
– Problem: a single memory holds both instructions and data.
– Solution: double clocking, i.e. accessing the memory twice during one processor clock cycle.

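(Timing diagram: Fast Clock, Clock, Write Enable, and Write Data, with alternating Fetch Instruction and Fetch Data phases)
– The instruction is fetched in the first fast-clock cycle.
– Data is fetched or written in the second fast-clock cycle.
– Data is output by the end of the processor clock cycle.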
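The double-clocked scheme can be modeled in a few lines: one memory array, two accesses per processor cycle. This is a behavioral sketch. The class and method names are hypothetical, and the two fast-clock phases are represented simply as two sequential accesses.

```python
class DoubleClockedMemory:
    """Single memory shared by instructions and data, accessed twice per CPU cycle."""

    def __init__(self, size):
        self.cells = [0] * size

    def cpu_cycle(self, pc, data_addr=None, write_enable=False, write_data=0):
        # First fast-clock phase: instruction fetch
        instruction = self.cells[pc]
        # Second fast-clock phase: data read or write (if this cycle needs one)
        data = None
        if data_addr is not None:
            if write_enable:
                self.cells[data_addr] = write_data
            else:
                data = self.cells[data_addr]
        # Both results are available by the end of the CPU clock cycle
        return instruction, data
```

The instruction fetch and the data access never overlap in time, so the single memory does not create a structural hazard between the fetch and memory stages.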

Unique Datapath Features: Structural Multiplier
– 16 × 16 bit
– Multi-level construction: four 8 × 8-bit multipliers
– Each containing four 4 × 4-bit multipliers
– Each made of a cascaded network of full and half adders, built from logic gates

16-Bit Multiplier Unit
– Based on hand (long) multiplication
– Made up of a network of AND gates and adders
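The Python sketch below models the hierarchy behaviorally: 4 × 4 blocks combined into 8 × 8, then 16 × 16. It is an illustration under assumptions, not the team's netlist:
- gates are modeled with bitwise operators;
- a half adder is represented as a full adder with a zero carry-in;
- partial products are accumulated one row at a time rather than in a parallel array.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND, and OR gates."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_add(x, y, width):
    """Add two unsigned values with a chain of `width` full adders."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result | (carry << width)


def mul4(a, b):
    """4 x 4 array multiplier: AND gates form partial products, adders sum them."""
    product = 0
    for i in range(4):
        b_bit = (b >> i) & 1
        row = 0
        for j in range(4):
            row |= (((a >> j) & 1) & b_bit) << j  # one AND gate per partial-product bit
        product = ripple_add(product, row << i, 8)
    return product


def build_multiplier(half_multiplier, half_bits):
    """Build a (2h x 2h)-bit multiplier from four (h x h)-bit multipliers."""
    mask = (1 << half_bits) - 1
    width = 4 * half_bits  # width of the full product

    def multiply(a, b):
        a_hi, a_lo = a >> half_bits, a & mask
        b_hi, b_lo = b >> half_bits, b & mask
        product = half_multiplier(a_lo, b_lo)
        product = ripple_add(product, half_multiplier(a_hi, b_lo) << half_bits, width)
        product = ripple_add(product, half_multiplier(a_lo, b_hi) << half_bits, width)
        product = ripple_add(product, half_multiplier(a_hi, b_hi) << (2 * half_bits), width)
        return product

    return multiply


mul8 = build_multiplier(mul4, 4)    # four 4 x 4 multipliers
mul16 = build_multiplier(mul8, 8)   # four 8 x 8 multipliers
print(mul16(1234, 5678))  # 7006652
```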

Why 16-Bit Instead of 32-Bit Multiplication?
– 32 bits × 32 bits = 64 bits, so supporting it would require multiple complex changes to the existing architecture.
– Only one register can be written per clock cycle.
  – The second half of the result could be held for the next cycle, or the pipeline could be stalled.
– It would require a pseudo-instruction as well as new hardware and multiple control signals.

Instead, use the pseudo-instruction mult32, which expands into four 16 × 16-bit multiplies followed by masks, shifts, and adds:
    mult 20, 2, 4
    mult 21, 4, 1
    mult 22, 2, 3
    mult 23, 1, 3
    and  24, 20, 30
    srli 24, 24, 16
    and  25, 21, 31
    add  25, 24, 25
    and  24, 22, 31
    add  25, 25, 24
    and  5, 25, 31
    srli 5, 5, 16
    and  20, 20, 31
    or   5, 5, 20
    srli 25, 25, 16
    and  24, 22, 30
    srli 24, 24, 16
    add  24, 24, 25
    and  25, 23, 31
    add  24, 24, 25
    and  22, 24, 30
    srli 22, 22, 16
    and  21, 23, 30
    srli 21, 21, 16
    add  6, 21, 22
    slli 6, 6, 16
    and  24, 24, 31
    or   6, 6, 24
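A short sketch checks the arithmetic the expansion relies on. It forms a 64-bit product from four 16 × 16-bit products and returns it as two 32-bit halves, which fits the one-register-write-per-cycle constraint. It is not a line-by-line translation of the listing above, whose register conventions are not documented on the slides. The function name is hypothetical.

```python
MASK16 = 0xFFFF


def mult32(a, b):
    """Return (hi, lo) 32-bit halves of a*b using only 16 x 16-bit multiplies."""
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    ll = a_lo * b_lo  # each product fits in 32 bits, like one 16 x 16 mult
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Bits 16..31: high half of ll plus the low halves of both cross products
    mid = (ll >> 16) + (lh & MASK16) + (hl & MASK16)
    lo = ((mid & MASK16) << 16) | (ll & MASK16)
    # Bits 32..63: hh, the high halves of the cross products, and the carry out of mid
    hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16)
    return hi, lo
```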

Improving the Multiplier
– The latency of a combinational multiplier can be decreased with carry-lookahead addition.
– This needs only a small amount of extra hardware and is worthwhile if the multiplier has the largest latency in the datapath.
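The function below sketches the carry-lookahead idea. It computes every carry of a small adder directly from the generate (g) and propagate (p) signals and the carry-in, instead of rippling from bit to bit. It models a single lookahead block. How the multiplier's adders would actually be restructured is not specified on the slides.

```python
def carry_lookahead_add(a, b, width=4, carry_in=0):
    """Add two `width`-bit values, computing each carry from g, p, and carry_in only."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate: both bits 1
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate: exactly one bit 1

    carries = [carry_in]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]carry_in
        # In hardware each term is one AND gate, so no carry waits on the previous one.
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        chain = carry_in
        for k in range(i + 1):
            chain &= p[k]
        carries.append(c | chain)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # sum bit = propagate XOR incoming carry
    return total, carries[width]
```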

Other Multiplier Topologies: Shifting Multiplication
– Shift the multiplicand several times, based on the multiplier bits.
– Add the intermediate shifted values.
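Shifting multiplication can be sketched as collecting one shifted copy of the multiplicand for each 1 bit of the multiplier and then summing them. This is a plain unsigned-integer illustration with hypothetical function names.

```python
def shifted_partials(multiplicand, multiplier):
    """One copy of the multiplicand, shifted left by i, for each 1 bit i of the multiplier."""
    return [multiplicand << i
            for i in range(multiplier.bit_length())
            if (multiplier >> i) & 1]


def shift_multiply(multiplicand, multiplier):
    """Add the intermediate shifted values to form the product."""
    return sum(shifted_partials(multiplicand, multiplier))


print(shifted_partials(13, 11))  # [13, 26, 104]  (11 = 0b1011)
print(shift_multiply(13, 11))    # 143
```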

Other Multiplier Topologies: Pipelined Multiplication
– Store intermediate sums in pipeline registers.
– Allows a faster clock cycle if traditional combinational multiplication is on the critical path.
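A cycle-by-cycle sketch of a pipelined multiplier follows. The split into four stages of four multiplier bits each is an assumption for illustration. Each pipeline register holds the operands plus the intermediate sum. A new multiplication can therefore start every cycle, even though each one takes four cycles to finish.

```python
from collections import deque

BITS = 16
BITS_PER_STAGE = 4                 # hypothetical split: 4 multiplier bits per stage
STAGES = BITS // BITS_PER_STAGE    # 4 pipeline stages


def stage(op, s):
    """Stage s adds the partial products for multiplier bits 4s..4s+3 to the running sum."""
    a, b, acc = op
    for i in range(s * BITS_PER_STAGE, (s + 1) * BITS_PER_STAGE):
        if (b >> i) & 1:
            acc += a << i
    return (a, b, acc)


def run_pipeline(pairs):
    """Feed one (a, b) pair per cycle; return (cycle, product) as results leave the pipe."""
    inputs = deque(pairs)
    regs = [None] * STAGES  # pipeline register at the output of each stage
    results, cycle = [], 0
    while inputs or any(r is not None for r in regs):
        # Every stage works in parallel on the value latched in the previous cycle.
        new_regs = [None] * STAGES
        if inputs:
            a, b = inputs.popleft()
            new_regs[0] = stage((a, b, 0), 0)
        for s in range(1, STAGES):
            if regs[s - 1] is not None:
                new_regs[s] = stage(regs[s - 1], s)
        if regs[-1] is not None:
            results.append((cycle, regs[-1][2]))  # finished product leaves the last register
        regs = new_regs
        cycle += 1
    return results
```

Each product emerges four cycles after it enters, but after that one result completes every cycle. The short clock period therefore does not reduce multiplier throughput.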

Other Multiplier Topologies: Sequential Multiplication
– Useful for minimizing wasted hardware when multiplication is an infrequent operation
– Still allows a faster clock cycle if traditional combinational multiplication is on the critical path
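A sequential multiplier reuses a single adder over many clock cycles. The sketch below follows the common textbook scheme: the multiplier starts in the low half of a product register, which shifts right each cycle. The 16-bit width and the function name are assumptions. The function also returns the cycle count, to show the latency paid for the saved hardware.

```python
def sequential_multiply(multiplicand, multiplier, bits=16):
    """Shift-and-add multiplication using one adder over `bits` clock cycles.

    Returns (product, cycles). The low half of the product register starts out
    holding the multiplier; each cycle examines its lowest bit.
    """
    product = multiplier
    cycles = 0
    for _ in range(bits):
        if product & 1:
            product += multiplicand << bits  # add into the upper half of the register
        product >>= 1                        # shift the whole register right by one
        cycles += 1
    return product, cycles
```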

Instruction Set Architecture
(Figures: R-Type, I-Type, and J-Type instruction formats)

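The Assembler
– Converts assembly code to its binary representation. For example, add $3,$1,$2 becomes a 32-bit binary word, shown split into its High and Low halves.
– The memory modules are 16 bits wide, so each word is split into high and low bits for output.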
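Splitting words for the 16-bit memory modules can be sketched as below. The input value is a placeholder, not the real encoding of add $3,$1,$2, since the instruction formats are not reproduced here. Emitting 16-digit binary strings for the memory images is an assumption about the output format.

```python
def split_for_memory(words):
    """Split 32-bit machine words into high and low 16-bit memory images."""
    high, low = [], []
    for word in words:
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError(f"not a 32-bit word: {word:#x}")
        high.append(f"{(word >> 16) & 0xFFFF:016b}")  # contents of the high memory module
        low.append(f"{word & 0xFFFF:016b}")           # contents of the low memory module
    return high, low


print(split_for_memory([0x12345678]))  # placeholder word, not a real encoding
```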

Assembler Features
– Allows labels to be used in loops, and automatically calculates offsets based on each label's position:
    LABEL: add $1,$2,$3
           jmp LABEL
– Resolves hazards created by pipelining: it automatically determines the appropriate number of NOPs to insert, based on the relative positions of consecutive (dependent) instructions.
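A minimal sketch of two assembler passes follows: NOP insertion first, then label resolution. The following are all assumptions for illustration:
- the hazard rule: a result can be read only three instruction slots after it is produced;
- the `Instr` fields;
- absolute targets for jmp and PC-relative offsets for other branches.

Only fall-through order is analyzed; hazards across a jump are not checked.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HAZARD_DISTANCE = 3  # hypothetical: a consumer must sit >= 3 slots after its producer


@dataclass
class Instr:
    op: str
    dest: Optional[int] = None      # register written, if any
    srcs: Tuple[int, ...] = ()      # registers read
    target: Optional[str] = None    # label operand (jmp/beq)
    label: Optional[str] = None     # label defined on this line


def insert_nops(program):
    """Pad with NOPs so no instruction reads a register too soon after it is written."""
    out = []
    for ins in program:
        needed = 0
        # Look back over the most recent slots (NOPs included) for a producer of our sources
        for dist in range(1, min(HAZARD_DISTANCE, len(out) + 1)):
            prev = out[-dist]
            if prev.dest is not None and prev.dest in ins.srcs:
                needed = max(needed, HAZARD_DISTANCE - dist)
        out.extend(Instr("nop") for _ in range(needed))
        out.append(ins)
    return out


def resolve_labels(program):
    """Replace label operands with addresses: absolute for jmp, PC-relative otherwise."""
    addresses = {ins.label: pc for pc, ins in enumerate(program) if ins.label}
    resolved = []
    for pc, ins in enumerate(program):
        operand = None
        if ins.target is not None:
            target_pc = addresses[ins.target]
            operand = target_pc if ins.op == "jmp" else target_pc - pc
        resolved.append((pc, ins.op, operand))
    return resolved


program = [
    Instr("add", dest=1, srcs=(2, 3), label="LABEL"),
    Instr("add", dest=4, srcs=(1, 5)),   # hypothetical consumer of register 1
    Instr("jmp", target="LABEL"),
]
padded = insert_nops(program)
print([ins.op for ins in padded])  # ['add', 'nop', 'nop', 'add', 'jmp']
print(resolve_labels(padded)[-1])  # (4, 'jmp', 0)
```

NOPs are inserted before labels are resolved, because every inserted NOP shifts the addresses of the instructions after it.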

Pseudo-Instructions
The design allows pseudo-instructions to be used. For example:
    Pseudo-instruction:   HLT
    Actual instructions:  H1: JMP H1
                          NOP
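Pseudo-instruction expansion can be sketched as a text rewrite. The numbering scheme H1, H2, ... for multiple HLTs is a hypothetical extension of the single H1 example. Collisions with user-defined labels are not checked.

```python
import itertools


def expand_pseudo_instructions(lines):
    """Replace each HLT with a self-loop (Hn: JMP Hn) followed by a NOP."""
    label_numbers = itertools.count(1)  # hypothetical H1, H2, ... numbering
    expanded = []
    for line in lines:
        if line.strip().upper() == "HLT":
            label = f"H{next(label_numbers)}"
            expanded.append(f"{label}: JMP {label}")  # jumps to itself forever
            expanded.append("NOP")
        else:
            expanded.append(line)
    return expanded
```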

Topic 2 Design: Compiler
– Bison (parser): a compiler-compiler that generates a parser from a grammar
– Flex (lexer): a fast lexical-analyzer generator, a tool for pattern matching on text

Compiling the C Language: Lexer and Parser Interface
– The lexer (Flex) feeds tokens to the parser (Bison, a YACC-compatible parser generator).
– The parser generates a parse tree.
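The lexer/parser split can be illustrated in Python. The standard `re` module stands in for Flex, and a tiny hand-written parser stands in for Bison. The token names and the supported grammar (an identifier assigned a sum of identifiers or numbers) are assumptions, chosen to cover the assignment statements in the sample program.

```python
import re

KEYWORDS = {"int", "void"}
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-;(){}]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Flex-style lexer: match regular expressions and emit (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens


def parse_assignment(tokens):
    """Parse `ID = operand (+ operand)* ;` into a nested (op, left, right) tree."""
    def operand(i):
        if i >= len(tokens) or tokens[i][0] not in ("ID", "NUM"):
            raise SyntaxError(f"expected identifier or number at token {i}")
        return tokens[i][1]

    if len(tokens) < 2 or tokens[0][0] != "ID" or tokens[1][1] != "=":
        raise SyntaxError("expected an assignment")
    tree = operand(2)
    i = 3
    while i < len(tokens) and tokens[i][1] == "+":
        tree = ("+", tree, operand(i + 1))  # left-associative addition
        i += 2
    if i != len(tokens) - 1 or tokens[i][1] != ";":
        raise SyntaxError("expected ';' at end of statement")
    return ("=", tokens[0][1], tree)


print(parse_assignment(tokenize("x = b + d;")))  # ('=', 'x', ('+', 'b', 'd'))
```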

Source code to run-time

A Simple Program
A simple C program:
    void main(void) {
        int b;
        int d;
        int x;
        int y = 3;
        int g;
        x = b + d;
        g = y + x;
    }
Assembly code equivalent:
    lwi 4, 0, 3
    add 6, 1, 2
    sw  3, 6, 0
    add 6, 4, 3
    sw  5, 6, 0
(Figure: machine code instructions, split into Memory High and Memory Low)

Could Use a Little Work
Currently the processor needs further work to improve performance:
– Decreasing memory latency would be the largest and most direct improvement.
– The ALU and the multiplier unit must also be optimized.
– All in all, the processor works but is not ready for commercial use.


THE END Questions?