VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges.

Slides:

Advertisements

Similar presentations

SD1020 Number of Members :3 Vidura Manu Wijayasekara Ridhima Agarwal Bhaskar Kumar Advisor : Dr. Sudarshan Srinivasan.

Advertisements

Control Unit Implemntation

Chapter 1. Basic Structure of Computers

Lecture 4: CPU Performance

1/1/ / faculty of Electrical Engineering eindhoven university of technology Speeding it up Part 3: Out-Of-Order and SuperScalar execution dr.ir. A.C. Verschueren.

ARM Cortex A8 Pipeline EE126 Wei Wang. Cortex A8 is a processor core designed by ARM Holdings. Application: Apple A4, Samsung Exynos What’s the.

Final Project : Pipelined Microprocessor Joseph Kim.

Computer Architecture Lecture 2 Abhinav Agarwal Veeramani V.

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture VLIW Steve Ko Computer Sciences and Engineering University at Buffalo.

1 Advanced Computer Architecture Limits to ILP Lecture 3.

EEM 486 EEM 486: Computer Architecture Lecture 4 Designing a Multicycle Processor.

CPT 310 Logic and Computer Design Instructor: David LublinerPhone Engineering Technology Dept.Cell

Aug. 24, 2007ELEC 5200/6200 Project1 Computer Design Project ELEC 5200/6200-Computer Architecture and Design Fall 2007 Vishwani D. Agrawal James J.Danaher.

Chapter 16 Control Unit Operation No HW problems on this chapter. It is important to understand this material on the architecture of computer control units,

Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Chapter 16 Control Unit Implemntation. A Basic Computer Model.

Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.

DLX Instruction Format

Chapter 15 IA 64 Architecture Review Predication Predication Registers Speculation Control Data Software Pipelining Prolog, Kernel, & Epilog phases Automatic.

Shift Instructions (1/4)

Henry Hexmoor1 Chapter 10- Control units We introduced the basic structure of a control unit, and translated assembly instructions into a binary representation.

Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.

Midterm Thursday let the slides be your guide Topics: First Exam - definitely cache,.. Hamming Code External Memory & Buses - Interrupts, DMA & Channels,

Appendix A Pipelining: Basic and Intermediate Concepts

Chapter 6 Memory and Programmable Logic Devices

Pipelining By Toan Nguyen.

Chapter 17 Microprocessor Fundamentals William Kleitz Digital Electronics with VHDL, Quartus® II Version Copyright ©2006 by Pearson Education, Inc. Upper.

DARPA Digital Audio Receiver, Processor and Amplifier Group Z James Cotton Bobak Nazer Ryan Verret.

Micro-operations Are the functional, or atomic, operations of a processor. A single micro-operation generally involves a transfer between registers, transfer.

Basics and Architectures

Levels of Architecture & Language CHAPTER 1 © copyright Bobby Hoggard / material may not be redistributed without permission.

COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept.

Computer Design Basics

Lecture 9. MIPS Processor Design – Instruction Fetch Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System Education &

Memory/Storage Architecture Lab Computer Architecture Pipelining Basics.

Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.

Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,

Lecture 14: Processors CS 2011 Fall 2014, Dr. Rozier.

A Simple Computer Architecture Digital Logic Design Instructor: Kasım Sinan YILDIRIM.

Computer Architecture And Organization UNIT-II General System Architecture.

Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.

Multiple-Cycle Hardwired Control Digital Logic Design Instructor: Kasım Sinan YILDIRIM.

Important Concepts  Parts of the CPU  Arithmetic/Logic Unit  Control Unit  Registers  Program Counter  Instruction Register  Fetch/Decode/Execute.

CDA 3101 Fall 2013 Introduction to Computer Organization

Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.

Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.

ALU (Continued) Computer Architecture (Fall 2006).

Chapter 5: Computer Systems Design and Organization Dr Mohamed Menacer Taibah University

Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.

GROUP 2 CHAPTER 16 CONTROL UNIT Group Members ๏ Evelio L. Hernandez ๏ Ashwin Soerdien ๏ Andrew Keiper ๏ Hermes Andino.

PART 4: (1/2) Central Processing Unit (CPU) Basics CHAPTER 12: P ROCESSOR S TRUCTURE AND F UNCTION.

Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.

Computer Operation. Binary Codes CPU operates in binary codes Representation of values in binary codes Instructions to CPU in binary codes Addresses in.

BASIC COMPUTER ARCHITECTURE HOW COMPUTER SYSTEMS WORK.

1/29 UTDSP: A VLIW Programmable DSP Processor Sean Hsien-en Peng Department of Electrical and Computer Engineering University of Toronto October 26 th,

Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ read/write and clock inputs Sequence of control signal combinations.

STUDY OF PIC MICROCONTROLLERS.. Design Flow C CODE Hex File Assembly Code Compiler Assembler Chip Programming.

CS161 – Design and Architecture of Computer Systems

Morgan Kaufmann Publishers

Morgan Kaufmann Publishers

Morgan Kaufmann Publishers The Processor

Design of the Control Unit for Single-Cycle Instruction Execution

Pipelining: Advanced ILP

Design of the Control Unit for One-cycle Instruction Execution

CSC 4250 Computer Architectures

Introduction to Computer Organization and Architecture

General Purpose Processor : Software

Computer Architecture Assembly Language

Chapter 4 The Von Neumann Model

Presentation transcript:

VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges

Introduction Functionality Functionality ISA ISA Implementation Implementation Functional blocks Functional blocks Circuit analysis Circuit analysis Testing Testing Off Chip Memory Off Chip Memory Status Status

Things to look for Design Tradeoffs Design Tradeoffs Register file size Register file size Multiple word sizes Multiple word sizes Instruction set and implementation Instruction set and implementation Data forwarding Data forwarding Software-controlled on chip cache Software-controlled on chip cache Shared Address/Data bus for off-chip data memory Shared Address/Data bus for off-chip data memory

Instruction Set Architecture 24-bit instruction words pack 3 sub-instructions: 24-bit instruction words pack 3 sub-instructions: Ex: SUB R5 R3, LDM R3 R6, BNEZ R1 Ex: SUB R5 R3, LDM R3 R6, BNEZ R1 Register file - 8 registers Register file - 8 registers 3 bit encoding * 5 Reg. IDs = 15 bits per IW 3 bit encoding * 5 Reg. IDs = 15 bits per IW Simple but useful Instruction Set Simple but useful Instruction Set Multiply, Add/Subtract, Branch, Jump, Load Memory, Load intermediate, Load CCM Multiply, Add/Subtract, Branch, Jump, Load Memory, Load intermediate, Load CCM 2 Branch delay slots 2 Branch delay slots

Microarchitecture In order, 4 stage pipeline In order, 4 stage pipeline IF, ID, EX, WB IF, ID, EX, WB 3 cycle pipeline stage 3 cycle pipeline stage Data forwarding Data forwarding Eliminate RAW hazards (ELEC 320, 425) Eliminate RAW hazards (ELEC 320, 425) 5 forwarding paths 5 forwarding paths Control Logic Control Logic PLA controls pipeline PLA controls pipeline Initialize pipeline, reset Program Counter Initialize pipeline, reset Program Counter Cycle through three cycles of pipeline stage Cycle through three cycles of pipeline stage

Implementation Double Wide Silicon Floorplan

ALU Design Array Multiplier Array Multiplier Ripple-Carry Adder Ripple-Carry Adder Longest Paths: Longest Paths: Add/Subtract: ns through MSB Add/Subtract: ns through MSB Multiply: ns through 10 th product term Multiply: ns through 10 th product term

Compiler Controlled Memory (CCM) Small on chip software controlled cache Small on chip software controlled cache Similar to Commercial DSPs Similar to Commercial DSPs Predictable access time in real time Predictable access time in real time Benefits over off chip memory: Benefits over off chip memory: Double bandwidth Double bandwidth Software configurability Software configurability Reduced register “spill”/ “fill” pressure Reduced register “spill”/ “fill” pressure Easily extendable Easily extendable

Implementation of CCM 4 12-bit lines of memory on chip (8 words) 4 12-bit lines of memory on chip (8 words) Two registers, R6 and R7, for loading and storing Two registers, R6 and R7, for loading and storing Two instructions, LDC and STC Two instructions, LDC and STC 9-bit instruction 9-bit instruction Three bit opcode Three bit opcode Five bit word line Five bit word line Single bit determines single/double access Single bit determines single/double access Example instruction: LDC Example instruction: LDC (Reads CCM Line 1 into R6 and R7)

ORCA Test Vector Generation Process Goal: Greater accuracy and shorter time to verify chip functionality Goal: Greater accuracy and shorter time to verify chip functionality Assembly Code Binary Code IRSIM vectors assembler Vector translator

ORCA Vector Suite Goal: Create functional vectors to isolate specific chip cells to aid in post-silicon debug. Goal: Create functional vectors to isolate specific chip cells to aid in post-silicon debug. Register File Register File Compiler Controlled Memory Compiler Controlled Memory ALU ALU Branch Branch Data Forwarding Data Forwarding Pipeline Pipeline

ORCA Obsbus State Machine Goal: Increase internal test signals to the IO’s by implementing a MUX. The MUX is controlled by output signals generated from a state machine. Goal: Increase internal test signals to the IO’s by implementing a MUX. The MUX is controlled by output signals generated from a state machine. 16:1 MUX, 6 output observability pins, 1 input observability pin 16:1 MUX, 6 output observability pins, 1 input observability pin Allows observation of up to 96 internal signals using 7 pins. Allows observation of up to 96 internal signals using 7 pins. The state machine changes state on each toggle of the input pin. The state machine changes state on each toggle of the input pin.

Obsbus Signals Goal: Track an instruction execution through each of the pipeline stages. Goal: Track an instruction execution through each of the pipeline stages. FetchDecode Write Back Execute Program Counter Branch Address OpCodes RegF output Forwarding signals OpCodes ALU input/output CCM input/output OpCodes RegF inputs

Off Chip Memory Instruction memory Instruction memory Regular static RAM (used previously in 422) Regular static RAM (used previously in 422) 8 bit addressing, 8 bit data reads 8 bit addressing, 8 bit data reads 2 8 = 256 words possible = 85 VLIW instructions 2 8 = 256 words possible = 85 VLIW instructions 70 ns read time 70 ns read time One read every cycle One read every cycle Output address on clock A, latch data on clock B Output address on clock A, latch data on clock B One read/cycle * 8bits * 3 cycles/pipeline state = 24 bit VLIW One read/cycle * 8bits * 3 cycles/pipeline state = 24 bit VLIW

Off Chip Memory, continued Data memory (DS1609) Data memory (DS1609) Shared Address/Data bus Shared Address/Data bus PLA carefully designed to control memory PLA carefully designed to control memory Uses worst case propagation delays Uses worst case propagation delays Timed signals using two out of phase clocks Timed signals using two out of phase clocks Default PLA output latching on clock B Default PLA output latching on clock B External latching on clock A to properly time signals External latching on clock A to properly time signals 50 ns read time 50 ns read time

Current Status Functionality of major blocks tested Functionality of major blocks tested Instruction Fetch in final stages Instruction Fetch in final stages ALU instructions implemented and working, including data forwarding ALU instructions implemented and working, including data forwarding Memory instructions just need to be routed Memory instructions just need to be routed Crystal and HSPICE analysis fifty percent complete Crystal and HSPICE analysis fifty percent complete Global power, clock, and pin routing allocated in floorplan Global power, clock, and pin routing allocated in floorplan

Conclusion Solid fundamental ISA gives a nice “baby DSP” Solid fundamental ISA gives a nice “baby DSP” Modular implementation of fundamental blocks Modular implementation of fundamental blocks Design Decisions are well justified Design Decisions are well justified Register file size Register file size Instruction word length Instruction word length Implementation balances timing and space Implementation balances timing and space Access to off chip memory Access to off chip memory

Questions?