DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337.

Slides:



Advertisements
Similar presentations
DSPs Vs General Purpose Microprocessors
Advertisements

SIGNAL PROCESSING ON THE TMS320C6X VLIW DSP Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Magesh Valliappan Embedded Signal Processing.
PIPELINE AND VECTOR PROCESSING
Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.
Architecture and Instruction Set of the C6x Processor Module 1.
Lecture 6 Programming the TMS320C6x Family of DSPs.
INTRODUCTION TO THE TMS320C6000 VLIW DSP Prof. Brian L. Evans in collaboration with Dr. Niranjan Damera-Venkata and Mr. Magesh Valliappan Embedded Signal.
Survey of Digital Signal Processors Michael Warner ECD: VLSI Communication Systems.
ECSE DSP architecture Review of basic computer architecture concepts C6000 architecture: VLIW Principle and Scheduling Addressing Assembly and linear.
Overview of Popular DSP Architectures: TI, ADI, Motorola R.C. Maher ECEN4002/5002 DSP Laboratory Spring 2003.
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
Intel’s MMX Dr. Richard Enbody CSE 820. Michigan State University Computer Science and Engineering Why MMX? Make the Common Case Fast Multimedia and Communication.
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS
Computer Organization and Architecture
Computer Organization and Architecture
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs) Prof. Brian L. Evans Contributions by Niranjan Damera-Venkata and Magesh Valliappan Embedded Signal Processing.
COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.
1 Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software.
1 SHARC ‘S’uper ‘H’arvard ‘ARC’hitecture Nagendra Doddapaneni ER hit HAR ect VARD ure SUP Arc.
Alyssa Concha Microprocessors Final Project ADSP – SHARC Digital Signal Processor.
DSP Chip Architecture Team Members: Steve McDermott Ken Whelan Kyle Welch.
Chapter 12 CPU Structure and Function. Example Register Organizations.
EE 345S Real-Time Digital Signal Processing Lab Fall 2008
Microcontroller based system design
Computer Organization and Assembly language
Ehsan Shams Saeed Sharifi Tehrani. What is DSP ? Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good.
1/1/ / faculty of Electrical Engineering eindhoven university of technology Input/Output devices Part 3: Programmable I/O and DSP's dr.ir. A.C. Verschueren.
Computer Organization & Assembly Language
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs)
Real time DSP Professors: Eng. Julian Bruno Eng. Mariano Llamedo Soria.
Motivation Mobile embedded systems are present in: –Cell phones –PDA’s –MP3 players –GPS units.
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs) Prof. Brian L. Evans Contributions by Dr. Niranjan Damera-Venkata and Mr. Magesh Valliappan Embedded Signal.
Internal hardware and external components of a computer Three-box Model  Processor The brain of the system Executes programs A big finite state machine.
RICE UNIVERSITY Implementing the Viterbi algorithm on programmable processors Sridhar Rajagopal Elec 696
A Simple Computer consists of a Processor (CPU-Central Processing Unit), Memory, and I/O Memory Input Output Arithmetic Logic Unit Control Unit I/O Processor.
Classifying GPR Machines TypeNumber of Operands Memory Operands Examples Register- Register 30 SPARC, MIPS, etc. Register- Memory 21 Intel 80x86, Motorola.
IMAGE PROCESSING ON THE TMS320C6X VLIW DSP
Overview of Super-Harvard Architecture (SHARC) Daniel GlickDaniel Glick – May 15, 2002 for V (Dewar)
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS
INTRODUCTION TO THE TMS320C6x VLIW DSP Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Magesh Valliappan Embedded Signal Processing.
DIGITAL SIGNAL PROCESSORS. Von Neumann Architecture Computers to be programmed by codes residing in memory. Single Memory to store data and program.
DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
Academic PowerPoint Computer System – Architecture.
Computer and Information Sciences College / Computer Science Department CS 206 D Computer Organization and Assembly Language.
The Evolution of TMS, Family of DSP’s
Overview von Neumann Architecture Computer component Computer function
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
1 x86 Programming Model Microprocessor Computer Architectures Lab Components of any Computer System Control – logic that controls fetching/execution of.
x86 Processor Architecture
TMS320C6713 Assembly Language
Embedded Systems Design
INTRODUCTION TO MICROPROCESSORS
Digital Signal Processors
Introduction to Digital Signal Processors (DSPs)
Morgan Kaufmann Publishers Computer Organization and Assembly Language
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS (DSPs)
Computer Architecture
Introduction to Microprocessor Programming
Digital Signal Processors-1
INTRODUCTION TO THE TMS320C6000 VLIW DSP
Chapter 11 Processor Structure and function
ADSP 21065L.
Presentation transcript:

DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC General Architecture 2 Outline Signal processing applications Conventional DSP architecture TI TMS320C6000 DSP architecture introduction Signal processing on general-purpose processors Conclusion

Dr. Blanton - ENTC General Architecture 3 Signal Processing Applications Embedded system demand: product volume matters 400 Million units/year: automobiles, PCs, and cell phones 30 Million units/year: ADSL modems and printers Embedded system cost and input/output rates Low-cost, medium-throughput: low-end printers, wireless handsets, sound cards, car audio, disk drives High-cost, high-throughput: high-end printers, wireless basestations, 3-D sonar, 3-D images from 2-D X-rays (tomographic reconstruction ) Embedded processor requirements Inexpensive with small area and volume Predictable input/output (I/O) rates to/from processor Power constraints (severe for handheld devices) Single DSP Multiple DSPs

Dr. Blanton - ENTC General Architecture 4 Conventional DSP Processors Low cost: $3/processor in volume Deterministic interrupt service routine latency guarantees predictable input/output rates On-chip direct memory access (DMA) controllers Processes streaming input/output separately from CPU Sends interrupt to CPU when block has been read/written Ping-pong buffering CPU reads/writes buffer 1 while DMA reads/writes buffer 2 When DMA finishes with buffer 2, roles of buffer 1 & 2 switch

Dr. Blanton - ENTC General Architecture 5 Conventional DSP Processors Low power consumption: mW TI TMS320C mA/MIP  76.8 mW at 1.5 V, 160 MHz TI TMS320C mA/MIP  22.5 mW at 1.5 V, 300 MHz

Dr. Blanton - ENTC General Architecture 6 Conventional DSP Architecture Multiply-accumulate (MAC) in one instruction cycle Harvard architecture for fast on-chip input/output Data memory/bus(es) separate from program memory/bus One read from program memory per instruction cycle Two reads/writes from/to data memory per instruction cycle Instructions to keep pipeline (3-6 stages) full Zero-overhead looping (one pipeline flush to set up) Delayed branches Special addressing modes supported in hardware Bit-reversed addressing (e.g. fast Fourier transforms) Modulo addressing for circular buffers (e.g. FIR filters)

Dr. Blanton - ENTC General Architecture 7 Conventional DSP Architecture (con’t) x N-K+1 x N-K+2 x N-1 xNxN Data Shifting TimeBuffer contentsNext sample x N+1 x N+3 x N+2 n=N n=N+1 n=N+2 x N-K+3 x N-K+4 x N+1 x N+2 x N-K+2 x N-K+3 xNxN x N+1 Buffer of length K Used in finite and infinite impulse response filters Linear buffer Order by time index Data shifting update: discard oldest data, copy old data left, insert new data

Dr. Blanton - ENTC General Architecture 8 Conventional DSP Architecture (con’t) Modulo Addressing Time Buffer contents Next sample n=N n=N+1 n=N+2 x N-2 x N-1 x N-K+1 x N-K+2 x N-K+4 x N+1 x N+2 x N+3 x N-2 x N-1 x N+1 x N-K+2 xNxN x N-2 x N-1 x N+1 x N+2 xNxN xNxN xNxN xNxN x N-K+3 x N-K+4 Circular buffer Index oldest sample Modulo addressing update: insert new data at oldest index, update oldest index

Dr. Blanton - ENTC General Architecture 9 Conventional DSP Processors Summary

Dr. Blanton - ENTC General Architecture 10 Conventional DSP Processor Families Floating-point DSPs Used in first pass prototyping of algorithms Resurgence due to professional and car audio Different on-chip configurations in each family Size and map of data and program memory A/D, input/output buffers, interfaces, timers, and D/A Drawbacks to conventional DSP processors No byte addressing (needed for images and video) Limited on-chip memory Limited addressable memory on fixed-point DSPs (exceptions include Motorola and TI C5409) Non-standard C extensions for fixed-point data type DSP Market Fixed-point 95% Floating-point 5%

Dr. Blanton - ENTC General Architecture 11 Program RAM Data RAM or Cache Internal Buses Control Regs Regs (B0-B15)Regs (A0-A15).D1.M1.L1.S1.D2.M2.L2.S2 CPU Addr Data External Memory -Sync -Async DMA Serial Port Host Port Boot Load Timers Pwr Down TI TMS320C6000 DSP Architecture Simplified Architecture

Dr. Blanton - ENTC General Architecture 12 TI TMS320C6000 DSP Architecture Families: All support same C6000 instruction set C6200 fixed-point MHz ADSL, printers C6400 fixed-point 300-1,000 MHz video/comm. apps. C6700 floating-point MHz medical imaging, pro-audio TMS320C6211: 150 MHz, $21 in volume 300 million multiply-accumulates/s, 1200 RISC MIPS On-chip memory: 16 kwords program, 16 kwords data TMS320C6701 Evaluation Module Board 167 MHz 334 million multiply-accumulates/s, 1336 RISC MIPS On-chip memory: 16 kwords program, 16 kwords data External: one 133-MHz 64-kword, two 100-MHz 1-Mword

Dr. Blanton - ENTC General Architecture 13 TI TMS320C6000 DSP Architecture Very long instruction word (VLIW) size of 256 bits Eight 32-bit functional units with single cycle throughput One instruction cycle per clock cycle Data word size is 32 bits 16 (32 on C64) 32-bit registers in each of two data paths 40 bits can be stored in adjacent even/odd registers Two parallel data paths Data unit - 32-bit address calculations (modulo, linear) Multiplier unit - 16 bit  16 bit with 32-bit result Logical unit - 40-bit (saturation) arithmetic & compares Shifter unit - 32-bit integer ALU and 40-bit shifter

Dr. Blanton - ENTC General Architecture 14 TI TMS320C6000 Instruction Set.S Unit ADDNEG ADDKNOT ADD2OR ANDSET BSHL CLRSHR EXTSSHL MVSUB MVCSUB2 MVKXOR MVKHZERO.L Unit ABS NOT ADD OR AND SADD CMPEQ SAT CMPGT SSUB CMPLT SUB LMBD SUBC MV XOR NEG ZERO NORM.M Unit MPYSMPY MPYHSMPYH.D Unit ADD ST ADDA SUB LD SUBA MV ZERO NEG Other NOPIDLE C6000 Instruction Set by Functional Unit Six of the eight functional units can perform integer add, subtract, and move operations

Dr. Blanton - ENTC General Architecture 15 TI TMS320C6000 Instruction Set Arithmetic ABS ADD ADDA ADDK ADD2 MPY MPYH NEG SMPY SMPYH SADD SAT SSUB SUB SUBA SUBC SUB2 ZERO Logical AND CMPEQ CMPGT CMPLT NOT OR SHL SHR SSHL XOR Bit Management CLR EXT LMBD NORM SET Data Management LD MV MVC MVK MVKH ST Program Control B IDLE NOP C6000 Instruction Set by Category (un)signed multiplication saturation/packed arithmetic

Dr. Blanton - ENTC General Architecture 16 C6000 vs. C5000 Addressing Modes ADD #0FFh add.L1 -13,A1,A6 TI C5000TI C6000 (implied) add.L1 A7,A6,A7 ADD 010h not supported ADD * ldw.L1 *A5++[8],A1 Immediate The operand is part of the instruction Register Operand is specified in a register Direct Address of operand is part of the instruction (added to imply memory page) Indirect Address of operand is stored in a register

Dr. Blanton - ENTC General Architecture 17 TI TMS320C6000 DSP Architecture Deep pipeline 7-11 stages in C6200: fetch 4, decode 2, execute stages in C6700: fetch 4, decode 2, execute 1-10 Pentium IV has an estimated 20 pipeline stages Avoid using branch instructions in code Branch instruction in pipeline disables interrupts: latency of a branch is 5 cycles Avoid branches by using conditional execution: every instruction can be conditionally executed No hardware protection against pipeline hazards Compiler and assembler must prevent pipeline hazards

Dr. Blanton - ENTC General Architecture 18 TI TMS320C6700 Extensions.S Unit ABSDP CMPLTSP ABSSP RCPDP CMPEQDP RCPSP CMPEQSP RSARDP CMPGTDP RSQRSP CMPGTSP SPDP CMPLTDP.L Unit ADDDP INTSP ADDSP SPINT DPINT SPTRUNC DPSP SUBDP DPTRUNC SUBSP INTDP.M Unit MPYDP MPYID MPYI MPYSP.D Unit ADDAD LDDW C6700 Floating Point Extensions by Unit Four functional units can perform IEEE single-precision (SP) and double-precision (DP) floating-point add, subtract, move. Operations beginning with R are reciprocal calculations.

Dr. Blanton - ENTC General Architecture 19 Digital Signal Processor Cores ASIC with Programmable digital signal processor core RAM ROM Standard cells Codec Peripherals Gate array Microcontroller core Application Specific Integrated Circuit

Dr. Blanton - ENTC General Architecture 20 General Purpose Processors Multimedia applications on PCs Video, audio, graphics and animation Repetitive parallel sequences of instructions Single Instruction Multiple Data (SIMD) One instruction acts on multiple data in parallel Well-suited for graphics Native signal processing extensions use SIMD Sun Visual Instruction Set [1995] (UltraSPARC 1/2) Intel MMX [1996] (Pentium I/II/III/IV) Intel Streaming SIMD Extensions (Pentium III)

Dr. Blanton - ENTC General Architecture 21 DSP on General Purpose Processors (con’t) Programming is considerably tougher Ability of compilers to generate code for instruction set extensions may lag (e.g. four years for Pentium MMX) Libraries of routines using native signal processing Hand code in assembly for best performance Single-instruction multiple-data (SIMD) approach Pack/unpack data not aligned on SIMD word boundaries Saturation arithmetic in MMX; not supported in VIS Extended-precision accumulation in MMX; none in VIS Application speedup for Intel MMX and Sun VIS Signal and image processing: 1.5:1 to 2:1 Graphics: 4:1 to 6:1 (no packing/unpacking)

Dr. Blanton - ENTC General Architecture 22 Concluding Remarks Conventional digital signal processors High performance vs. power consumption/cost/volume Excellent at one-dimensional processing Per cycle: 1 16  16 MAC & 4 16-bit RISC instructions TMS320C6000 VLIW DSP family High performance vs. cost/volume Excellent at multidimensional signal processing Per cycle: 2 16  16 MACs & 4 32-bit RISC instructions Native signal processing Available on desktop computers Excels at graphics Per cycle: 2 8  16 MACs OR 8 8-bit RISC instructions Use assembly for computational kernels and C for main program (control code, interrupt def.)

Dr. Blanton - ENTC General Architecture 23 Concluding Remarks Digital signal processor market 40% annual growth rate fastest in semiconductor market Revenue: $3.5B ‘98, $4.4B ‘99, $6.1B ‘00, $4.5B ‘01, $4.9B ‘ : 44% TI, 23% Agere, 13% Motorola, 10% Analog Devices 2001: 40% TI, 16% Agere, 12% Motorola, 8% Analog Devices 2002: 43% TI, 14% Motorola, 14% Agere, 9% Analog Devices Independent processor benchmarking by industry Berkeley Design Technology Inc. EDN Embedded Microprocessor Benchmark Consortium Web resources Newsgroup comp.dsp: FAQ Embedded processors and systems: On-line courses:

Dr. Blanton - ENTC General Architecture 24 Concluding Remarks Web resources Newsgroup comp.dsp: FAQ Embedded processors and systems: On-line courses: