Video on DSP and FPGA
John Johansson
April 12, 2004

Agenda
► Overview of video processing
► A typical video encoder and the DCT
► Requirements of the DCT
► Comparison of DSP and FPGA chips
► Analysis and conclusions
► Questions

Overview of Video Processing
Video processing generally involves:
► Compression / decompression
► Special effects
► TV broadcasting
► Focus here: compression

Video Encoding
[Figure: typical video encoder]
► Focus on the DCT algorithm

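The Discrete Cosine Transform
► The DCT is a frequency transform, like the FFT: it turns pixel values into frequency coefficients
► Rearranges data into a more compressible format (most of the energy lands in a few low-frequency coefficients)
► Typically applied to 64 pixels (one 8x8 block) at a time
► Big nasty equation …
► … but no sharp teeth (it optimizes extremely well)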
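
The "big nasty equation" is the 2-D DCT-II. The sketch below is a minimal pure-Python illustration, assuming orthonormal scaling and an 8x8 block; the names `dct2_direct`, `dct1` and `dct2_separable` are hypothetical. It computes the transform once straight from the definition (4,096 multiply-adds per block) and once as 8 row DCTs followed by 8 column DCTs (1,024 multiply-adds). Both give the same coefficients, which is one reason the DCT optimizes so well; fast FFT-like factorizations cut the work further.

```python
import math

N = 8  # assumed block size: 8x8 pixels


def c(k: int) -> float:
    """Orthonormal scale factor for coefficient index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct2_direct(block):
    """F(u,v) = c(u) c(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16).
    Straight from the definition: O(N^4) multiply-adds per block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def dct1(vec):
    """1-D DCT-II of a length-8 vector."""
    return [c(k) * sum(vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                       for n in range(N))
            for k in range(N)]


def dct2_separable(block):
    """Same transform as 8 row DCTs then 8 column DCTs: O(N^3)."""
    rows = [dct1(r) for r in block]                              # rows[x][v]
    cols = [dct1([rows[x][v] for x in range(N)]) for v in range(N)]  # cols[v][u]
    return [[cols[v][u] for v in range(N)] for u in range(N)]    # back to out[u][v]
```

A flat block of value 100 produces a single nonzero coefficient, F(0,0) = 800: all of the block's energy ends up in the DC term, which is the "more compressible format" in miniature.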

Requirements for DCT: Basic Idea
► Read in data (64 values, 8-24 bits, signed or unsigned)
► Do the transformation
► Write out data
► Profit!!!
► Easy, right?
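
The read / transform / write loop can be sketched as follows. This is a hypothetical helper (`process_frame`), assuming a frame stored as a list of pixel rows whose width and height are multiples of 8; the transform is passed in as a function so that any per-block operation (such as a DCT) can be plugged in.

```python
from typing import Callable, List

Block = List[List[float]]
Frame = List[List[float]]
B = 8  # assumed block edge length


def process_frame(frame: Frame, transform: Callable[[Block], Block]) -> Frame:
    """Read each 8x8 block, transform it, and write it to an output frame.

    Assumes width and height are multiples of 8 (a real encoder would pad).
    """
    height, width = len(frame), len(frame[0])
    if height % B or width % B:
        raise ValueError("frame dimensions must be multiples of 8")
    out = [[0] * width for _ in range(height)]
    for by in range(0, height, B):
        for bx in range(0, width, B):
            # 1. Read in data: copy 64 values into a small working buffer
            block = [row[bx:bx + B] for row in frame[by:by + B]]
            # 2. Do the transformation
            result = transform(block)
            # 3. Write out data
            for r in range(B):
                out[by + r][bx:bx + B] = result[r]
    return out
```

For a 24x16 frame the transform is called six times, once per block, and only one 64-value block is ever held in the working buffer.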

Requirements for DCT: Memory Limitations
► Load an entire frame?
► One uncompressed frame can range from about 50 KB to 50 MB
► External memory is more plentiful, but much slower than internal memory
► So do the DCT in chunks (one 8x8 block at a time)
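
A quick size check shows why whole frames rarely fit on chip while blocks do. The frame formats below are hypothetical examples, not figures from the slides; the 148 KB is the ADSP-BF533 number from the Blackfin slide (taking KB = 1024 bytes and generously treating all of it as data memory, although part of it is instruction memory), and 16-bit storage per sample is an assumption.

```python
BF533_INTERNAL_BYTES = 148 * 1024   # "148 KB" from the Blackfin slide
BLOCK_BYTES = 8 * 8 * 2             # one 8x8 block as 16-bit samples (assumed)

# Hypothetical example formats: (name, width, height, bytes per pixel)
FORMATS = [
    ("QCIF, 4:2:0", 176, 144, 1.5),
    ("SD, 4:2:0", 720, 480, 1.5),
    ("1080p, 8-bit RGB", 1920, 1080, 3.0),
]


def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size of one uncompressed frame in bytes."""
    return int(width * height * bytes_per_pixel)


def fits_on_chip(size: int) -> bool:
    """True if `size` bytes fit in the (optimistic) internal-memory budget."""
    return size <= BF533_INTERNAL_BYTES


for name, w, h, bpp in FORMATS:
    size = frame_bytes(w, h, bpp)
    print(f"{name}: {size:,} bytes, fits on chip: {fits_on_chip(size)}")

print(f"8x8 blocks that fit on chip at once: {BF533_INTERNAL_BYTES // BLOCK_BYTES}")
```

Only the smallest example frame fits, while more than a thousand 128-byte blocks do, which is the case for working block by block and streaming frames from external memory.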

Requirements for DCT: Degree of Parallelism
► The DCT can be done serially, or broken up and done in parallel
► How much parallelism is practical depends largely on available memory
► Price / performance tradeoffs
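
To illustrate the serial/parallel split, the sketch below (hypothetical names `dct_row`, `dct2`) computes the 8x8 DCT as a row pass and a column pass using a precomputed coefficient table. Within each pass the eight 1-D transforms touch disjoint data, so they can be handed to separate workers; a `ThreadPoolExecutor` stands in for parallel hardware units. CPython threads will not actually speed this up, because the interpreter lock serializes the arithmetic, so the sketch shows the dependency structure rather than performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8
# Constant DCT-II table: COEF[k][n] = c(k) * cos((2n+1) k pi / 16), orthonormal c(k)
COEF = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
         * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
        for k in range(N)]


def dct_row(row):
    """8-point DCT-II as a matrix-vector product with the constant table."""
    return [sum(COEF[k][n] * row[n] for n in range(N)) for k in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block, workers=1):
    """Row pass, then column pass. The 8 transforms inside each pass share
    no data, so they can be split across `workers` units."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(dct_row, block))             # 8 independent row DCTs
        cols = list(pool.map(dct_row, transpose(rows)))   # 8 independent column DCTs
    return transpose(cols)
```

`dct2(block, workers=1)` and `dct2(block, workers=8)` return identical coefficients. In hardware each extra worker is another set of multipliers and memory ports, and the intermediate row results need storage between the two passes, which is where available memory and the price / performance tradeoff enter.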

The Challengers: Xilinx Spartan-3 FPGA
► 50K – 5M system gates
► 326 MHz
► ~100 Kbit – 2.3 Mbit internal memory
► Dedicated 18 x 18 multipliers
► Oodles of I/O pins (up to 784)
Look at the XC3S1000
► 1M gates, ~560 Kbit memory, 24 multipliers, 376 I/O pins

The Challengers: ADSP-BF5xx Blackfin Processor
► 200 – 750 MHz
► Single or dual core
► DMA memory controller
► 52 KB – 326 KB internal memory
► Other processor goodies
Look at the ADSP-BF533
► 500 MHz, single core, 148 KB memory

Performance
How do we correctly benchmark an algorithm on two completely different processors?
► I don't really know
► Look at some rough performance indicators and try to draw a conclusion

Performance
FPGA
► Varies from 1 to 25 cycles / pixel for the DCT
► Reading and writing the data take additional time
► Clock speed is limited by the degree of parallelism
DSP
► Roughly 5 cycles / pixel for the DCT
► The DMA controller lets reading and writing run in parallel with computation, at the cost of some setup overhead
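
The effect of overlapping data movement with computation can be made concrete with a simple cycle model. This is an assumption for illustration, not a measurement, and the function names and the per-block `setup` term are hypothetical: if read, compute and write run one after another, each block costs their sum; if they overlap on consecutive blocks (double buffering, as DMA allows), the slowest stage sets the pace once the pipeline is full.

```python
def cycles_serial(n_blocks: int, read: int, compute: int, write: int) -> int:
    """Each block is read, transformed and written before the next one starts."""
    return n_blocks * (read + compute + write)


def cycles_overlapped(n_blocks: int, read: int, compute: int, write: int,
                      setup: int = 0) -> int:
    """Read, compute and write work on different blocks at the same time
    (e.g. DMA fetches block i+1 and stores block i-1 while block i is
    transformed). The first block pays the full latency; after that the
    slowest stage sets the pace. `setup` is a per-block overhead, such as
    programming a DMA transfer."""
    if n_blocks == 0:
        return 0
    fill = read + compute + write
    return fill + (n_blocks - 1) * max(read, compute, write) + n_blocks * setup
```

In steady state the overlapped cost per block is max(read, compute, write) + setup. With transfers shorter than a 319-cycle compute and a 10-cycle setup this comes to 329 cycles, which is one plausible reading of the Blackfin estimate; for 64/64/64 stages it falls from 192 cycles per block (serial) toward 64.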

(Ideal) Performance
Spartan-3
► 64 read + 64 compute + 64 write = 192 cycles / block
► 326 MHz / 192 cycles ≈ 1.70 Mblocks / second
Blackfin
► 319 compute + 10 DMA transfer = 329 cycles / block
► 500 MHz / 329 cycles ≈ 1.52 Mblocks / second
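
The ideal-throughput arithmetic is easy to check in code. Note that 64 + 64 + 64 is 192 cycles, which at 326 MHz gives about 1.70 Mblocks/s. The conversion to frames per second uses a hypothetical 720x480 4:2:0 frame (one full-size luma plane plus two quarter-size chroma planes), which is an assumption and not part of the slides.

```python
PIXELS_PER_BLOCK = 64


def blocks_per_second(clock_hz: float, cycles_per_block: int) -> float:
    """Ideal block rate: no stalls, one block finished every cycles_per_block."""
    return clock_hz / cycles_per_block


def blocks_per_frame_420(width: int, height: int) -> int:
    """8x8 blocks in one 4:2:0 frame; assumes dimensions are multiples of 16."""
    luma = (width // 8) * (height // 8)
    chroma = 2 * (width // 16) * (height // 16)
    return luma + chroma


spartan_cycles = 64 + 64 + 64   # read + compute + write
blackfin_cycles = 319 + 10      # compute + DMA transfer

spartan = blocks_per_second(326e6, spartan_cycles)
blackfin = blocks_per_second(500e6, blackfin_cycles)
sd_blocks = blocks_per_frame_420(720, 480)  # hypothetical example frame

for name, rate in (("Spartan-3", spartan), ("Blackfin", blackfin)):
    print(f"{name}: {rate / 1e6:.2f} Mblocks/s, "
          f"{rate * PIXELS_PER_BLOCK / 1e6:.0f} Mpixels/s, "
          f"{rate / sd_blocks:.0f} frames/s at 720x480 4:2:0")
```

Under these ideal assumptions both parts sustain DCT throughput well above typical video frame rates; the real limits come from the rest of the encoder and from memory traffic.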

Advantages
FPGA
► Potential for very high parallelism
► Existing video designs available for purchase
► Good "middleman" functionality (glue between other devices in a system)
DSP
► Higher potential clock speed
► Much more flexible design
► DMA memory controller

Disadvantages
FPGA
► Low flexibility
► Hard to optimize
► Limited number of logic blocks
DSP
► Difficult to achieve full utilization
► Higher power consumption

Conclusions
FPGA
► Best for well-defined roles, like the DCT
► Faster in situations where throughput matters most
► Can be very expensive
DSP
► Better suited to more flexible roles, like a full encoder
► Better where large amounts of (additional) memory are needed

Questions?
