Fast VLSI Implementation of Sorting Algorithm for Standard Median Filters Hyeong-Seok Yu SungKyunKwan Univ. Dept. of ECE, Vada Lab.

Presentation transcript:

Fast VLSI Implementation of Sorting Algorithm for Standard Median Filters Hyeong-Seok Yu SungKyunKwan Univ. Dept. of ECE, Vada Lab. Paper : FP 1.3

VLSI Algorithmic Design Automation Lab, ASIC Conference
Introduction. Signal and image processing applications need to smooth noisy signals while preserving the edge information. Smoothing techniques: linear filtering and median filtering.

Introduction (cont'). Design method: hardware (H/W) vs. firmware (F/W) and software (S/W).

Median Filtering. 1-D median. Input signals: $x_1, x_2, x_3, \ldots, x_{s-1}, x_s$. Window size: $w = 2k + 1$. Median output: $y_i = \mathrm{Median}[x_{i-k}, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_{i+k}]$.
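
The definition above can be sketched directly in software. This is a minimal reference version, not the hardware algorithm of the talk; the function name `median_filter_1d` is hypothetical, and the choice to emit outputs only where a full window of $w = 2k + 1$ samples exists is an assumption, since the slide does not say how the signal borders are handled.

```python
def median_filter_1d(x, k):
    """Return y_i = Median[x_(i-k), ..., x_(i+k)] for every i with a full window.

    Assumption: positions closer than k samples to either end are skipped,
    because the slides do not specify any border handling.
    """
    w = 2 * k + 1  # window size, as defined on the slide
    out = []
    for start in range(len(x) - w + 1):
        window = sorted(x[start:start + w])  # brute-force sort of each window
        out.append(window[k])                # middle element of w = 2k + 1 values
    return out


if __name__ == "__main__":
    # An isolated impulse is removed ...
    print(median_filter_1d([5, 5, 5, 100, 5, 5, 5], k=1))
    # ... while a step edge keeps its position and height.
    print(median_filter_1d([0, 0, 0, 10, 10, 10], k=1))
```

Sorting every window from scratch is the simplest correct baseline; the structures discussed on the following slides avoid that repeated work.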

The Algorithm. Median filter for a general-purpose signal or image processor: different word lengths, extensible window sizes, real-time rate. Target algorithm for VLSI implementation: only a few simple units; a simple and regular communication and control scheme; simple window-size extension with no increase in time delay; no feedback loop, so that pipelining is available.

The Algorithm (cont'). The conventional single-array structures: references [5], [7], [8], [10], [11], [12], [13]. Disadvantages: a feedback loop (pipelining, delay, complexity); non-linear delay increase for extracting the rank order; excessive memory elements.

The Algorithm (cont'). Single-array insertion sorting of the ordered list using a History Matrix: reduced area, less switching activity, higher throughput, efficient VLSI implementation.
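
As a software illustration of insertion sorting on an ordered list, the sketch below keeps the window contents sorted and, for each new sample, deletes the oldest value and inserts the newest. It is only an analogue of the single-array idea and leaves out the History Matrix; the name `running_median` is hypothetical, and the list is kept in increasing order for convenience (the slides sort in decreasing order, which yields the same median).

```python
from bisect import bisect_left, insort
from collections import deque


def running_median(samples, k):
    """Sliding-window median using one ordered list (window size w = 2k + 1)."""
    w = 2 * k + 1
    arrival = deque()  # window samples in arrival order, oldest on the left
    ordered = []       # the same samples, kept sorted at all times
    out = []
    for s in samples:
        arrival.append(s)
        insort(ordered, s)  # insertion step: place the new sample in order
        if len(arrival) > w:
            oldest = arrival.popleft()
            # deletion step: remove one copy of the sample leaving the window
            del ordered[bisect_left(ordered, oldest)]
        if len(arrival) == w:
            out.append(ordered[k])  # the middle of the ordered list is the median
    return out


if __name__ == "__main__":
    print(running_median([1, 9, 2, 8, 3, 7, 4], k=1))
```

Each step touches the ordered list once for the delete and once for the insert, instead of re-sorting the whole window.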

The Algorithm (cont'). A filter array for the proposed algorithm: a cascade of cells with simple inter-communication between adjacent cells.

The Algorithm (cont'). Definition of symbols: z: i-th ordered cell value; h: i-th row elements; d: a lower cell; u: an upper cell; p: the present cell; h(i, j): elements from the i-th to the j-th column. History Matrix: a set of elements, each 0 or 1, held in w × w 1-bit flip-flops.

The Algorithm (cont'). The operation of the History Matrix. The i-th column gives the rank (number of 1s) of the i-th input data, in reverse order. Elements of the rows of the first column (predictive): 0 when the value is < the new input data, 1 otherwise. Elements of the other rows: on an up transition of the unit value, replace with the lower rows; on a down transition of the unit value, replace with the upper rows; on no transition of the unit value, replace or shift. The matrix contains both index values and rank values.
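
The idea that a matrix of 1-bit flags can hold the rank of every sample in the window can be illustrated with the sketch below. It is one hypothetical reading, not the authors' cell-level update rules: the function name `history_matrix_median`, the row-wise layout, and the tie-breaking rule (a newer sample outranks an older equal one) are all assumptions. Each new sample costs only one comparison against every stored sample, and a rank is simply the count of 1s in a row.

```python
def history_matrix_median(samples, k):
    """Sliding-window median from a w x w matrix of 1-bit comparison flags.

    Assumed layout: h[i][j] == 1 when window sample i outranks sample j.
    Ties are broken by age (the newer sample outranks the older), so the
    ranks in a full window are exactly 0 .. w-1.
    """
    w = 2 * k + 1
    vals = []  # window samples, oldest first
    h = []     # one row of flags per window sample
    out = []
    for s in samples:
        if len(vals) == w:
            # Retire the oldest sample: drop its row and its column.
            vals.pop(0)
            h.pop(0)
            for row in h:
                row.pop(0)
        # One comparison of the new sample against every stored sample.
        new_row = [1 if s >= v else 0 for v in vals]
        for row, bit in zip(h, new_row):
            row.append(1 - bit)  # complementary flag in the stored sample's row
        h.append(new_row + [0])  # a sample never outranks itself
        vals.append(s)
        if len(vals) == w:
            ranks = [sum(row) for row in h]   # rank = number of 1s in the row
            out.append(vals[ranks.index(k)])  # rank k marks the median
    return out


if __name__ == "__main__":
    print(history_matrix_median([1, 9, 2, 8, 3, 7, 4], k=1))
```

Because old flags are reused and only the newest row and column are computed, the work per sample grows linearly with the window size in this sketch.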

The Algorithm (cont'). Processing sequence of unit values and matrix. Initially, all the elements of the matrix = 1 and filled by 0. Sorted in decreasing order. Notation x | y: the unit value index | the unit value.

The Algorithm (cont'). Extracting the rank of each cell: example for window size 5 × 5.

The Algorithm (cont'). Proposed filter cell architecture. Simple control: a few gates, a mux, and a shift register. Memory unit: the unit value and a single row of the History Matrix.

Conclusions. Experimental results: 25 ns delay reported by SYNOPSYS(TM); 40 MHz clock frequency for 8-bit word length; bps operation; always the same delay for different window sizes; 1185 gates for window size = 5. Summarized comparison results.