Real Time Image Feature Vector Generator Employing Functional Cache Memory for Edge Takuki Nakagawa, Department of Electronic Engineering The University.


Real Time Image Feature Vector Generator Employing Functional Cache Memory for Edge. Takuki Nakagawa, Department of Electronic Engineering, The University of Tokyo, Japan, and Tadashi Shibata, Department of Electrical Engineering and Information Systems, The University of Tokyo, Japan. ©2009 IEEE. Vishesh Kalra, EE.

Introduction: Image processing requires three major stages: (1) extracting features from the input image; (2) summarizing them as a feature vector; (3) classifying the feature vectors. This paper is concerned with the second stage.

Introduction (contd.): Edge information extracted from an image plays a central role in image perception. In this paper, four directional edges are extracted from a 64x64 local image (the recognition window) of a 256x256 input image using 5x5 filtering kernels. To cover the entire image, the recognition window is shifted pixel by pixel, scanning from top to bottom.
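The scan of the recognition window can be sketched as follows. This is a minimal illustration, assuming the window visits every position at which it fits entirely inside the image and finishes a top-to-bottom pass before moving one pixel to the right; the function name `window_positions` is hypothetical and not taken from the paper.

```python
def window_positions(image_size=256, window_size=64):
    """Yield the (top, left) corner of every recognition-window position.

    Assumption: the window must lie fully inside the image, and it scans
    top to bottom before shifting one pixel to the right.
    """
    last = image_size - window_size   # largest valid corner coordinate
    for left in range(last + 1):      # shift right one pixel at a time
        for top in range(last + 1):   # scan pixel by pixel, top to bottom
            yield top, left


positions = list(window_positions())
print(len(positions))  # 193 * 193 positions under the assumption above
```

Under this assumption a 256x256 image yields 193 x 193 window positions, each of which needs its own feature vector.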

Earlier Paper: In the earlier design, edge flag detection at each pixel location is carried out every clock cycle, and the edge flag bits are temporarily stored in a 64x64 array of shift registers to generate a histogram. When the recognition window moves, the edge data are shifted accordingly in the shift registers. As a result, a 64-element feature vector is generated every 64 clock cycles. With this architecture it became possible to generate 1.5x10^6 feature vectors/sec at a clock frequency of 100 MHz.
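The throughput of the earlier design follows directly from the two numbers on this slide. The short calculation below is only a check of that arithmetic; the variable names are illustrative.

```python
CLOCK_HZ = 100e6         # 100 MHz clock, as stated on the slide
CYCLES_PER_VECTOR = 64   # one 64-element feature vector every 64 cycles

vectors_per_second = CLOCK_HZ / CYCLES_PER_VECTOR
print(vectors_per_second)  # 1562500.0, i.e. about 1.5x10^6 vectors/sec
```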

Vector Generation Algorithm: Edge filtering is carried out in four directions (horizontal, vertical, +45 degrees and -45 degrees), and four edge maps are generated from the 64x64 recognition window. Each edge map is divided into 16 bins, and the number of edge flags in each bin is counted to form a 16-element histogram. Concatenating the four histograms yields a 64-dimension feature vector.
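A software sketch of this histogram step is given below. The slides do not say how each edge map is divided into 16 bins, so the binning rules here are assumptions chosen for illustration: 4-row strips for the horizontal map, 4-column strips for the vertical map, and 8-wide diagonal bands for the two 45-degree maps. The names `histogram`, `BIN_RULES` and `feature_vector` are hypothetical, and this is not the hardware algorithm itself.

```python
WINDOW = 64   # recognition window is 64x64
BINS = 16     # bins per edge map


def histogram(edge_map, bin_of):
    """Count edge flags per bin; bin_of(row, col) returns the bin index."""
    counts = [0] * BINS
    for r in range(WINDOW):
        for c in range(WINDOW):
            if edge_map[r][c]:
                counts[bin_of(r, c)] += 1
    return counts


# Assumed binning rules (not specified in the slides).
BIN_RULES = {
    "horizontal": lambda r, c: r // 4,             # 4-row strips
    "vertical":   lambda r, c: c // 4,             # 4-column strips
    "plus45":     lambda r, c: (r + c) // 8,       # anti-diagonal bands
    "minus45":    lambda r, c: (c - r + 63) // 8,  # diagonal bands
}


def feature_vector(edge_maps):
    """Concatenate the four 16-bin histograms into one 64-element vector."""
    vec = []
    for name in ("horizontal", "vertical", "plus45", "minus45"):
        vec.extend(histogram(edge_maps[name], BIN_RULES[name]))
    return vec


def blank_map():
    return [[0] * WINDOW for _ in range(WINDOW)]


# Usage: one edge flag at the top-left corner of every map.
maps = {name: blank_map() for name in BIN_RULES}
for m in maps.values():
    m[0][0] = 1
vec = feature_vector(maps)
print(len(vec), sum(vec))  # 64 elements, 4 flags counted in total
```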

System Architecture: The function of this unit is to generate a 16-element edge histogram. The unit is composed of a functional cache memory for storing edge flag bits and a processing element array for edge counting. The functional cache memory includes four 64x65 SRAM banks and two crossbar switches used to reorder the edge flag locations.

Functional Cache Memory: Before scanning starts, the edge flag bits for 64 columns (256x64) must be stored in the four SRAM banks. In each 64x65 SRAM bank, 64 columns are filled with edge bits and one column is left empty. The edge flag bits in each row are read out sequentially, from the top row to the bottom row, and summed to produce a 16-element histogram after 64 cycles.
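The row-by-row readout and summation can be modelled in software as below. This is a sketch only, assuming that each of the 16 histogram elements counts the flags in a slot of 4 adjacent columns (64 columns / 16 elements) and that one row is consumed per clock cycle; `accumulate_rows` is a hypothetical name, and the real processing element array may be organized differently.

```python
def accumulate_rows(window_rows, slot_width=4):
    """Sum edge flags per column slot while rows arrive top to bottom.

    window_rows: 64 rows of 64 flag bits (0 or 1).
    Assumption: each histogram element covers `slot_width` adjacent columns.
    """
    histogram = [0] * (len(window_rows[0]) // slot_width)
    for row in window_rows:                       # one row per clock cycle
        for slot in range(len(histogram)):
            start = slot * slot_width
            histogram[slot] += sum(row[start:start + slot_width])
    return histogram


# Usage: a window in which every edge flag is set.
all_set = [[1] * 64 for _ in range(64)]
result = accumulate_rows(all_set)
print(len(result), result[0])  # 16 elements, each counting 4 * 64 flags
```

After the 64th row has been accumulated, the 16 partial sums form the histogram for that window position.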

Functional Cache Memory (contd.): When all 256 rows of data have been read out, the recognition window must be shifted one pixel to the right. In generating a histogram from the vertical edge map, the basic operation is the summation of edge flag bits within vertical slots. For the +45 and -45 degree edge maps, however, adding the diagonally adjacent edge flag bits is considerably more complicated. To handle this case, an arithmetic-and-shift algorithm has been developed.

Results: The chip was designed in a 0.18-μm 5-metal CMOS technology, and its operation was confirmed by NanoSim simulation. The architecture generates 3.9x10^7 feature vectors/second (at 100 MHz), which is 5x10^3 times faster than software processing on a 2.16-GHz processor.

Comparison of processing time for scanning a 640x480-pixel image

Conclusion: An image-feature-vector-generation VLSI has been developed with the aim of building real-time recognition systems. By employing the functional cache memory architecture, seamless scanning of the recognition window over the entire image and generation of one feature vector/cycle have been accomplished. The system was designed for 256×256 images, but it is easily extended to larger images simply by increasing the number of SRAM banks in proportion to the height of the image. The chip was designed in a 0.18-μm 5-metal CMOS technology and its operation was confirmed by NanoSim simulation. If the chip is operated at 100 MHz, it is possible to scan a VGA-size image at a rate of 126 frames/sec, which is 20 times faster than the previous designs.
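The quoted frame rate is consistent with the throughput reported in the results, as the rough check below shows. It assumes that scanning a VGA frame needs about one feature vector per pixel position (640 x 480); that assumption is not stated in the slides, so this is a plausibility check and not the authors' derivation.

```python
VECTORS_PER_SECOND = 3.9e7        # throughput reported at 100 MHz
VGA_WIDTH, VGA_HEIGHT = 640, 480

# Assumption: roughly one feature vector per pixel position in the frame.
vectors_per_frame = VGA_WIDTH * VGA_HEIGHT
frames_per_second = VECTORS_PER_SECOND / vectors_per_frame
print(int(frames_per_second))  # 126
```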