Parallel Computer Vision and Image Processing Algorithms

Ruoxi Zhang

Computer Vision
Computer vision (CV) is a rapidly growing field intent on enabling computers to process, analyze, and understand the information in images in order to produce structured information or make decisions.

Applications
Navigation, robotics, object detection, augmented reality, optical character recognition, face recognition.
Computer vision and image processing algorithms are involved in a range of applications:
Augmented reality (AR): a live, direct or indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input.
Optical character recognition (OCR): computer software that recognizes numbers and letters of the alphabet written on paper.
The development of computer vision is essential for the advancement of a multitude of areas, including medicine, entertainment, and security.
Medical: extraction of left ventricular contours from left ventriculograms by means of a neural edge detector.
AR: quickly scanning a real-world environment, and doing simultaneous localization and mapping at the level of objects using valid masks (kernels).
Gaming and virtual makeover: facial feature detection, with features created by a facial landmark detector, improves face recognition. The tracking library has to identify faces and needs to be accurate.

For example, … These two are not the same!

Parallelism
The needs create requirements: more computational resources.
Nature of an image: a 2D pixel array.
SIMD: divide the image into sub-images (this presentation is based on this approach).
MISD: pipelining.
Powerful and efficient platform: the modern GPU. GPGPUs (general-purpose GPUs), CUDA (Compute Unified Device Architecture), API (application programming interface).
Computer vision algorithms require a large number of computations as well as an equally large number of memory accesses, so the need for computational resources increases. To get results in a reasonable time with acceptable accuracy, we parallelize the algorithms.
An image is made of pixels, each with 3 values (RGB), so an image can be represented as a 2D array. Alternatively, image processing tasks can be divided into small steps to build a pipelined machine.
At the same time, graphics processing units (GPUs) have proven to be a powerful and efficient computational platform. The modern GPU can natively perform thousands of parallel computations per clock cycle. Recent developments have allowed GPUs to be used for more than just graphics processing and rendering. CUDA enables software developers to access the GPU's virtual instruction set through standard programming languages such as C.

Examples
Edge detection using the Sobel filter
Object recognition using histograms

Edge Detection: Sobel Filter
A foundational stage in feature detection and feature extraction.
Edge: a large intensity difference. Edge detection aims to identify image locations at which the image brightness (intensity) changes distinctly. It is done on a grayscale image.
The Sobel filter computes an approximation of the gradient of the image intensity function. The image is convolved with a kernel (typically 3 x 3) in the horizontal and in the vertical direction.
For Gx, the horizontal direction is the derivative and the vertical direction is Gaussian smoothing to reduce noise.
From Gx and Gy we obtain the gradient magnitude and orientation at each pixel.
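For reference, the kernels and the magnitude/orientation formulas can be written out as below. This is a sketch using one common sign convention for the Sobel kernels (the slides do not show the kernel values, so the signs are an assumption); I denotes the grayscale image and * denotes 2D convolution. The factorization of the Gx kernel shows the point made above: a smoothing column [1 2 1] in the vertical direction times a derivative row [-1 0 1] in the horizontal direction.

```latex
\[
G_x =
\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I
=
\left(
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}
\right) * I,
\qquad
G_y =
\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * I
\]
\[
|G| = \sqrt{G_x^2 + G_y^2},
\qquad
\theta = \operatorname{atan2}(G_y, G_x)
\]
```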

CPU vs. GPU (code listings): on the CPU; on the GPU using CUDA programming. Critical region.

Kernels
The image is zero-padded at its boundary. For each point and its 8 neighbors, convolve with the kernel for the x direction and the kernel for the y direction to get Gx and Gy at that pixel location.
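The per-pixel step described above can be sketched sequentially in plain Python. This is only an illustration, not the CPU or CUDA code from the slides: the function name `sobel`, the kernel sign convention, and the list-of-rows image format are assumptions. Each output pixel depends only on its own 3 x 3 neighborhood, which is what makes the loop body a natural per-thread unit of work on a GPU.

```python
import math

# 3x3 Sobel kernels (one common sign convention; assumed, not taken from the slides).
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel(image):
    """Return (magnitude, orientation) grids for a grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    # Zero padding: one extra row/column of zeros on every side of the image.
    padded = ([[0] * (w + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (w + 2)])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0
            # Weighted sum over the pixel and its 8 neighbors. The kernel is
            # applied without flipping (correlation); for Sobel this differs
            # from true convolution only in the sign of Gx and Gy.
            for dy in range(3):
                for dx in range(3):
                    p = padded[y + dy][x + dx]
                    gx += KX[dy][dx] * p
                    gy += KY[dy][dx] * p
            mag[y][x] = math.hypot(gx, gy)   # gradient magnitude
            ang[y][x] = math.atan2(gy, gx)   # gradient orientation in radians
    return mag, ang


# A vertical edge: dark columns on the left, bright columns on the right.
edge_mag, edge_ang = sobel([[0, 0, 10, 10] for _ in range(4)])
# A flat image: no edge in the interior, but zero padding creates one at the border.
flat_mag, _ = sobel([[5, 5, 5] for _ in range(3)])
```

Note that zero padding makes border pixels of a flat image respond as if there were an edge; other boundary treatments exist, but zero padding is the one named on the slide.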

Execution time of the Sobel implementation (figure): the GPU outperforms the CPU.

Object Recognition: Histogram Construction
A histogram is a way of representing a distribution. Histograms are used in various fields to quickly profile the distribution of a large amount of data.
The practical use of histograms relates to object recognition: they allow an estimation of recognition without feature correspondence (that is, without knowing where the features are).
There are two parallel tasks involved, each with distinct issues in its execution on the GPU: histogram construction and histogram matching (against a candidate).
Example (figure): given a template image and input images, construct the histogram of each and find the match (a yellow duck, rotated).

The sequential algorithm
Calculate the normalized histogram.

Quantization (figure): an N x N image is quantized into an N x N array of bin indices. Take the top-left corner as an example and select 32 bins: values 0-7 -> bin 0, 8-15 -> bin 1, …, -> bin 31 (32 bins total).

Binary graphs (figure): from the quantized N x N image, build 32 binary graphs, one N x N graph per bin (bin 0, bin 1, …, bin 31). Quantization as before: 0-7 -> bin 0, 8-15 -> bin 1, …, -> bin 31 (32 bins total).

Sum graphs (figure): for each bin, build a sum graph (sum graph 0, sum graph 1, …, sum graph 31) by counting the number of 1's in the corresponding binary graph. The result is a histogram.
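The quantize / binary graph / count pipeline from these slides can be sketched in a few lines of Python. The function names are hypothetical, and the code assumes 8-bit intensities (0 to 255), so that 32 bins of width 8 cover the whole range; the slides state only the first two bin ranges.

```python
def quantize(image, bin_width=8):
    """Map each intensity to its bin index (0-7 -> 0, 8-15 -> 1, ...)."""
    return [[v // bin_width for v in row] for row in image]


def binary_graphs(quantized, num_bins=32):
    """One binary graph per bin: 1 where the pixel falls in that bin, else 0."""
    return [[[1 if q == b else 0 for q in row] for row in quantized]
            for b in range(num_bins)]


def normalized_histogram(image, num_bins=32, bin_width=8):
    """Count the 1's in every binary graph, then divide by the pixel count."""
    graphs = binary_graphs(quantize(image, bin_width), num_bins)
    counts = [sum(sum(row) for row in graph) for graph in graphs]
    total = sum(counts)
    return [c / total for c in counts]


# Tiny 2 x 2 example: two pixels in bin 0, one in bin 1, one in bin 31.
hist = normalized_histogram([[0, 7], [8, 255]])
```

Building all 32 binary graphs explicitly is wasteful for a plain histogram, but it mirrors the figure and is the form that the integral histogram later builds on.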

The sequential algorithm
Calculating the normalized histogram takes O(N^2). Suppose we use 32 bins: computing the task takes 32 * N^2.
Using the integral histogram (sum graph), the histogram of a region takes 3 steps.
Integral histogram (figure, with corner values D and B on the top row and C and A on the bottom row): Orange = A - B - C + D.
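The A - B - C + D lookup can be sketched for a single bin as follows. The function names and the extra zero row and column (which avoid boundary checks) are choices made for this illustration; the labeling of the corners follows the figure, with A at the bottom-right of the region and D diagonally opposite.

```python
def integral(binary):
    """Sum graph of one bin: S[y][x] = number of 1's above and left of (y, x)."""
    h, w = len(binary), len(binary[0])
    # One extra row and column of zeros so region queries need no special cases.
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            S[y + 1][x + 1] = binary[y][x] + S[y][x + 1] + S[y + 1][x] - S[y][x]
    return S


def region_count(S, top, left, bottom, right):
    """Count of 1's in the rectangle [top..bottom] x [left..right], inclusive."""
    A = S[bottom + 1][right + 1]  # everything up to the bottom-right corner
    B = S[top][right + 1]         # strip above the region
    C = S[bottom + 1][left]       # strip left of the region
    D = S[top][left]              # corner block subtracted twice, added back once
    return A - B - C + D


plane = [[1, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]
S = integral(plane)
lower_right = region_count(S, 1, 1, 2, 2)
whole = region_count(S, 0, 0, 2, 2)
```

Once the sum graph exists, every region query costs three additions or subtractions per bin, regardless of the region size.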

The parallel algorithm (thread level)
We consider two methods for thread-level parallelization, depending on how the bin values are shared among threads:
1(a) maintains a single shared histogram, whose updates are synchronized via atomic increment instructions.
1(b) maintains a private histogram per thread; the private histograms are reduced to a global histogram later.
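Both methods can be sketched with Python threads. This only illustrates the sharing pattern, not performance: Python has no atomic increment instruction, so a `threading.Lock` stands in for it in 1(a), and all names here are hypothetical. The input is assumed to be a flat list of already-quantized bin indices.

```python
import threading


def _run(worker, data, num_threads):
    """Start one thread per strided slice of the data and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(t, data[t::num_threads]))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def histogram_shared(data, num_bins, num_threads=4):
    """1(a): a single shared histogram; every update is synchronized."""
    hist = [0] * num_bins
    lock = threading.Lock()  # stand-in for a hardware atomic increment

    def worker(_tid, chunk):
        for v in chunk:
            with lock:
                hist[v] += 1

    _run(worker, data, num_threads)
    return hist


def histogram_private(data, num_bins, num_threads=4):
    """1(b): one private histogram per thread, reduced to a global one at the end."""
    private = [[0] * num_bins for _ in range(num_threads)]

    def worker(tid, chunk):
        local = private[tid]  # no other thread writes to this list
        for v in chunk:
            local[v] += 1

    _run(worker, data, num_threads)
    # Reduction: sum the private copies bin by bin.
    return [sum(p[b] for p in private) for b in range(num_bins)]


data = [0, 1, 1, 2, 2, 2, 3] * 10
shared = histogram_shared(data, 4)
private = histogram_private(data, 4)
```

The trade-off is the usual one: 1(a) pays for synchronization on every update, while 1(b) pays in memory (one histogram per thread) and in the final reduction.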

The parallel algorithm (SIMD)
Vectorized bin update with 3 bins and 4-wide SIMD (figure).
Since the vector width is four, there are four slots for each bin. In the figure the slots are drawn as stacks of blocks, distinguished by color.
For each vector of data elements:
read the corresponding bin values using a gather instruction,
increment the bin values,
write the updated bin values using a scatter instruction.
Finally, reduce into one bin: sum up the private bins using scalar instructions.
The SIMD-lane-private bins (privatization) prevent a collision between the 3rd and 4th data elements.
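The gather / increment / scatter sequence can be simulated in plain Python to show why the lane-private slots matter. This is a scalar simulation, not real SIMD code; the function names are hypothetical, and the sample vector is chosen so that its 3rd and 4th elements fall in the same bin, as in the figure.

```python
WIDTH = 4  # simulated SIMD vector width


def simd_histogram_private(data, num_bins, width=WIDTH):
    """Lane-private bins: bins[b][lane] is touched only by that lane."""
    bins = [[0] * width for _ in range(num_bins)]
    for start in range(0, len(data), width):
        vec = data[start:start + width]
        lanes = range(len(vec))
        gathered = [bins[vec[l]][l] for l in lanes]  # gather
        updated = [g + 1 for g in gathered]          # vector increment
        for l in lanes:                              # scatter
            bins[vec[l]][l] = updated[l]
    # Scalar reduction: sum the private slots of every bin.
    return [sum(slots) for slots in bins]


def simd_histogram_colliding(data, num_bins, width=WIDTH):
    """Without privatization: lanes hitting the same bin overwrite each other."""
    bins = [0] * num_bins
    for start in range(0, len(data), width):
        vec = data[start:start + width]
        gathered = [bins[v] for v in vec]            # gather (all lanes read first)
        updated = [g + 1 for g in gathered]          # vector increment
        for v, u in zip(vec, updated):               # scatter: last writer wins
            bins[v] = u
    return bins


vec = [0, 1, 2, 2]  # 3rd and 4th elements map to the same bin
good = simd_histogram_private(vec, 3)
bad = simd_histogram_colliding(vec, 3)
```

In the colliding version both lanes read the old value of bin 2 and write back the same incremented value, so one update is lost; with one slot per lane the two updates land in different slots and survive until the reduction.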

Get integral histograms at each pixel in parallel
Apply prefix sums to the rows of the histogram bins (horizontal cumulative sums).
Transpose the array and reapply the prefix sum.
This yields the integral histogram at each pixel: the histogram computed up to location (x, y), that is, the sum of everything up to and including the current pixel (x, y).
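The row-scan, transpose, row-scan sequence can be sketched for one bin plane as below. This is a sequential stand-in: `itertools.accumulate` plays the role of the per-row prefix sum that the GPU would run in parallel, and the function names are hypothetical. A final transpose, added here for readability, restores the original orientation.

```python
from itertools import accumulate


def row_prefix_sums(grid):
    """Inclusive prefix sum along every row (each row is independent)."""
    return [list(accumulate(row)) for row in grid]


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def integral_plane(binary):
    """Integral of one bin plane: rows first, then columns via a transpose."""
    horizontal = row_prefix_sums(binary)               # horizontal cumulative sums
    vertical = row_prefix_sums(transpose(horizontal))  # same scan, now over columns
    return transpose(vertical)                         # back to row-major layout


plane = [[1, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]
result = integral_plane(plane)
```

Transposing lets the same row-wise scan be reused for the vertical pass, so only one scan routine is needed.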

Get integral histograms at each pixel in parallel: prescan / parallel prefix sum operation
Up-sweep (reduce) phase applied to an 8-element array (figure).
The parallel prefix sum operation on the GPU consists of two phases: an up-sweep (or reduce) phase and a down-sweep phase.
The up-sweep phase builds a balanced binary tree on the input data and performs one addition per node. Scanning is done from the leaves to the root (bottom up).
In the down-sweep phase, the tree is traversed from the root to the leaves, and the partial sums from the up-sweep phase are aggregated to obtain the final prefix-summed array.
Prescan requires only O(n) operations: 2∗(n−1) additions and (n−1) swaps.

Get integral histograms at each pixel in parallel: prescan / parallel prefix sum operation
Down-sweep phase (figure).
Start by inserting zero at the root of the tree. On each step, each node at the current level passes its own value to its left child, and passes the sum of its value and the former value of its left child to its right child.
By comparison, the naive scan performs O(n log2 n) addition operations.

Get integral histograms at each pixel: prescan / parallel prefix sum operation
Prescan requires only O(n) operations: 2∗(n−1) additions and (n−1) swaps.
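The two phases and the operation counts can be checked with a small sequential simulation of the work-efficient prescan. This is a sketch, not GPU code: the function name `prescan` is hypothetical, the input length is assumed to be a power of two, and the inner `for` loops stand for the steps that would run in parallel at each tree level.

```python
def prescan(values):
    """Exclusive prefix sum via up-sweep and down-sweep.

    Returns (scanned_list, additions, swaps). Assumes len(values) is a power of two.
    """
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    adds = swaps = 0

    # Up-sweep (reduce): build partial sums from the leaves to the root.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
            adds += 1
        d *= 2

    # Down-sweep: insert zero at the root, then walk from the root to the leaves.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]   # left child receives the parent's value (swap)
            a[i] += left      # right child receives parent + former left value
            adds += 1
            swaps += 1
        d //= 2
    return a, adds, swaps


scanned, adds, swaps = prescan([3, 1, 7, 0, 4, 1, 6, 3])
```

For n = 8 the counters come out to 14 additions and 7 swaps, matching 2∗(n−1) and (n−1). The result is an exclusive scan; adding each input element to its output gives the inclusive sums used for the integral histogram.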

Integral Histogram Comparison
sequential integral histogram

Integral Histogram Comparison
parallel integral histogram


The End
