Accelerating image recognition on mobile devices using GPGPU

Accelerating image recognition on mobile devices using GPGPU
Miguel Bordallo (1), Henri Nykänen (2), Jari Hannuksela (1), Olli Silvén (1) and Markku Vehviläinen (3)
(1) University of Oulu, Finland
(2) Visidon Ltd., Oulu, Finland
(3) Nokia Research Center, Tampere, Finland
Jari Hannuksela, Olli Silvén
Machine Vision Group, Infotech Oulu, Department of Electrical and Information Engineering, University of Oulu, Finland

Contents
- Introduction
- Mobile image recognition
- Local Binary Pattern
- Graphics processor as a computing engine
- GPU-accelerated image recognition
- LBP fragment shader implementation
- Image preprocessing
- Experiments and results
- Speed
- Power consumption

Motivation
- Face detection and recognition are key components of future multimodal user interfaces
- Mobile computing power is still not harnessed properly for real-time computer vision
- Computationally demanding tasks compromise battery life
- Need for energy-efficient and computationally efficient solutions

Face analysis using local binary patterns
- Face analysis is one of the major challenges in computer vision
- The LBP method has already been adopted by many leading scientists
- Excellent results in face recognition and authentication, face detection, facial expression recognition, and gender classification

Local Binary Pattern
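
For reference, the basic LBP operator is commonly written as shown below. Here g_c is the gray value of the centre pixel, g_p are the gray values of P neighbours sampled at radius R, and s thresholds each difference. The notation follows common usage in the LBP literature and is not taken from the slides.

```latex
\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p},
\qquad
s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
```

With P = 8 and R = 1 this yields one 8-bit code per pixel, which fits in a single 8-bit texture channel.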

GPU as a computing engine
- The GPU can be treated as an independent entity
- Newer phones include a GPU chipset
- OpenGL ES is a highly optimized and attractive accelerator interface
- Emerging platforms (OpenCL EP) will facilitate using the GPU as a computing resource
- Compatible data formats for the graphics and camera subsystems are desirable

Fixed pipeline (OpenGL ES 1.1) vs. programmable pipeline (OpenGL ES 2.0)

Stream processing (OpenGL) vs. shared memory processing (CUDA)

OpenCL (Embedded Profile)
- Emerging platforms will offer the needed flexibility
- OpenCL Embedded Profile is a subset of OpenCL
- Supports data-parallel and task-parallel programming models
- Code executed concurrently on CPU & GPU (& DSP)
- Other current and future resources are compatible
- Easier programming in a heterogeneous processor environment
- High parallelization of image processing computations -> high efficiency

GPU assisted face analysis process

GPU-accelerated image recognition
OpenGL ES 2.0:
- Image feature (LBP, ...) extraction
- Image preprocessing
- Image scaling
- Displaying
C code:
- Camera control
- Classification

LBP fragment shader implementation
Two versions:
  - Version 1: calculates the LBP map in one grayscale channel
  - Version 2: calculates 4 LBP maps in the RGBA channels
- Access the image via texture lookup
- Fetch the selected picture pixel
- Fetch the neighbours' values
- Compute the binary vector
- Multiply by the weighting factor
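
The per-pixel steps listed on this slide can be emulated on the CPU to make the data flow concrete. The sketch below is plain Python, not the shader from the presentation. The neighbour ordering, the power-of-two weights, and the names `lbp_pixel` and `lbp_map` are assumptions made for illustration, and border pixels are simply skipped.

```python
# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
# Ordering and weights are assumptions; the slides do not specify them.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
WEIGHTS = [1 << i for i in range(8)]  # 1, 2, 4, ..., 128


def lbp_pixel(image, y, x):
    """Return the 8-bit LBP code of the pixel at (y, x)."""
    center = image[y][x]                       # fetch the selected pixel
    code = 0
    for (dy, dx), weight in zip(OFFSETS, WEIGHTS):
        neighbour = image[y + dy][x + dx]      # fetch one neighbour value
        bit = 1 if neighbour >= center else 0  # one element of the binary vector
        code += bit * weight                   # multiply by the weighting factor
    return code


def lbp_map(image):
    """Compute LBP codes for all interior pixels of a grayscale image
    given as a list of rows of integers."""
    height, width = len(image), len(image[0])
    return [[lbp_pixel(image, y, x) for x in range(1, width - 1)]
            for y in range(1, height - 1)]
```

In a fragment shader the outer loops disappear, because the GPU invokes the per-pixel routine once per output fragment. Version 2 would apply the same routine to each of the four channels of an RGBA texel.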

Preprocessing
- Create quad
- Render each piece in one channel
- Divide texture & convert to grayscale
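
One way to read this slide is that the input image is split into four pieces, each piece is converted to grayscale, and the pieces are stored in the R, G, B and A channels of a texture a quarter of the original size. The sketch below emulates that packing in plain Python. The quadrant-to-channel assignment, the Rec. 601 luma weights, and the names `to_gray` and `pack_quadrants` are assumptions, not details from the presentation.

```python
def to_gray(pixel):
    """Convert an (r, g, b) pixel to a gray level.
    Rec. 601 luma weights are assumed; the slides do not state the weights."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def pack_quadrants(rgb):
    """Pack the four quadrants of an RGB image (list of rows of (r, g, b)
    tuples) into the RGBA channels of a half-width, half-height image."""
    height, width = len(rgb), len(rgb[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    h2, w2 = height // 2, width // 2
    packed = []
    for y in range(h2):
        row = []
        for x in range(w2):
            row.append((
                to_gray(rgb[y][x]),            # top-left quadrant     -> R
                to_gray(rgb[y][x + w2]),       # top-right quadrant    -> G
                to_gray(rgb[y + h2][x]),       # bottom-left quadrant  -> B
                to_gray(rgb[y + h2][x + w2]),  # bottom-right quadrant -> A
            ))
        packed.append(row)
    return packed
```

Packing this way lets a single texture fetch return four gray values, which is what allows a four-channel LBP pass to process four image pieces per fragment.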

Experimental setup
OMAP 3 family (OMAP3530):
- ARM Cortex-A8 CPU
- PowerVR SGX535 GPU
3 set-ups:
- Beagleboard revision 3
- Zoom AM3517EVM (TI Sitara)
- Nokia N900

Processing times: LBP extraction

Size        GPUv1    GPUv2    CPU      CPU & GPUv1   CPU & GPUv2
1024x1024   232 ms   180 ms   100 ms   116 ms        90 ms
512x512     76 ms    46 ms    25 ms    37 ms         23 ms
64x64       2 ms     1.5 ms   0.4 ms   1 ms          0.2 ms

- Computing LBP in four channels (version 2) is faster than computing in one
- The CPU is faster than the GPU
- Concurrent execution of algorithms on GPU + CPU increases performance

Processing times: Preprocessing

Size        GPU      CPU      CPU & GPU
1024x1024   35 ms    100 ms   54 ms
512x512     10 ms    25 ms    15 ms
64x64       0.2 ms   0.4 ms

- The GPU outperforms the CPU in simple pixelwise operations (scaling + interpolation)
- Concurrent execution of algorithms on GPU + CPU is slower than the GPU alone due to data transfers

Speed (II): Preprocessing

Size        GPU      CPU      GPU preprocessing & CPU LBP extraction
1024x1024   215 ms   205 ms   142 ms
512x512     56 ms    50 ms    40 ms
64x64       1.8 ms   1 ms     0.8 ms

Power and energy consumption

Operation            GPU       CPU
Preprocessing        27 mJ     19 mJ
LBP                  5.3 mJ    10 mJ
Combined algorithm   32.3 mJ   28 mJ

- Power consumption of GPU and CPU is independent
  - CPU: 190 mW
  - GPU: 110 mW to 130 mW (increases with image size)
- Energy consumption depends on processing time
- The GPU has smaller energy per operation
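
The statement that energy depends on processing time follows from energy = average power x time. The sketch below applies that relation to one data point. It assumes the 19 mJ CPU preprocessing entry refers to the 1024x1024 image (100 ms at 190 mW); the slide does not say which image size the energy table uses, so no attempt is made to reproduce the other entries. The function name `energy_mj` is hypothetical.

```python
def energy_mj(power_mw, time_ms):
    """Energy in millijoules from average power (mW) and duration (ms).
    mW * ms gives microjoules, hence the division by 1000."""
    return power_mw * time_ms / 1000.0


# CPU preprocessing, assuming the 1024x1024 timing applies:
cpu_preprocessing = energy_mj(190, 100)  # 19.0 mJ
```

Because the GPU draws less power than the CPU, it can spend somewhat longer on a task and still use less energy, which is the trade-off the table explores.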

Summary
- GPUs can be used as general-purpose processors
- New platforms will offer more efficiency and flexibility
- Non-optimized interfaces introduce excessive overheads

Future directions
- Implementation of the classifier
- Implementations in OpenCL
- Multi-scale LBP
- Implementation of other feature extraction methods

Thank you! Any questions?
Thanks to Texas Instruments for the donation of the hardware.