Accelerating image recognition on mobile devices using GPGPU

Presentation transcript:

1 Accelerating image recognition on mobile devices using GPGPU
Miguel Bordallo (1), Henri Nykänen (2), Jari Hannuksela (1), Olli Silvén (1) and Markku Vehviläinen (3)
(1) University of Oulu, Finland
(2) Visidon Ltd., Oulu, Finland
(3) Nokia Research Center, Tampere, Finland
Jari Hannuksela, Olli Silvén
Machine Vision Group, Infotech Oulu
Department of Electrical and Information Engineering
University of Oulu, Finland

2 Contents
Introduction
  Mobile image recognition
  Local Binary Pattern
Graphics processor as a computing engine
GPU accelerated image recognition
  LBP fragment shader implementation
  Image preprocessing
Experiments and results
  Speed
  Power consumption

3 Motivation
Face detection and recognition are key components of future multimodal user interfaces.
Mobile computation power is still not harnessed properly for real-time computer vision.
High-demand computations compromise battery life.
There is a need for energy-efficient and computationally efficient solutions.

4 Face analysis using local binary patterns
Face analysis is one of the major challenges in computer vision.
The LBP method has already been adopted by many leading scientists.
Excellent results in face recognition and authentication, face detection, facial expression recognition and gender classification.

5 Local Binary Pattern

6 GPU as a computing engine
The GPU can be treated as an independent entity.
Newer phones include a GPU chipset.
OpenGL ES is a highly optimized and attractive accelerator interface.
Emerging platforms (OpenCL EP) will facilitate using the GPU as a computing resource.
Compatible data formats for the graphics and camera sub-systems are desirable.

7 Fixed pipeline (OpenGL ES 1.1) vs. programmable pipeline (OpenGL ES 2

8 Stream processing (OpenGL) vs. shared memory processing (CUDA)

9 OpenCL (Embedded Profile)
Emerging platforms will offer the needed flexibility.
OpenCL Embedded Profile is a subset of OpenCL.
Supports data-parallel and task-parallel programming models.
Code is executed concurrently on CPU & GPU (& DSP).
Other current and future resources are compatible.
Easier programming in a heterogeneous processor environment.
High parallelization of image processing computations -> high efficiency.

10 GPU assisted face analysis process

11 GPU-accelerated image recognition
OpenGL ES 2.0:
  Image feature (LBP, ...) extraction
  Image preprocessing
  Image scaling
  Displaying
C code:
  Camera control
  Classification

12 LBP fragment shader implementation
Two versions:
  Version 1: calculates the LBP map in one grayscale channel
  Version 2: calculates 4 LBP maps in the RGBA channels
Access the image via texture lookup
Fetch the selected picture pixel
Fetch the neighbours' values
Compute the binary vector
Multiply by the weighting factor
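The per-pixel steps listed on this slide can be sketched on the CPU in plain Python. This is a minimal illustration of the basic 8-neighbour LBP operator, not the authors' shader: the function name `lbp_map`, the clockwise neighbour ordering starting at the top-left, the power-of-two weights and the `>=` comparison are all assumptions that the slide does not spell out.

```python
# Neighbour offsets (dy, dx), clockwise from the top-left (ordering is an assumption).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_map(gray):
    """Return the LBP code map of a 2D list of grayscale values.

    Border pixels are skipped, so the output is 2 rows and 2 columns smaller.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            centre = gray[y][x]                    # fetch the selected pixel
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                neighbour = gray[y + dy][x + dx]   # fetch a neighbour value
                if neighbour >= centre:            # one element of the binary vector
                    code += 1 << bit               # weight that element by 2**bit
            row.append(code)
        out.append(row)
    return out

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
codes = lbp_map(image)   # [[120]]: only the neighbours 60, 90, 80, 70 are >= 50
```

In a fragment shader the two outer loops would disappear, since the GPU runs the inner body once per output pixel.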

13 Preprocessing
Create quad
Render each piece in one channel
Divide texture & convert to grayscale
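The idea of dividing the texture, converting to grayscale and storing one piece per channel can be sketched as follows. This is only an illustration under stated assumptions: the split into four quadrants, the quadrant-to-channel mapping, the BT.601 luma weights and the names `to_gray` and `pack_quadrants` are hypothetical and not taken from the slides.

```python
def to_gray(rgb):
    """Convert an (r, g, b) tuple to one luma value (BT.601 weights, an assumption)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))

def pack_quadrants(image):
    """Pack the four quadrants of an RGB image into one RGBA-like texture.

    `image` is a 2D list of (r, g, b) tuples with even height and width.
    Returns a half-size 2D list of 4-tuples holding the grayscale values of the
    top-left, top-right, bottom-left and bottom-right quadrants.
    """
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(to_gray(image[y][x]),          # R channel: top-left quadrant
              to_gray(image[y][x + w]),      # G channel: top-right quadrant
              to_gray(image[y + h][x]),      # B channel: bottom-left quadrant
              to_gray(image[y + h][x + w]))  # A channel: bottom-right quadrant
             for x in range(w)]
            for y in range(h)]

# 2x2 image whose pixels are already neutral grey levels.
image = [[(0, 0, 0), (50, 50, 50)],
         [(100, 100, 100), (200, 200, 200)]]
packed = pack_quadrants(image)   # [[(0, 50, 100, 200)]]
```

With this layout a single shader pass over the half-size texture touches all four pieces at once, which is presumably what lets version 2 compute four LBP maps together.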

14 Experiments setup
OMAP 3 family (OMAP3530)
ARM Cortex A8 CPU
PowerVR SGX535 GPU
3 set-ups:
  Beagleboard revision 3
  Zoom AM3517EVM (TI Sitara)
  Nokia N900

15 Processing times: LBP extraction
Size       GPUv1   GPUv2   CPU     CPU & GPUv1  CPU & GPUv2
1024x1024  232 ms  180 ms  100 ms  116 ms       90 ms
512x512    76 ms   46 ms   25 ms   37 ms        23 ms
64x64      2 ms    1.5 ms  0.4 ms  1 ms         0.2 ms

Computing LBP in four channels (version 2) is faster than computing it in one.
The CPU is faster than the GPU.
Concurrent execution of algorithms on GPU + CPU increases performance.
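As a rough reference point for the concurrent figures, one can compute the time an ideal work split would take if the image could be divided freely between two processors with no transfer or synchronisation cost. This model is an assumption introduced here for illustration; the slides do not say how the work was actually divided, and the helper names are hypothetical.

```python
def ideal_concurrent_ms(t_a_ms, t_b_ms):
    """Time to finish when work is split so both processors end together.

    t_a_ms and t_b_ms are the times each processor needs for the whole job alone.
    Transfer and synchronisation costs are ignored (an idealisation).
    """
    return (t_a_ms * t_b_ms) / (t_a_ms + t_b_ms)

def share_for_a(t_a_ms, t_b_ms):
    """Fraction of the work given to processor A in that ideal split."""
    return t_b_ms / (t_a_ms + t_b_ms)

# 1024x1024 LBP extraction times from the table: CPU 100 ms, GPUv2 180 ms.
ideal = ideal_concurrent_ms(100.0, 180.0)   # about 64.3 ms
cpu_share = share_for_a(100.0, 180.0)       # about 0.64 of the pixels on the CPU
```

The measured 90 ms for CPU & GPUv2 lies above this idealised bound, which would be consistent with overheads outside the model.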

16 Processing times: Preprocessing
Size       GPU     CPU     CPU & GPU
1024x1024  35 ms   100 ms  54 ms
512x512    10 ms   25 ms   15 ms
64x64      0.2 ms  0.4 ms

The GPU outperforms the CPU in simple pixelwise operations (scaling + interpolation).
Concurrent execution of algorithms on GPU + CPU is slower than the GPU alone due to data transfers.
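To make "scaling + interpolation" concrete, here is a minimal CPU sketch of halving an image with bilinear interpolation. The slides do not state which interpolation was used; bilinear filtering is assumed here because it is what OpenGL ES texture units provide with linear filtering. The function names are hypothetical.

```python
def bilinear_sample(gray, y, x):
    """Sample a 2D list of grayscale values at fractional coordinates (y, x)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(gray) - 1)      # clamp at the bottom edge
    x1 = min(x0 + 1, len(gray[0]) - 1)   # clamp at the right edge
    fy, fx = y - y0, x - x0              # fractional parts act as blend weights
    top = gray[y0][x0] * (1 - fx) + gray[y0][x1] * fx
    bottom = gray[y1][x0] * (1 - fx) + gray[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def downscale_half(gray):
    """Halve each dimension by sampling at the centre of every 2x2 block."""
    h, w = len(gray) // 2, len(gray[0]) // 2
    return [[bilinear_sample(gray, 2 * y + 0.5, 2 * x + 0.5) for x in range(w)]
            for y in range(h)]

image = [[0, 10, 20, 30],
         [40, 50, 60, 70],
         [80, 90, 100, 110],
         [120, 130, 140, 150]]
small = downscale_half(image)   # [[25.0, 45.0], [105.0, 125.0]]
```

Every output pixel depends only on a small fixed neighbourhood of the input, which is the kind of independent pixelwise work the slide credits the GPU with handling well.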

17 Speed (II): Preprocessing
Size       GPU     CPU     CPU & GPU
1024x1024  35 ms   100 ms  54 ms
512x512    10 ms   25 ms   15 ms
64x64      0.2 ms  0.4 ms

18 Speed (II): Preprocessing
Size       GPU     CPU     GPU preprocessing & CPU LBP extraction
1024x1024  215 ms  205 ms  142 ms
512x512    56 ms   50 ms   40 ms
64x64      1.8 ms  1 ms    0.8 ms

19 Power and energy consumption
Operation           GPU      CPU
Preprocessing       27 mJ    19 mJ
LBP                 5.3 mJ   10 mJ
Combined algorithm  32.3 mJ  28 mJ

Power consumption of the GPU and the CPU is independent:
  CPU: 190 mW
  GPU: 110 mW to 130 mW (increases with image size)
Energy consumption depends on processing time.
The GPU has smaller energy per operation.
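The statement that energy depends on processing time follows from energy = power x time. A small sketch, assuming constant power draw over the interval (the helper name `energy_mj` is hypothetical): the reported CPU power of 190 mW over the 100 ms CPU preprocessing time for a 1024x1024 image gives the 19 mJ listed for CPU preprocessing. The slides do not state the image size or conditions behind the other entries, so this sketch does not try to reproduce them.

```python
def energy_mj(power_mw, time_ms):
    """Energy in millijoules for a constant power draw over a time interval."""
    return power_mw * time_ms / 1000.0   # mW * ms gives microjoules; /1000 gives mJ

# CPU preprocessing of a 1024x1024 image: 190 mW for 100 ms (values from the slides).
cpu_preprocessing = energy_mj(190.0, 100.0)   # 19.0 mJ
```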

20 Summary
GPUs can be used as general-purpose processors.
New platforms will offer more efficiency and flexibility.
Non-optimized interfaces incur excessive overheads.

21 Future directions
Implementation of the classifier
Implementations in OpenCL
Multi-scale LBP
Implementation of other feature extraction methods

22 Thank you! Any questions?
Thanks to Texas Instruments for the donation of the hardware.

