
© 2009 Nokia V1-OpenCLEnbeddedProfilePresentation.ppt / 2009-02-26 / JyrkiLeskelä 1 OpenCL Embedded Profile Presentation for Multicore Expo 16 March 2009.

1 OpenCL Embedded Profile
Presentation for Multicore Expo, 16 March 2009
V0.3 Improved draft, still needs some work
Kari Pulli, Nokia Research Center
Jyrki Leskelä, Nokia Devices R&D / Technology Renewal

2 OpenCL Embedded Profile - Basics

3 OpenCL Relation to Khronos Embedded Ecosystem

4 OpenCL 1.0 Embedded Profile One-Slider

5 Embedded Profile Main Differences
The embedded profile is defined as a subset of each version of OpenCL:
- Online compiler is optional
- No 64-bit integers or 64-bit integer vectors
- Float 2D/3D images can only be used with nearest-neighbor sampling
- The macro __EMBEDDED_PROFILE__ is added to the language, and the CL_PLATFORM_PROFILE query returns the string EMBEDDED_PROFILE if the OpenCL implementation supports only the embedded profile
- Minimum requirements for constant buffer size, object allocation size, constant argument count and local memory size are scaled down
- Image support and floating-point support are aligned with OpenGL ES 2.0 texture requirements
- The extensions of the full profile can be applied to the embedded profile
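As a minimal sketch of how a host program might react to the profile string, the helper below only compares strings. It assumes the string has already been obtained with clGetPlatformInfo and CL_PLATFORM_PROFILE; the function name is_embedded_profile, the variable counter_prelude and the typedef counter_t are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the profile string names the embedded profile, 0 otherwise.
 * In a real host program the string would come from
 * clGetPlatformInfo(platform, CL_PLATFORM_PROFILE, ...); here it is passed
 * in directly so the sketch does not need an OpenCL installation. */
int is_embedded_profile(const char *profile)
{
    assert(profile != NULL);
    return strcmp(profile, "EMBEDDED_PROFILE") == 0;
}

/* Hypothetical kernel-source prelude: choose a 32-bit counter type when the
 * OpenCL C compiler defines __EMBEDDED_PROFILE__, because 64-bit integers
 * may be unavailable there. The name counter_t is an assumption. */
const char *counter_prelude =
    "#ifdef __EMBEDDED_PROFILE__\n"
    "typedef uint counter_t;\n"
    "#else\n"
    "typedef ulong counter_t;\n"
    "#endif\n";
```

A host application could prepend counter_prelude to its kernel source so the same kernel text builds on both profiles.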

6 Floating Point Numbers in Embedded Profile
- INF and NAN values for floats are not mandated
- Accuracy requirements of some single-precision floating-point operations are relaxed from the full profile:
  - x / y <= 3 ulp
  - exp <= 4 ulp
  - log <= 4 ulp
- Float add, sub, mul and mad can be rounded toward zero, resulting in an error <= 1 ulp, because of strict hardware area constraints
- Denormalized numbers for the half float data type can be flushed to zero
- The precision of conversions from normalized integers is <= 2 ulp for the embedded profile (instead of <= 1.5 ulp)
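To make the ulp bounds concrete, here is a rough sketch of how an error in ulps could be measured on the host against a double-precision reference. It assumes IEEE 754 single-precision floats and finite inputs; the names float_spacing, ulp_error, divide_within_embedded_bound and EMBEDDED_DIVIDE_MAX_ULP are illustrative and not taken from any OpenCL header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Embedded-profile bound for x / y quoted on the slide, in ulps. */
#define EMBEDDED_DIVIDE_MAX_ULP 3.0

/* Spacing between |x| and the next larger float. Assumes x is a finite
 * IEEE 754 single-precision value below the largest finite float. */
static double float_spacing(float x)
{
    uint32_t bits;
    float magnitude, next;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;                 /* clear the sign bit */
    memcpy(&magnitude, &bits, sizeof magnitude);
    bits += 1u;                          /* next representable magnitude */
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)magnitude;
}

/* Error of a float result relative to a double reference, in float ulps. */
double ulp_error(float result, double reference)
{
    double diff = (double)result - reference;
    assert(reference == reference);      /* reject NaN references */
    if (diff < 0.0)
        diff = -diff;
    return diff / float_spacing((float)reference);
}

/* 1 if a division result stays within the relaxed embedded bound. */
int divide_within_embedded_bound(float x, float y, float result)
{
    assert(y != 0.0f);
    return ulp_error(result, (double)x / (double)y) <= EMBEDDED_DIVIDE_MAX_ULP;
}
```

The same ulp_error helper could be compared against 4.0 for exp and log results.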

7 Image Support in Embedded Profile
- Image support is an optional feature within an OpenCL device
- If images are supported, the minimum requirements for the supported image capabilities are lowered to the level of OpenGL ES 2.0 textures:
  - Kernel must be able to read >= 8 simultaneous image objects
  - Kernel must be able to write >= 1 simultaneous image object
  - Width and height of 2D image >= 2048
  - Number of samplers >= 8
- Image formats are similar to corresponding OpenGL ES 2.0 texture formats
- Support for 3D images is optional for embedded implementations
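The minimums listed above can be checked with a few comparisons once the device limits are known. The sketch below assumes the values were read beforehand with clGetDeviceInfo (the corresponding query names are given in the comments); the struct image_caps and the function meets_embedded_image_minimums are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Device limits, assumed to be filled in from clGetDeviceInfo queries. */
typedef struct {
    int      image_support;         /* CL_DEVICE_IMAGE_SUPPORT */
    unsigned max_read_image_args;   /* CL_DEVICE_MAX_READ_IMAGE_ARGS */
    unsigned max_write_image_args;  /* CL_DEVICE_MAX_WRITE_IMAGE_ARGS */
    size_t   image2d_max_width;     /* CL_DEVICE_IMAGE2D_MAX_WIDTH */
    size_t   image2d_max_height;    /* CL_DEVICE_IMAGE2D_MAX_HEIGHT */
    unsigned max_samplers;          /* CL_DEVICE_MAX_SAMPLERS */
} image_caps;

/* Returns 1 if the limits are consistent with the slide: either the device
 * has no image support (allowed, since images are optional), or every
 * listed minimum is met. Returns 0 otherwise. */
int meets_embedded_image_minimums(const image_caps *caps)
{
    assert(caps != NULL);
    if (!caps->image_support)
        return 1;
    return caps->max_read_image_args  >= 8u
        && caps->max_write_image_args >= 1u
        && caps->image2d_max_width    >= 2048u
        && caps->image2d_max_height   >= 2048u
        && caps->max_samplers         >= 8u;
}
```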

8 Potential Mobile Device Use-Cases
- Image post-processing and enhancement
- Image editing software
- Compatibility for devices lacking high-end imaging HW
- Machine vision, local media search, augmented reality
- Support emerging new coding schemes quickly
  - For example, web-originated media codecs
- Streaming math/algorithm libraries
- Physics modeling
- Gaming engines and WOW effects

9 Potential Benefits for Mobile Devices
- Easier programming in a heterogeneous processor environment
  - Instead of learning different programming methods for CPU, GPU, DSP
  - The OpenCL framework also handles event queuing
- Code developed once will run with future hardware
  - If the application conforms to the specification, it will run
  - The OpenCL computing model will be relatively easy to virtualize
- Area and energy constrained embedded devices
  - Computing power of each computing device close to the "sweet spot"
  - Allocation of the workload to multiple computing devices is valuable

10 Example Case 1: Split computation

11 Split computation: Image Post Processing
[Diagram labels: CPU, GPU, Host Application, CL API Calls, Camera Image, OpenCL Post-Processing, CL Buffer, CL Buffer, …, Render]

12 Image Post-Processing Kernel Program

    /* The filter coefficients are named coeffs because kernel is a
       reserved word in OpenCL C. */
    __kernel void convolution(
        __global const uchar4 *srcdata,
        __global uchar4 *destdata,
        __global const float *coeffs,
        float kernel_multiplier,
        float kernel_bias,
        int kernel_dim )
    {
        int x = get_global_id( 0 ), y = get_global_id( 1 );
        int sizex = get_global_size( 0 ), sizey = get_global_size( 1 );
        int half_kernel = kernel_dim / 2;
        /* Accumulate in float so the weights are not truncated. */
        float4 sum = (float4)( 0.0f );
        for( int j = y-half_kernel, kj = 0; j <= y+half_kernel; j++, kj++ ) {
            if( ( j >= 0 ) && ( j < sizey ) ) {
                for( int i = x-half_kernel, ki = 0; i <= x+half_kernel; i++, ki++ ) {
                    /* Taps that fall outside the image are skipped. */
                    if( ( i >= 0 ) && ( i < sizex ) ) {
                        sum += convert_float4( srcdata[ j * sizex + i ] )
                               * coeffs[ kj * kernel_dim + ki ];
                    }
                }
            }
        }
        sum = sum * kernel_multiplier + kernel_bias;
        destdata[ y * sizex + x ] = convert_uchar4_sat( sum );
    }
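For checking a kernel like this on the host, a plain C reference of the same convolution can be handy. The following is only a sketch: the type pixel4 and the functions saturate_u8 and convolve_reference are invented for this example, taps outside the image are skipped, and the final conversion truncates toward zero, which is assumed here to match the default rounding of convert_uchar4_sat.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the OpenCL uchar4 type: four 8-bit channels. */
typedef struct { unsigned char s[4]; } pixel4;

/* Clamp to 0..255 and truncate toward zero. */
static unsigned char saturate_u8(float v)
{
    if (v <= 0.0f)
        return 0;
    if (v >= 255.0f)
        return 255;
    return (unsigned char)v;
}

/* Host-side reference: visits every pixel that one work-item would handle. */
void convolve_reference(const pixel4 *src, pixel4 *dst, int sizex, int sizey,
                        const float *coeffs, int kernel_dim,
                        float multiplier, float bias)
{
    int half = kernel_dim / 2;
    assert(src != NULL && dst != NULL && coeffs != NULL);
    assert(kernel_dim > 0 && kernel_dim % 2 == 1);

    for (int y = 0; y < sizey; y++) {
        for (int x = 0; x < sizex; x++) {
            float sum[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
            for (int kj = 0; kj < kernel_dim; kj++) {
                int j = y - half + kj;
                if (j < 0 || j >= sizey)
                    continue;            /* row outside the image */
                for (int ki = 0; ki < kernel_dim; ki++) {
                    int i = x - half + ki;
                    if (i < 0 || i >= sizex)
                        continue;        /* column outside the image */
                    float w = coeffs[kj * kernel_dim + ki];
                    for (int c = 0; c < 4; c++)
                        sum[c] += (float)src[j * sizex + i].s[c] * w;
                }
            }
            for (int c = 0; c < 4; c++)
                dst[y * sizex + x].s[c] =
                    saturate_u8(sum[c] * multiplier + bias);
        }
    }
}
```

With a 3x3 coefficient array whose only non-zero entry is a 1 in the center, a multiplier of 1 and a bias of 0, the output equals the input, which makes a quick sanity check.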

13 Split computation: Speedup
t_cpu is the time to process the task with only the CPU, t_gpu is the time to process the task with only the GPU, and t_gpuif is the time to transfer the data between CPU and GPU (the transfer is modeled to be CPU bound). In this case, the speed-optimal workload split between CPU and GPU would yield the following execution time:
[Equation not preserved in the transcript]
Example:
t_gpu = k t_cpu, k ∈ 0.5 … 1.5
t_gpuif = 0.1 t_cpu
Comparison of total execution times:
[Chart not preserved in the transcript]
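The formula itself is not part of this transcript. As a sketch only, one simple model that fits the definitions above is the following: the CPU keeps a share a of the work and the GPU gets 1 - a, the transfer time scales with the GPU's share and is spent on the CPU, and the best split is the one where both finish together. Setting a t_cpu + (1 - a) t_gpuif = (1 - a) t_gpu gives a = (t_gpu - t_gpuif) / (t_cpu + t_gpu - t_gpuif) and a total time of t_cpu t_gpu / (t_cpu + t_gpu - t_gpuif). These assumptions and the function names split_cpu_share and split_time are mine, not the presenters'.

```c
#include <assert.h>

/* Share of the work kept on the CPU so that CPU and GPU finish together.
 * Assumed model: transfer time is proportional to the GPU's share and is
 * charged to the CPU. Requires t_gpuif <= t_gpu. */
double split_cpu_share(double t_cpu, double t_gpu, double t_gpuif)
{
    assert(t_cpu > 0.0 && t_gpu > 0.0);
    assert(t_gpuif >= 0.0 && t_gpuif <= t_gpu);
    return (t_gpu - t_gpuif) / (t_cpu + t_gpu - t_gpuif);
}

/* Total execution time for that split under the same assumed model. */
double split_time(double t_cpu, double t_gpu, double t_gpuif)
{
    double a = split_cpu_share(t_cpu, t_gpu, t_gpuif);
    return (1.0 - a) * t_gpu;   /* time the GPU is busy */
}
```

With the example values and k = 1 (t_cpu = 1, t_gpu = 1, t_gpuif = 0.1), this model gives a split time of 1 / 1.9, roughly 0.53 t_cpu.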

14 Split computation: Energy efficiency
t_cpu, t_gpu and t_gpuif are as on the previous slide. p_cpu, p_gpu and p_gpuif are the average battery power drain of CPU execution, GPU execution and data transfer between CPU and GPU, respectively. p_split is the average power drain when the computation is time-optimally split between CPU and GPU. c_split is the corresponding consumed battery capacity, the product of power and time.
Example:
t_gpu = k t_cpu, k ∈ 0.5 … 1.5
t_gpuif = 0.1 t_cpu
p_gpu = 0.5 p_cpu
p_gpuif = 0.1 p_cpu
Total consumption of battery capacity:
[Chart not preserved in the transcript]
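The numbers behind the energy comparison are not in this transcript, so the following is a sketch under stated assumptions rather than the presenters' calculation. It assumes the CPU keeps a share a = (t_gpu - t_gpuif) / (t_cpu + t_gpu - t_gpuif) of the work so that both processors finish together, that each activity drains its own power only while it runs, and that idle power is ignored. The struct split_cost and the function split_energy are hypothetical names.

```c
#include <assert.h>

/* Result of the assumed model: time, consumed capacity, average power. */
typedef struct {
    double time;       /* t_split */
    double energy;     /* c_split = p_split * t_split */
    double avg_power;  /* p_split */
} split_cost;

split_cost split_energy(double t_cpu, double t_gpu, double t_gpuif,
                        double p_cpu, double p_gpu, double p_gpuif)
{
    split_cost c;
    double a;
    assert(t_cpu > 0.0 && t_gpu > 0.0);
    assert(t_gpuif >= 0.0 && t_gpuif <= t_gpu);

    /* CPU share of the work for a time-optimal split (assumed model). */
    a = (t_gpu - t_gpuif) / (t_cpu + t_gpu - t_gpuif);
    c.time = (1.0 - a) * t_gpu;

    /* Energy = CPU compute + GPU compute + transfer for the GPU's share. */
    c.energy = a * t_cpu * p_cpu
             + (1.0 - a) * t_gpu * p_gpu
             + (1.0 - a) * t_gpuif * p_gpuif;
    c.avg_power = c.energy / c.time;
    return c;
}
```

For comparison under the same assumptions, CPU-only execution consumes t_cpu p_cpu and GPU-only execution consumes t_gpu p_gpu + t_gpuif p_gpuif.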

15 More Example Cases

16 Pipelining: Mixing computation and graphics
[Diagram labels: DSP, CPU, GPU, Host Application, CL API Calls, GL API Calls, OpenCL Fractal Anim. Texture, OpenGL ES 2.0 Rendering, CL Buffer, GL Texture, CL Buffer, GL Renderbuffer]

17 Multimedia Frameworks: OpenMAX environment
- More portability by using OpenCL in some hotspots
Diagram Copyright © 2009 Khronos Group

18 Summary

19 Summary
- OpenCL 1.0 Embedded Profile is a subset of the full profile
  - Not an "ES" specification of its own
- Easier programming of heterogeneous multi-processor systems
  - Fast multiprocessor code without portability hassle
- Speedups and energy efficiency via parallelism
  - Parallelize a uniform task to different processors
  - Split pipeline stages to different processors

20 Demo

21 Demo: Magnification Lens
- Internal development environment for evaluating the OpenCL Embedded Profile
  - Early pilot version only
  - No conformance test coverage at the moment
- Runs on
  - N810 (OMAP2420 CPU)
  - Zoom MDK (OMAP3430 CPU+SIMD+DSP)
The lens effect is a mapping of the original image f(x,y) into the modified image g(x,y) as a piecewise continuous function
[Equation not preserved in the transcript]
where R_o and R_i are the outer and inner boundaries of the lens frame, (x_c, y_c) is the center point of the lens, and M is the magnification factor in the center area of the lens.
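Since the piecewise formula is missing from this transcript, the sketch below shows one possible mapping of that general shape, not the one used in the demo. It assumes that inside R_i the source coordinate is pulled toward the center by 1/M, that outside R_o the image is unchanged, and that in the lens frame between them the scale is blended linearly in the squared distance so the mapping stays continuous. The blend rule and the names lens_point and lens_source are assumptions.

```c
#include <assert.h>

typedef struct { double x; double y; } lens_point;

/* For an output pixel (x, y), return the source coordinate to sample, so
 * that g(x, y) = f(source). Assumed piecewise form:
 *   distance <= r_inner : magnified by m around the center (xc, yc)
 *   distance >= r_outer : unchanged
 *   in between          : scale blended linearly in squared distance */
lens_point lens_source(double x, double y, double xc, double yc,
                       double r_inner, double r_outer, double m)
{
    double dx = x - xc, dy = y - yc;
    double d2 = dx * dx + dy * dy;
    double ri2 = r_inner * r_inner, ro2 = r_outer * r_outer;
    double scale;
    lens_point src;

    assert(m > 0.0 && r_inner >= 0.0 && r_outer > r_inner);

    if (d2 <= ri2) {
        scale = 1.0 / m;                      /* center area */
    } else if (d2 >= ro2) {
        scale = 1.0;                          /* outside the lens */
    } else {
        double t = (d2 - ri2) / (ro2 - ri2);  /* 0 at R_i, 1 at R_o */
        scale = (1.0 - t) / m + t;            /* lens frame */
    }
    src.x = xc + dx * scale;
    src.y = yc + dy * scale;
    return src;
}
```

An image-space kernel would call such a function once per output pixel and sample f at the returned coordinate.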

