GPU acceleration in Matlab. Jan Kamenický, UTIA Friday seminar, 9.11.2012


1 GPU acceleration in Matlab Jan Kamenický UTIA Friday seminar

2 GPU acceleration
CPU
– fast
– general-purpose
GPU
– highly parallel
– handles specific tasks with large amounts of data
– memory transfers needed

3 GPU acceleration in Matlab
Built-in functions
– many Matlab functions support GPU acceleration natively
arrayfun
– specific element-wise processing
CUDA kernels
– write ".cu" files
– compile to ".ptx" (parallel thread execution)
– run using feval
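A minimal sketch of the arrayfun route, assuming Parallel Computing Toolbox and a supported NVIDIA GPU are available. The function hyp and the input vectors are hypothetical and only illustrate element-wise processing; they are not taken from the slides.

```matlab
% Element-wise processing on the GPU with arrayfun.
xHost = linspace(0, 1, 1000);
yHost = linspace(1, 2, 1000);
x = gpuArray(xHost);                  % copy the inputs to GPU memory
y = gpuArray(yHost);

% The function is written for scalars; arrayfun applies it to each
% pair of corresponding elements on the GPU.
hyp = @(a, b) sqrt(a*a + b*b);        % hypothetical example function
z = arrayfun(hyp, x, y);              % result stays on the GPU

zCpu = gather(z);                     % copy the result back to the workspace
zRef = sqrt(xHost.^2 + yHost.^2);     % same computation on the CPU
fprintf('max abs difference: %g\n', max(abs(zCpu - zRef)));
```

Writing the body as scalar operations is what lets arrayfun run it per element; functions not supported inside GPU arrayfun would need the built-in or CUDA kernel route instead.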

4 Prerequisites
Matlab 2010b or newer
Parallel Computing Toolbox (check with ver)

5 Prerequisites
>> ver
MATLAB Version (R2011b)
MATLAB License Number: XXXXXX
Operating System: Microsoft Windows 7 Version 6.1 (Build 7601: Service Pack 1)
Java VM Version: Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Version 7.13 (R2011b)
Simulink Version 7.8 (R2011b)
Computer Vision System Toolbox Version 4.1 (R2011b)
Curve Fitting Toolbox Version 3.2 (R2011b)
DSP System Toolbox Version 8.1 (R2011b)
Data Acquisition Toolbox Version 3.0 (R2011b)
Filter Design HDL Coder Version 2.9 (R2011b)
Fixed-Point Toolbox Version 3.4 (R2011b)
Global Optimization Toolbox Version 3.2 (R2011b)
Image Acquisition Toolbox Version 4.2 (R2011b)
Image Processing Toolbox Version 7.3 (R2011b)
MATLAB Compiler Version 4.16 (R2011b)
MATLAB Distributed Computing Server Version 5.2 (R2011b)
Neural Network Toolbox Version (R2011b)
Optimization Toolbox Version 6.1 (R2011b)
Parallel Computing Toolbox Version 5.2 (R2011b)
Partial Differential Equation Toolbox Version (R2011b)
Signal Processing Toolbox Version 6.16 (R2011b)
Simulink 3D Animation Version 6.0 (R2011b)
Statistics Toolbox Version 7.6 (R2011b)
Symbolic Math Toolbox Version 5.7 (R2011b)
Wavelet Toolbox Version 4.8 (R2011b)


7 Prerequisites
Matlab 2010b or newer
Parallel Computing Toolbox (check with ver)
NVIDIA GPU with CUDA compute capability 1.3 or higher (check with gpuDevice)

8 Prerequisites
>> gpuDevice
ans =
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'GeForce GTX 285'
Index: 1
ComputeCapability: '1.3'
SupportsDouble: 1
DriverVersion: 5
MaxThreadsPerBlock: 512
MaxShmemPerBlock:
MaxThreadBlockSize: [ ]
MaxGridSize: [ ]
SIMDWidth: 32
TotalMemory: e+009
FreeMemory: e+009
MultiprocessorCount: 30
ClockRateKHz:
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Methods, Events, Superclasses
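The GPU check can also be scripted rather than read off the command window. The following is a rough sketch, assuming Parallel Computing Toolbox is installed; the 1.3 threshold is the one quoted in the slides, and newer Matlab releases may require a higher compute capability.

```matlab
% Check that a usable GPU is present before running GPU code.
nGpu = gpuDeviceCount;                     % number of CUDA devices Matlab can see
if nGpu == 0
    disp('No CUDA-capable GPU found; staying on the CPU.');
else
    d = gpuDevice;                         % currently selected device
    cc = str2double(d.ComputeCapability);  % e.g. '1.3' -> 1.3
    fprintf('%s, compute capability %s\n', d.Name, d.ComputeCapability);
    if cc < 1.3                            % threshold taken from the slides
        disp('Compute capability is below 1.3.');
    end
end
```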


10 Basic usage
Send data to GPU
– either allocate there or transfer from workspace
Run Matlab functions
– GPU acceleration is used automatically
Retrieve the output data

11 GPUArray class
parallel.gpu.GPUArray
– main data class for GPU computations
– stored in the GPU memory
– create directly using static methods
  zeros, nan, eye, rand, linspace
  ones, true, colon, randi, logspace
  inf, false, randn
– copy from existing data
  gpuArray(img)

12 GPUArray class
Supported data types: (u)int8, (u)int16, (u)int32, (u)int64, single, double, logical
– determine the type using classUnderlying(gpuVar)
Retrieve the data using
workspaceVar = gather(gpuVar)
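The creation, type query and retrieval steps above might look like the following sketch. The variable names and test data are hypothetical. It assumes a release where the class is reachable as gpuArray; in the releases covered by the slides the static methods live under parallel.gpu.GPUArray (for example parallel.gpu.GPUArray.linspace), and newer releases recommend underlyingType in place of classUnderlying.

```matlab
img = single(magic(4));            % hypothetical workspace data
gImg = gpuArray(img);              % copy existing data to the GPU
t = gpuArray.linspace(0, 1, 5);    % create directly on the GPU (static method)

disp(classUnderlying(gImg));       % single
disp(classUnderlying(t));          % double

back = gather(gImg);               % copy back to the workspace
disp(isequal(back, img));          % the round trip preserves the data
```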

13 GPU accelerated Matlab functions (2012b)
methods('parallel.gpu.GPUArray')

14 GPU accelerated Matlab functions (2012b)
abs, cast, dot, ge, issparse, normest, sinh
acos, cat, double, gt, isvector, not, size
acosh, ceil, eig, horzcat, kron, num2str, sort
acot, chol, eps, hypot, ldivide, numel, sprintf
acoth, circshift, eq, ifft, le, perms, sqrt
acsc, classUnderlying, erf, ifft2, length, permute, squeeze
acsch, colon, erfc, ifftn, log, plot (and related), std
all, complex, erfcinv, ifftshift, log10, plus, sub2ind
angle, cond, erfcx, imag, log1p, pow2, subsasgn
any, conj, erfinv, ind2sub, log2, power, subsindex
arrayfun, conv, exp, int16, logical, prod, subsref
asec, conv2, expm1, int2str, lt, qr, sum
asech, convn, fft, int32, lu, rank, svd
asin, cos, fft2, int64, mat2str, rdivide, tan
asinh, cosh, fftn, int8, max, real, tanh
atan, cot, fftshift, inv, mean, reallog, times
atan2, coth, filter, ipermute, meshgrid, realpow, trace
atanh, cov, filter2, iscolumn, min, realsqrt, transpose
beta, cross, find, isempty, minus, rem, tril
betaln, csc, fix, isequal, mldivide, repmat, triu
bitand, csch, fliplr, isequaln, mod, reshape, uint16
bitcmp, ctranspose, flipud, isfinite, mpower, rot90, uint32
bitget, cumprod, flipdim, isinf, mrdivide, round, uint64
bitor, cumsum, floor, islogical, mtimes, sec, uint8
bitset, det, fprintf, ismatrix, ndgrid, sech, uminus
bitshift, diag, full, isnan, ndims, shiftdim, uplus
bitxor, diff, gamma, isreal, ne, sign, var
blkdiag, disp, gammaln, isrow, nnz, sin, vertcat
bsxfun, display, gather, issorted, norm, single

15 Simple example
Solve system of linear equations (Ax = b)
  A = gpuArray(A);
  b = gpuArray(b);
  x = A\b;
  x = gather(x);

16 Simple example
Compute convolution using FFT
  img = gpuArray(img);
  msk = padarray(msk,size(img)-size(msk),0,'post');
  msk = gpuArray(msk);
  I = fft2(img);
  M = fft2(msk,size(img,1),size(img,2));
  % msk is already zero-padded to the size of img, so this can also be written as M = fft2(msk);
  res = real(ifft2(I.*M));
  res = gather(res);

17 Linear system solution benchmark

18 Convolution benchmark

19 Profiling
Before optimizing (trying to use the GPU), locate promising parts of the code, such as
– custom code consuming the majority of time
– built-in functions that support GPUArray (consuming the majority of time)
– large input/output data, simple data types
Test the speed afterwards
GPU code cannot be profiled
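Since GPU code cannot be profiled, the "test the speed afterwards" step can be done with manual timing. The sketch below is one possible way to do that, not the benchmark used in the slides; the problem size n and the test matrix are hypothetical. The GPU timing deliberately includes the memory transfers, and the first GPU call in a session typically carries one-time start-up overhead, so it is worth running the script twice.

```matlab
n = 2000;                          % hypothetical problem size
A = rand(n) + n*eye(n);            % well-conditioned test matrix
b = rand(n, 1);

tic;
xCpu = A \ b;                      % reference solution on the CPU
tCpu = toc;

tic;
gA = gpuArray(A);                  % transfer the inputs to the GPU
gb = gpuArray(b);
xGpu = gather(gA \ gb);            % gather returns only once the result is ready
tGpu = toc;

fprintf('CPU %.3f s, GPU incl. transfers %.3f s\n', tCpu, tGpu);
fprintf('max abs difference: %g\n', max(abs(xCpu - xGpu)));
```

Comparing the two solutions alongside the timings guards against a speed-up that comes from a silently different result.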

20 Profiling

