Intel Math Kernel Library (MKL) Clay P. Breshears, PhD Intel Software College NCSA Multi-core Workshop July 24, 2007.


1 Intel Math Kernel Library (MKL) Clay P. Breshears, PhD Intel Software College NCSA Multi-core Workshop July 24, 2007

2 Copyright © 2007, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Performance Libraries: Intel® Math Kernel Library (MKL)
Agenda
- Performance Features
- The Library Sections
  - BLAS
  - LAPACK*
  - DFTs
  - VML
  - VSL
- SciMark 2.0 Optimization Case Study (from Henry Gabb)
  - SciMark 2.0 overview
  - Tuning with the Intel compiler
  - Tuning with the Intel Math Kernel Library

3 Intel® Math Kernel Library Purpose
- Performance, Performance, Performance!
- Intel’s engineering, scientific, and financial math library
- Addresses:
  - Solvers (BLAS, LAPACK)
  - Eigenvector/eigenvalue solvers (BLAS, LAPACK)
  - Some quantum chemistry needs (dgemm)
  - PDEs, signal processing, seismic, solid-state physics (FFTs)
  - General scientific and financial codes: vector transcendental functions (VML) and vector random number generators (VSL)
- Tuned for Intel® processors, current and future

4 Intel® Math Kernel Library Purpose: Don’ts
- Don’t use Intel® Math Kernel Library (Intel® MKL) on “small” counts
- Don’t call vector math functions on small n
- Example: a geometric transformation, [X’ Y’ Z’ W’] = (4x4 transformation matrix) x [X Y Z W]
- For work of this size you could use Intel® Performance Primitives instead
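To make the "small counts" point concrete, here is a minimal sketch of the 4x4 geometric transformation written as a plain loop. The function name `transform_point` and the flat row-major layout are assumptions made for this illustration; they are not part of Intel MKL or Intel Performance Primitives. Sixteen multiply-adds is very little work, so the overhead of a general-purpose library call can plausibly rival the arithmetic itself, although the actual balance depends on the platform.

```c
#include <stddef.h>

/* Hypothetical helper: multiply a 4x4 matrix (stored row-major in m[16])
   by the column vector in[4] and store the result in out[4]. */
void transform_point(const double m[16], const double in[4], double out[4])
{
    for (size_t row = 0; row < 4; row++) {
        double sum = 0.0;
        for (size_t col = 0; col < 4; col++)
            sum += m[4 * row + col] * in[col];
        out[row] = sum;
    }
}
```

For example, a matrix that translates by (2, 3, 4) maps the homogeneous point (1, 1, 1, 1) to (3, 4, 5, 1).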

5 Intel® Math Kernel Library Environment Support
- 32-bit and 64-bit Intel® processors
- Large set of examples and tests
- Extensive documentation

             Windows*            Linux*
Compilers    Intel, Microsoft    Intel, GNU
Libraries    .dll, .lib          .a, .so

6 Resource Limited Optimization
- The goal of all optimization is maximum speed
- Resource-limited optimization: exhaust one or more resources of the system
  - CPU: register use, FP units
  - Cache: keep data in cache as long as possible; deal with cache interleaving
  - TLBs: maximally use data on each page
  - Memory bandwidth: minimally access memory
  - Computer: use all the processors/cores available, using threading
  - System: use all the nodes available (cluster software)

7 Threading
- Most of Intel® Math Kernel Library could be threaded, but:
  - The limited resource is memory bandwidth
  - Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) )
- There are numerous opportunities for threading:
  - Level 3 BLAS ( O(n^3) )
  - LAPACK* ( O(n^3) )
  - FFTs ( O(n log n) )
  - VML, VSL: depends on processor and function
- All threading is via OpenMP*
- All of Intel MKL is designed and compiled for thread safety
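One way to see why memory bandwidth limits the lower BLAS levels is to compare arithmetic work with data moved. The following is a rough count, assuming n-element vectors, n x n matrices, and no cache reuse; the routine names are the standard BLAS examples of each level, not figures taken from the slides.

```latex
\begin{align*}
\text{Level 1 (e.g. daxpy, } y \leftarrow \alpha x + y\text{):}\quad
  & \frac{2n \ \text{flops}}{3n \ \text{memory accesses}} = O(1) \\
\text{Level 2 (e.g. dgemv, } y \leftarrow \alpha A x + \beta y\text{):}\quad
  & \frac{\approx 2n^2 \ \text{flops}}{n^2 + 3n \ \text{memory accesses}} = O(1) \\
\text{Level 3 (e.g. dgemm, } C \leftarrow \alpha A B + \beta C\text{):}\quad
  & \frac{\approx 2n^3 \ \text{flops}}{4n^2 \ \text{memory accesses}} = O(n)
\end{align*}
```

Only at level 3 does the work per memory access grow with n, which is why extra cores tend to help there while level 1 and level 2 routines stay bound by memory traffic.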

8 SciMark 2.0
- Produced by the National Institute of Standards and Technology
- ANSI C and Java versions available
- Five floating-point-intensive kernels
  - FFT: Compute a complex 1D FFT
  - SOR: Jacobi successive over-relaxation in 2D
  - MC: Compute π by Monte Carlo integration
  - MV: Sparse matrix-vector multiplication
  - LU: Dense matrix LU factorization

9 SciMark 2.0 Problem Sizes

Benchmark    Small problem          Large problem
FFT          N = 1024               N = 1048576
SOR          100 x 100              1000 x 1000
MC           Problem size not fixed; no distinction between small and large problems
MV           N = 1000, NZ = 5000    N = 100000, NZ = 1000000
LU           100 x 100              1000 x 1000

10 Benchmark System

Hardware
  CPU (dual-processor system)     3.6 GHz Xeon (2 MB L2 cache) EM64T
  Motherboard                     Intel Server Board SE7520AF2
  Memory                          512 MB DDR2
  BIOS Version                    P06
  Adjacent Cache Line Prefetch    ON
  Hardware Prefetch               ON
  Hyper-Threading Technology      OFF
Software
  Operating system                Red Hat Enterprise Linux AS3
  Linux kernel                    2.4.21-20.EL #1 SMP
  Intel C++ Compiler for Linux    8.1 (l_cce_pc_8.1.024)
  Intel Cluster MKL               7.2 (l_cluster_mkl_7.2.008)
  GNU C Compiler                  gcc 3.2.3

11 GNU Performance Baseline
- Aggressive optimization significantly improves performance relative to the default optimization level.
- The following gcc options were used to establish baseline performance: -O3 -march=nocona -ffast-math -mfpmath=sse

12 Intel C++ Compiler for Linux
- Performance
  - Automatic vectorization
  - Streaming SIMD Extensions 3
  - IPO and PGO
  - Automatic parallelization and OpenMP support
  - Automatic CPU dispatch
  - Much more...
- Compatibility
  - Source and object compatible with gcc and g++
  - Supports GNU inline ASM
  - ANSI/ISO C/C++ standards compliance
  - Conforms to the C++ ABI standard
  - Integrated with the Eclipse IDE

13 Tuning SciMark 2.0 with the Intel Compiler
- The Intel C++ Compiler for Linux improves SciMark 2.0 performance relative to the GNU baseline.
- Intel compiler options: -O3 -xP -ipo -fno-alias

14 Intel® Math Kernel Library Contents: BLAS
BLAS (Basic Linear Algebra Subprograms)
- Level 1 BLAS: vector-vector operations
  - 15 function types
  - 48 functions
- Level 2 BLAS: matrix-vector operations
  - 26 function types
  - 66 functions
- Level 3 BLAS: matrix-matrix operations
  - 9 function types
  - 30 functions
- Extended BLAS: level 1 BLAS for sparse vectors
  - 8 function types
  - 24 functions

15 Intel® Math Kernel Library Contents
- LAPACK (Linear Algebra PACKage)
  - Solvers and eigensolvers; many hundreds of routines in total
  - More than 1000 user-callable and support routines in total
- DFTs (discrete Fourier transforms)
  - Mixed radix, multi-dimensional transforms
  - Multithreaded
- VML (Vector Math Library)
  - Set of vectorized transcendental functions
  - Most of the libm functions, but faster
- VSL (Vector Statistical Library)
  - Set of vectorized random number generators

16 Intel® Math Kernel Library Contents
- BLAS and LAPACK* both have Fortran interfaces
  - Legacy of high-performance computation
- VSL and VML have Fortran and C interfaces
- DFTs have Fortran 95 and C interfaces
- cblas interface available
  - More convenient for a C/C++ programmer

17 Intel® Math Kernel Library Optimizations in LAPACK*
Most important LAPACK optimizations:
- Threading: effectively uses multiple cores
- Recursive factorization
  - Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel / p)
  - Extends blocking further into the code
- No runtime library support required
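The Amdahl's law expression on this slide can be turned into a small calculation that shows why shrinking the scalar part matters. The sketch below is illustrative only; the function names are hypothetical and the timings in the usage note are made-up inputs, not measurements from the presentation.

```c
/* Run time on p cores, following the slide: t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    return t_scalar + t_parallel / (double)p;
}

/* Speedup relative to running the same work on a single core. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return (t_scalar + t_parallel) / amdahl_time(t_scalar, t_parallel, p);
}
```

With t_scalar = 1, t_parallel = 8 and p = 4, the run time is 1 + 8/4 = 3 and the speedup is 9/3 = 3. Halving the scalar part to 0.5 raises the speedup to 8.5/2.5 = 3.4 on the same four cores, which is the effect recursive factorization aims for.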

18 Tuning the SciMark 2.0 LU Kernel
Replacing the SciMark 2.0 LU kernel with the LAPACK dgetrf function requires attention to detail:
- SciMark 2.0 is written in C; LAPACK defines a Fortran interface
  - C is call-by-value; Fortran is call-by-reference
  - C uses row-major ordering; Fortran uses column-major ordering
- For best performance, dgetrf requires data to be contiguous in memory
  - The SciMark 2.0 LU kernel allocates a 2D array as pointers-to-pointers (not necessarily contiguous in memory)
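The storage issues listed on this slide can be sketched in plain C. The helpers below are hypothetical names introduced for illustration and are not taken from SciMark 2.0 or Intel MKL. They allocate a matrix as one contiguous block while keeping the `a[i][j]` pointer-to-pointer access style, and copy row-major data into the column-major order a Fortran interface expects.

```c
#include <stdlib.h>

/* Allocate an n x n matrix as one contiguous block plus a table of row
   pointers, so a[i][j] still works and a[0] points at contiguous storage.
   Returns NULL if either allocation fails. */
double **alloc_contiguous_matrix(int n)
{
    double **rows = malloc((size_t)n * sizeof *rows);
    double *block = malloc((size_t)n * (size_t)n * sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (int i = 0; i < n; i++)
        rows[i] = block + (size_t)i * (size_t)n;
    return rows;
}

void free_contiguous_matrix(double **rows)
{
    if (rows != NULL) {
        free(rows[0]);   /* the single data block */
        free(rows);      /* the row-pointer table */
    }
}

/* Copy a row-major n x n matrix into column-major order:
   element (i, j) moves from src[i*n + j] to dst[i + j*n]. */
void row_to_column_major(const double *src, double *dst, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[(size_t)i + (size_t)j * (size_t)n] =
                src[(size_t)i * (size_t)n + (size_t)j];
}
```

Because the Fortran interface is call-by-reference, scalar arguments are passed by address, along the lines of `dgetrf(&n, &n, a_colmajor, &n, ipiv, &info)`; the exact symbol name and integer type depend on the library version and should be checked against its headers.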

19 Tuning the SciMark 2.0 LU Kernel
Intel MKL LAPACK significantly improves performance over the original SciMark 2.0 LU source code.

20 Intel® Math Kernel Library Contents: Discrete Fourier Transforms
- One-dimensional, two-dimensional, three-dimensional…
- Multithreaded
- Mixed radix
- User-specified scaling, transform sign
- Transforms on embedded matrices
- Multiple one-dimensional transforms in a single call
- Strides
- C and F90 interfaces; FFTW interface support

21 Using the Intel® Math Kernel Library DFTs
Basically a 3-step process, plus an optional cleanup step (MDH: MyDescriptorHandle):
1. Create a descriptor: Status = DftiCreateDescriptor(MDH, …)
2. Commit the descriptor (instantiates it): Status = DftiCommitDescriptor(MDH)
3. Perform the transform: Status = DftiComputeForward(MDH, X)
4. Optionally free the descriptor

22 Tuning the SciMark 2.0 FFT Kernel

    #include <mkl_dfti.h>                     // MKL DFT interface declarations

    int N = 1024;                             // Size of SciMark 2.0 small FFT problem
    double scale = 1.0 / (double)N;
    double *x = RandomVector ((2 * N), R);    // SciMark creates a random vector
                                              // of size 2*N to hold real and
                                              // imaginary parts

    DFTI_DESCRIPTOR *dftiHandle;              // Structure for MKL DFT descriptor

    DftiCreateDescriptor (&dftiHandle,        // Transform descriptor
                          DFTI_DOUBLE,        // Precision
                          DFTI_COMPLEX,       // Complex-to-complex
                          1,                  // Number of dimensions
                          N);                 // Size of transform

    // Apply scaling factor to backward transform
    DftiSetValue (dftiHandle, DFTI_BACKWARD_SCALE, scale);
    DftiCommitDescriptor (dftiHandle);

    DftiComputeForward (dftiHandle, x);       // Apply DFT to array x (in place)
    DftiComputeBackward (dftiHandle, x);      // Scaled inverse DFT restores x

    DftiFreeDescriptor (&dftiHandle);

23 Tuning the SciMark 2.0 FFT Kernel
The Intel MKL DFT significantly improves performance over the original SciMark 2.0 FFT source code.

24 Intel® Math Kernel Library Contents: Vector Math Library (VML)
- Vector Math Library: vectorized transcendental functions, like libm but faster
- Interface: both Fortran and C interfaces
- Multiple accuracies
  - High accuracy ( < 1 ulp )
  - Lower accuracy, faster ( < 4 ulps )
- Special value handling: √(-a), sin(0), and so on
- Error handling: cannot duplicate libm here

25 VML: Why Does It Matter?
- It is important for financial codes (Monte Carlo simulations)
  - Exponentials, logarithms
- Other scientific codes depend on transcendental functions
- Error functions can be big time sinks in some codes

26 Intel® Math Kernel Library Contents: Vector Statistical Library (VSL)
- Set of random number generators (RNGs)
- Numerous non-uniform distributions
- VML used extensively for transformations
- Parallel computation support for some functions
- User can supply own BRNG or transformations
- Five basic RNGs (BRNGs): MCG31, R250, MRG32, MCG59, WH

27 Non-Uniform RNGs
- Gaussian (two methods)
- Exponential
- Laplace
- Weibull
- Cauchy
- Rayleigh
- Lognormal
- Gumbel

28 Using VSL
Basically a 3-step process, plus an optional cleanup step:
1. Create a stream pointer: VSLStreamStatePtr stream;
2. Create a stream: vslNewStream(&stream, VSL_BRNG_MCG31, seed);
3. Generate a set of random numbers: vsRngUniform(0, stream, size, out, start, end);
4. Delete the stream (optional): vslDeleteStream(&stream);

29 Calculating Pi by Monte Carlo

    Loop I = 1 to N_samples
        x.coor = random [0..1]
        y.coor = random [0..1]
        dist = sqrt (x^2 + y^2)
        if dist <= 1
            hits = hits + 1
    Pi = 4 * hits / N_samples

30 Tuning the SciMark 2.0 MC Kernel

    #include <math.h>                         // sqrt
    #include <mkl_vsl.h>                      // MKL VSL declarations

    // BLOCK_SIZE and SEED are assumed to be defined elsewhere in the benchmark source.
    double MonteCarlo_integrate (int Num_samples)
    {
        int i, j, blocks, under_curve = 0;
        static double rnBuf[2 * BLOCK_SIZE];  // One x and one y per sample
        double rnX, rnY;
        VSLStreamStatePtr stream;

        blocks = Num_samples / BLOCK_SIZE;    // Samples are generated a block at a time
        vslNewStream (&stream, VSL_BRNG_MCG31, SEED);

        for (i = 0; i < blocks; i++) {
            vdRngUniform (VSL_METHOD_DUNIFORM_STD, stream,
                          (2 * BLOCK_SIZE), rnBuf, 0.0, 1.0);
            for (j = 0; j < BLOCK_SIZE; j++) {
                rnX = rnBuf[2*j];
                rnY = rnBuf[2*j+1];
                if (sqrt(rnX*rnX + rnY*rnY) <= 1.0)
                    under_curve++;
            }
        }

        vslDeleteStream (&stream);
        return ((double) under_curve / Num_samples) * 4.0;
    }
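The blocked structure of this kernel can be tried without MKL. In the sketch below, `fill_uniform` is a hypothetical stand-in for `vdRngUniform` built on the Park-Miller "minimal standard" generator; it is not the MKL MCG31 generator, and `MC_BLOCK` is an assumed block size. The sketch also processes a final partial block, so sample counts that are not a multiple of the block size are fully covered, and it compares squared distances to avoid the square root.

```c
#include <stdint.h>

#define MC_BLOCK 1024   /* assumed block size for this sketch */

/* Stand-in for a vectorized uniform generator: fills buf with n doubles in
   (0, 1) using x <- 48271 * x mod (2^31 - 1). Not the MKL generator. */
static void fill_uniform(uint32_t *state, double *buf, int n)
{
    for (int i = 0; i < n; i++) {
        *state = (uint32_t)(((uint64_t)*state * 48271u) % 2147483647u);
        buf[i] = (double)*state / 2147483647.0;
    }
}

/* Estimate pi from num_samples random points in the unit square. */
double monte_carlo_pi(int num_samples, uint32_t seed)
{
    double buf[2 * MC_BLOCK];
    uint32_t state = seed % 2147483647u;
    int under_curve = 0;
    int done = 0;

    if (num_samples <= 0)
        return 0.0;
    if (state == 0)
        state = 1;   /* zero is a fixed point of the generator */

    while (done < num_samples) {
        int count = num_samples - done;
        if (count > MC_BLOCK)
            count = MC_BLOCK;               /* last block may be partial */
        fill_uniform(&state, buf, 2 * count);
        for (int j = 0; j < count; j++) {
            double x = buf[2 * j], y = buf[2 * j + 1];
            if (x * x + y * y <= 1.0)       /* same test as sqrt(...) <= 1 */
                under_curve++;
        }
        done += count;
    }
    return 4.0 * (double)under_curve / (double)num_samples;
}
```

Generating random numbers a block at a time is what lets a vectorized generator amortize its call overhead; the block size trades buffer footprint against the number of calls.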

31 Tuning the SciMark 2.0 MC Kernel
The Intel MKL VSL significantly improves performance over the original SciMark 2.0 MC source code.

32 Best SciMark 2.0 Single Node Performance: Small Problems

Small Problems (MFLOPS)
Kernel    GNU    Intel    Speedup
FFT       510    1817     3.6
SOR       524    1092     2.1
MC        206    1003     4.9
MV        857     832     1.0
LU        884    1827     2.1
Comp.     596    1314     2.2

33 Best SciMark 2.0 Single Node Performance: Large Problems

Large Problems (MFLOPS)
Kernel    GNU    Intel    Speedup
FFT        45     600     13.3
SOR       495    1015      2.1
MC        206    1003      4.9
MV        453     457      1.0
LU        392    6646     16.9
Comp.     318    1944      6.1
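The Comp. rows in these tables are consistent with the SciMark 2.0 composite score being the arithmetic mean of the five kernel scores, and the Speedup column with the ratio of the Intel to the GNU figure. A minimal sketch of that arithmetic follows; the function names are hypothetical.

```c
/* Composite score: arithmetic mean of the five kernel scores in MFLOPS. */
double composite_score(const double mflops[5])
{
    double sum = 0.0;
    for (int i = 0; i < 5; i++)
        sum += mflops[i];
    return sum / 5.0;
}

/* Speedup of a tuned result over the baseline result. */
double speedup(double baseline_mflops, double tuned_mflops)
{
    return tuned_mflops / baseline_mflops;
}
```

For the large problems, the GNU scores (45, 495, 206, 453, 392) average to 318.2 and the Intel scores (600, 1015, 1003, 457, 6646) to 1944.2, matching the Comp. row; the LU speedup is 6646 / 392, or about 16.95.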

34 Intel® Cluster MKL
- Intel Cluster MKL is a superset of MKL for solving large linear algebra problems on a cluster
- Intel Cluster MKL contains:
  - ScaLAPACK (Scalable LAPACK)
  - BLACS (Basic Linear Algebra Communication Subprograms)
- Supports MPICH and the Intel MPI Library

35 Data Layout Critical to Parallel Performance
- ScaLAPACK uses 2D block-cyclic data distribution
- Figure: example layouts of a lower triangular matrix for four processes
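A block-cyclic distribution deals fixed-size blocks of indices to processes in round-robin order. The sketch below maps one global index in a single dimension, assuming 0-based indices and that the first block lives on process 0; the struct and function names are hypothetical and are not ScaLAPACK routines. A 2D distribution applies the same mapping independently to row indices (over the process-grid rows) and column indices (over the process-grid columns).

```c
/* Position of one global index under a 1D block-cyclic distribution. */
typedef struct {
    int proc;    /* process coordinate that owns the element */
    int local;   /* index of the element within that process */
} BlockCyclicPos;

BlockCyclicPos block_cyclic_map(int global, int block, int nprocs)
{
    BlockCyclicPos pos;
    int block_index = global / block;        /* which block the index falls in */
    pos.proc  = block_index % nprocs;        /* blocks are dealt round-robin */
    pos.local = (block_index / nprocs) * block + global % block;
    return pos;
}
```

With a block size of 2 and 2 processes, global indices 0-1 land on process 0, indices 2-3 on process 1, and indices 4-5 return to process 0 as its local indices 2-3. Spreading blocks this way keeps every process busy as a factorization works its way down the matrix.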

36 Parallelizing the SciMark 2.0 LU Kernel with Intel® Cluster MKL
1. Initialize the process grid
2. Create a descriptor for each distributed matrix
3. Replace the call to dgetrf with pdgetrf (the ‘p’ is for parallel)
Result: LU factorization of a 40000 x 40000 matrix on an 8-node, dual 3.0 GHz Xeon cluster achieves 46000 MFLOPS.

37 Performance Libraries: Intel® MKL (What’s Been Covered)
- Intel® Math Kernel Library is a broad scientific/engineering math library
  - It is optimized for Intel® processors
  - It is threaded for effective use on multi-core and SMP machines
- The Intel C++ Compiler for Linux improves SciMark 2.0 performance without requiring code modifications
- With minor code modifications, Intel MKL dramatically improves the FFT, MC, and LU kernels
- Some SciMark 2.0 kernels benefit from parallel computing

38 Useful Links
- Intel Software Products: http://www.intel.com/software/products/
- Intel Software Network: http://www.intel.com/software/
- Intel Software College: http://www.intel.com/software/college/
- SciMark 2.0: http://math.nist.gov/scimark2/index.html
