Jacobi solver status Lucian Anton, Saif Mulla, Stef Salvini CCP_ASEARCH meeting October 8, 2013 Daresbury 1.



1 Jacobi solver status
Lucian Anton, Saif Mulla, Stef Salvini
CCP_ASEARCH meeting, October 8, 2013, Daresbury

2 Outline
Code structure
  – Front end
  – Numerical kernels
  – Data collection
Performance data
  – Intel SB
  – Xeon Phi
  – BlueGeneQ
  – GPU
8/10/13 Jacobi test program

3 Code structure
Read input from command line
  – Grid sizes, length of iteration block, number of iteration blocks, ...
  – Algorithm to use
  – Output format (header, test iterations, …)
Initialize grid with an eigenvector of the Jacobi smoother
Run several iteration blocks
Collect min, max, average times
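The last two steps above (run several iteration blocks, collect min, max and average times) can be sketched as follows. This is a minimal illustration, not the program's actual code: `TimeStats`, `reduce_times` and `time_blocks` are hypothetical names, and `clock()` is used only to keep the sketch portable (it measures CPU time, so an OpenMP build would more likely use `omp_get_wtime()`).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical container for the per-block timing statistics. */
typedef struct {
    double min, mean, max;
} TimeStats;

/* Reduce the per-block timings t[0..nblocks-1] to min, mean and max. */
TimeStats reduce_times(const double t[], size_t nblocks)
{
    assert(nblocks > 0);
    TimeStats s = { t[0], 0.0, t[0] };
    double sum = 0.0;
    for (size_t b = 0; b < nblocks; ++b) {
        if (t[b] < s.min) s.min = t[b];
        if (t[b] > s.max) s.max = t[b];
        sum += t[b];
    }
    s.mean = sum / (double)nblocks;
    return s;
}

/* Time nblocks iteration blocks, each consisting of block_len calls to
 * sweep(state); the per-block times are stored in t[] and then reduced. */
TimeStats time_blocks(void (*sweep)(void *), void *state,
                      size_t nblocks, size_t block_len, double t[])
{
    for (size_t b = 0; b < nblocks; ++b) {
        clock_t start = clock();
        for (size_t it = 0; it < block_len; ++it)
            sweep(state);
        t[b] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return reduce_times(t, nblocks);
}
```

Timing whole blocks rather than single sweeps keeps the timer overhead small relative to the measured work, which matters for small grids.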

4 Build model
Uses a generic Makefile + platform/*.inc files

    F90 := source /opt/intel/composerxe/bin/compilervars.sh intel64 && \
           source /opt/intel/impi/4.1.0/intel64/bin/mpivars.sh && mpiifort
    CC  := source /opt/intel/composerxe/bin/compilervars.sh intel64 && \
           source /opt/intel/impi/4.1.0/intel64/bin/mpivars.sh && icc
    LANG = C

    ifdef USE_MIC
    FMIC = -mmic
    endif

    ifdef USE_MPI
    FMPI=-DUSE_MPI
    endif

    ifdef USE_DOUBLE_PRECISION
    DOUBLE=-DUSE_DOUBLE_PRECISION
    endif

    ifdef USE_VEC1D
    VEC1D = -DUSE_VEC1D
    endif

    #FC = module add intel/comp intel/mpi && mpiifort

5 Command line parameters

    arcmport01:~/Projects/HOMB>./homb_c_gcc_debug_gpu.exe -help
    Usage: [-ng ] [ -nb ] [-np ] [-niter ] [-biter ] [-malign ] [-v] [-t] [-pc] [-model [num-waves] [threads-per-column]] [-nh] [-help]

    arcmport01:~/Projects/HOMB>./homb_c_gcc_debug_gpu.exe -model help
    possible values for model parameter:
        baseline
        baseline-opt
        blocked
        wave num-waves threads-per-column
        basegpu
        optgpu
    Note for wave model: if threads-per-column == 0 diagonal wave kernel is used.

6 README file
A full explanation of the command line options is provided in the README:

    The following flags can be used to set the grid sizes and other run parameters:
      -ng  set the global grid sizes
      -nb  set the computational block size, relevant only for the blocked model.
    Notes:
    1) no sanity checks are done, you are on your own.
    2) for the blocked model the OpenMP parallelism is done over computational blocks.
       One must ensure that there is enough work for all threads by setting suitable block sizes.

7 Correctness check
The -t flag checks whether the norm ratios are close to the Jacobi smoother eigenvalue.

    arcmport01:~/Projects/HOMB>./homb_c_gcc_debug_gpu.exe -t -niter 7
    Correctness check
    iteration, norm ratio, deviation from eigenvalue
    0 6.36918e+01 6.26966e+01
    1 9.95185e-01 2.55054e-08
    2 9.95185e-01 1.50473e-08
    3 9.95185e-01 2.57243e-08
    4 9.95185e-01 3.27436e-08
    5 9.95185e-01 1.96427e-08
    6 9.95185e-01 3.17978e-08
    # Last norm 6.187368259733268e+01
    #=====================================================================#
    # NThs   Nx   Ny   Nz   NITER   minTime     meanTime    maxTime
    #=====================================================================#
      8      33   33   33   1       1.299e-04   1.487e-04   1.690e-04

8 Algorithms
Basic three-loop iteration over the grid
  – OpenMP parallelism applied to the outer loop
  – If condition eliminated from the inner loop
Blocked iterations
Wave iterations
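A minimal sketch of the first two variants, assuming a flat array with the last index fastest and a `Real` type defined as `double` (both assumptions; the function names are hypothetical and the wave variant is not shown). The boundary is excluded through the loop bounds, so the inner loop needs no `if`. In the blocked variant the OpenMP loop runs over blocks, as the README notes describe.

```c
#include <assert.h>
#include <stddef.h>

typedef double Real;   /* assumption: the real code selects the precision at build time */

/* Linear index, last index fastest (the layout is an assumption). */
#define IDX(i, j, k, ny, nz) \
    (((size_t)(i) * (size_t)(ny) + (size_t)(j)) * (size_t)(nz) + (size_t)(k))

/* Six-point average at one interior point. */
#define STENCIL(u, i, j, k, ny, nz)                                         \
    (sixth * ((u)[IDX((i) - 1, j, k, ny, nz)] + (u)[IDX((i) + 1, j, k, ny, nz)] + \
              (u)[IDX(i, (j) - 1, k, ny, nz)] + (u)[IDX(i, (j) + 1, k, ny, nz)] + \
              (u)[IDX(i, j, (k) - 1, ny, nz)] + (u)[IDX(i, j, (k) + 1, ny, nz)]))

static const Real sixth = 1.0 / 6.0;

/* Baseline: three nested loops, OpenMP applied to the outer loop. */
void jacobi_baseline(const Real *u, Real *w, int nx, int ny, int nz)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j)
            for (int k = 1; k < nz - 1; ++k)
                w[IDX(i, j, k, ny, nz)] = STENCIL(u, i, j, k, ny, nz);
}

/* Blocked: the same update visited block by block, OpenMP over the blocks. */
void jacobi_blocked(const Real *u, Real *w, int nx, int ny, int nz,
                    int bx, int by, int bz)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    assert(bx > 0 && by > 0 && bz > 0);
    #pragma omp parallel for collapse(3)
    for (int ib = 1; ib < nx - 1; ib += bx)
        for (int jb = 1; jb < ny - 1; jb += by)
            for (int kb = 1; kb < nz - 1; kb += bz) {
                /* Clip the block at the last interior point. */
                const int ie = (ib + bx < nx - 1) ? ib + bx : nx - 1;
                const int je = (jb + by < ny - 1) ? jb + by : ny - 1;
                const int ke = (kb + bz < nz - 1) ? kb + bz : nz - 1;
                for (int i = ib; i < ie; ++i)
                    for (int j = jb; j < je; ++j)
                        for (int k = kb; k < ke; ++k)
                            w[IDX(i, j, k, ny, nz)] = STENCIL(u, i, j, k, ny, nz);
            }
}
```

Both functions perform the same arithmetic per point, so they should produce identical output; only the traversal order, and therefore the cache behaviour, differs.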

9 Algorithms: wave details
[Figure: wave iteration diagram in the Y-Z plane, with regions labelled "New" and "Old"]

10 Algorithms: helping vectorisation
The inner loop can be replaced with a function that is easier to vectorize:

    // 1D loop that helps the compiler to vectorize
    static void vec_oneD_loop(const int n,
                              const Real uNorth[], const Real uSouth[],
                              const Real uWest[], const Real uEast[],
                              const Real uBottom[], const Real uTop[],
                              Real w[] ){
      int i;
    // Tell the compiler that the loop iterations are independent
    #ifdef __INTEL_COMPILER
    #pragma ivdep
    #endif
    #ifdef __IBMC__
    #pragma ibm independent_loop
    #endif
      for (i=0; i < n; ++i)
        w[i] = sixth * (uNorth[i] + uSouth[i] + uWest[i] + uEast[i] + uBottom[i] + uTop[i]);
    }
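How such a 1D kernel might be driven from the two outer loops is sketched below. This is an assumption-laden illustration rather than the program's actual driver: `jacobi_vec1d` is a hypothetical name, `Real` and `sixth` are defined here only to make the sketch self-contained, the array layout (last index fastest) is assumed, and the mapping of North/South/West/East/Bottom/Top to index directions is a guess (the sum is symmetric, so the result does not depend on it). The compiler-specific pragmas are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef double Real;                      /* assumption */
static const Real sixth = 1.0 / 6.0;      /* assumption */

/* 1D kernel with the same signature as on the slide. */
static void vec_oneD_loop(const int n,
                          const Real uNorth[], const Real uSouth[],
                          const Real uWest[], const Real uEast[],
                          const Real uBottom[], const Real uTop[],
                          Real w[])
{
    for (int i = 0; i < n; ++i)
        w[i] = sixth * (uNorth[i] + uSouth[i] + uWest[i] + uEast[i]
                        + uBottom[i] + uTop[i]);
}

/* Hypothetical driver: for each interior (i, j) line, pass pointers to the
 * six neighbouring lines so that the kernel sees plain unit-stride arrays. */
void jacobi_vec1d(const Real *u, Real *w, int nx, int ny, int nz)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    const size_t si = (size_t)ny * (size_t)nz;   /* stride of the first index  */
    const size_t sj = (size_t)nz;                /* stride of the second index */
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j) {
            const size_t c = (size_t)i * si + (size_t)j * sj + 1;  /* first interior k */
            vec_oneD_loop(nz - 2,
                          u + c - si, u + c + si,   /* i-1, i+1 */
                          u + c - sj, u + c + sj,   /* j-1, j+1 */
                          u + c - 1,  u + c + 1,    /* k-1, k+1 */
                          w + c);
        }
}
```

Passing separate array arguments hides the 3D index arithmetic from the inner loop, which is presumably what makes the dependence analysis easier for the compiler.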

11 Algorithms: CUDA
Base laplace3D (from Mike’s lecture notes)
Shared memory in XY plane
… more to come

12 Data collection
With such a large parameter space we have a big-ish data problem.
Bash script + gnuplot

    index=0
    for exe in $exe_list
    do
      for model in $model_list
      do
        for nth in $threads_list
        do
          export OMP_NUM_THREADS=$nth
          for ((linsize=10; linsize <= max_linsize; linsize += step))
          do
            biter=$(((10*max_linsize)/linsize))
            niter=5
            if [ "$model" = wave ]
            then
              nwave="$biter $((nth<biter?nth:biter))"
              echo "model $model $nwave"
            else
              nwave=""
            fi
            if [ "$blk_x" -eq 0 ] ; then blk_xt=$linsize ; else blk_xt=$blk_x ; fi
            if [ "$blk_y" -eq 0 ] ; then blk_yt=$linsize ; else blk_yt=$blk_y ; fi
            if [ "$blk_z" -eq 0 ] ; then blk_zt=$linsize ; else blk_zt=$blk_z ; fi
            echo "./"$exe" -ng $linsize $linsize $linsize -nb $blk_xt $blk_yt $blk_zt -model $model $nwave

13 SandyBridge baseline

14 SB: blocked and wave

15 BGQ

16 Xeon Phi vs SandyBridge

17 Fermi data

18 Conclusions & To do
We have an integrated set of Jacobi smoother algorithms
  – OpenMP, CUDA, MPI (almost)
  – Flexible build system
  – Run parameters can be selected from command line and preprocessor flags
  – Correctness check
  – Scripted data collection
  – README file
Tested on several systems (iDataPlex, BGQ, Emerald, …, Mac OS laptop)
GPU needs further improvements ….

