Presentation on theme: "The Jacquard Programming Environment Mike Stewart NUG User Training, 10/3/05."— Presentation transcript:
The Jacquard Programming Environment Mike Stewart NUG User Training, 10/3/05
2 Outline Compiling and Linking. Optimization. Libraries. Debugging. Porting from Seaborg and other systems.
3 Pathscale Compilers Default compilers: Pathscale Fortran 90, C, and C++. Module “path” is loaded by default and points to the current default version of the Pathscale compilers (currently 2.2.1). Other versions available: module avail path. Extensive vendor documentation available on-line at Commercial product: well supported and optimized.
4 Compiling Code Compiler invocation: –No MPI: pathf90, pathcc, pathCC. –MPI: mpif90, mpicc, mpicxx The mpi compiler invocation will use the currently loaded compiler version. The mpi and non-mpi compiler invocations have the same options and arguments.
5 Compiler Optimization Options 4 numeric levels –On where n ranges from 0 (no optimization) to 3. Default level: -O2 (unlike IBM) –g without a –O option changes the default to –O0.
6 -O1 Optimization Minimal impact on compilation time compared to –O0 compile. Only optimizations applied to straight line code (basic blocks) like instruction scheduling.
7 -O2 Optimization Default when no optimization arguments given. Optimizations that always increase performance. Can significantly increase compilation time. -O2 optimization examples: –Loop nest optimization. –Global optimization within a function scope. –2 passes of instruction scheduling. –Dead code elimination. –Global register allocation.
8 -O3 Optimization More extensive optimizations that may in some cases slow down performance. Optimizes loop nests rather than just inner loops, i.e. inverts indices, etc. “Safe” optimizations – produces answers identical with those produced by –O0. NERSC recommendation based on experiences with benchmarks.
9 -Ofast Optimization Equivalent to -O3 -ipa -fno-math-errno -OPT:roundoff=2:Olimit=0:div_split=ON:alias=typed. ipa – interprocedural analysis. –Optimizes across functional boundaries. –Must be specified both at compile and link time. Aggressive “unsafe” optimizations: –Changes order of evaluation. –Deviates from IEEE 754 standard to obtain better performance. There are some known problems with this level of optimization in the current release,
10 NAS B Serial Benchmarks Performance (MOP/S) Seaborg Best -O0-O1-O2-O3-Ofast BT CG EP FT did not compile IS LU MG SP
11 NAS B Serial Benchmarks Compile Times (seconds) -O0-O1-O2-O3-Ofast BT CG EP FT did not compile IS LU MG SP
12 NAS B Optimization Arguments Used by LNXI Benchmarkers BenchmarkArguments BT-O3 -ipa -WOPT:aggstr=off CG-O3 -ipa -CG:use_movlpd=on -CG:movnti=1 EP-LNO:fission=2 -O3 -LNO:vintr=2 FT-O3 -LNO:opt=0 IS-Ofast -DUSE_BUCKETS LU-Ofast -LNO:fusion=2:prefetch=0:full_unroll=10:ou_max=5 -OPT:ro=3:fold_unsafe_relops=on:fold_unsigned_relops=on: unroll_size=256:unroll_times_max=16:fast_complex -CG:cflow=off:p2align_freq=1 -fno-exceptions MG-O3 -ipa -WOPT:aggstr=off -CG:movnti=0 SP-Ofast
13 NAS C FT (32 Proc) OptimizationMops/ProcCompile Time (seconds) Seaborg Best 86.5N/A -O O O O Ofast
14 SuperLU MPI Benchmark Based on the SuperLU general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations. Mostly C with some Fortran 90 routines. Run on 64 processors/32 nodes. Uses BLAS routines from ACML.
15 SLU (64 procs) OptimizationElapsed run time (seconds) Compile Time (seconds) Seaborg Best 742.5N/A -O O O O OfastN/ADid not compile
17 ACML Library AMD Core Math Library - set of numerical routines tuned specifically for AMD64 platform processors. –BLAS –LAPACK –FFT To use with pathscale: –module load acml (built with pathscale compilers) –Compile and link with $ACML To use with gcc: –module load acml_gcc (build with pathscale compilers) –Compile and link with $ACML
18 Matrix Multiply Optimization Example 3 ways to multiply 2 dense matrices –Directly in Fortran with nested loops –Matmul F90 intrinsic –dgemm from ACML Example by 1000 double precision matrices. Order of indices: ijk means – do i=1,n – do j=1,n – do k=1,n