# Basic Intel software requirements for Bureau applications specialists for development and optimisation on SUN HPC platform


The Centre for Australian Weather and Climate Research (CAWCR), a partnership between CSIRO and the Bureau of Meteorology

Basic Intel software requirements for Bureau applications specialists for development and optimisation on SUN HPC platform

Ilia Bermous, Senior ITO, CAWCR

22 January 2010

## Stages in the Optimisation Process

- Compilation and building an executable
- Execution
- Optimisation, or code restructuring for better performance

## Compilation and Building an Executable (1)

- Are any static analysis tools similar in flavour to Cray `cflint` available?
- The user should be able to identify easily what has been optimised, vectorised and parallelised by the compiler, and how.
  - Examples are the transformation listing and the formatted listing provided by the NEC SX compilers and cross-compilers.

## Example

    subroutine vec_par(a,b,c,n,m)
    real, dimension (m,n,n) :: a,b,c
    !cdir concur
    do i=1,n
      do k=1,m
        do j=1,n
          a(k,j,i)=b(k,j,i) + c(k,j,i)
        end do
      end do
    end do
    end

The `!cdir concur` line is the parallelisation directive.

## Transformation Listing

    do i = 1, n
      if (n .gt. 0) then
        J1 = and(n,3)
        do j = 1, J1
    !CDIR NODEP
          do k = 1, m
            a(k,j,i) = b(k,j,i) + c(k,j,i)
          end do
        end do
        do j = 1, n/4
    !CDIR NODEP
          do k = 1, m
            a(k,(j-1)*4+J1+1,i) = b(k,(j-1)*4+J1+1,i) + c(k,(j-1)*4+J1+1,i)
            a(k,(j-1)*4+2+J1,i) = b(k,(j-1)*4+2+J1,i) + c(k,(j-1)*4+2+J1,i)
            a(k,(j-1)*4+3+J1,i) = b(k,(j-1)*4+3+J1,i) + c(k,(j-1)*4+3+J1,i)
            a(k,j*4+J1,i) = b(k,j*4+J1,i) + c(k,j*4+J1,i)
          end do
        end do
      endif
    end do

The compiler has applied two transformations: loop interchange, which makes the contiguous index `k` the innermost loop, and loop unrolling of the `j` loop by four, preceded by a remainder loop of `J1 = and(n,3)` (that is, n mod 4) iterations.
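To make the two transformations concrete, here is a hedged C sketch that applies them by hand to a column-major array dimensioned (m,n,n), as Fortran stores it. The function names `vec_par_original`, `vec_par_transformed` and `versions_agree` and the helper `idx` are hypothetical; the sketch only shows that interchanging the loops and unrolling `j` by four after an `n % 4` remainder loop writes exactly the same values as the original nest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Offset of element (k,j,i) in a column-major array dimensioned (m,n,n),
   using Fortran's 1-based indices. */
static size_t idx(int k, int j, int i, int m, int n) {
    return (size_t)(k - 1) + (size_t)m * ((size_t)(j - 1) + (size_t)n * (size_t)(i - 1));
}

/* Original order i, k, j: the innermost j loop strides by m elements. */
void vec_par_original(float *a, const float *b, const float *c, int n, int m) {
    assert(n >= 0 && m >= 0);
    for (int i = 1; i <= n; i++)
        for (int k = 1; k <= m; k++)
            for (int j = 1; j <= n; j++) {
                size_t p = idx(k, j, i, m, n);
                a[p] = b[p] + c[p];
            }
}

/* Transformed order: k innermost (stride 1); j unrolled by four after a
   remainder loop of n % 4 iterations, mirroring J1 = and(n,3). */
void vec_par_transformed(float *a, const float *b, const float *c, int n, int m) {
    assert(n >= 0 && m >= 0);
    int j1 = n & 3;                          /* loop invariant, hoisted out of i */
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= j1; j++)        /* remainder columns 1..j1 */
            for (int k = 1; k <= m; k++) {
                size_t p = idx(k, j, i, m, n);
                a[p] = b[p] + c[p];
            }
        for (int j = 1; j <= n / 4; j++) {
            int base = (j - 1) * 4 + j1;     /* this group covers base+1 .. base+4 */
            for (int k = 1; k <= m; k++) {
                size_t p1 = idx(k, base + 1, i, m, n), p2 = idx(k, base + 2, i, m, n);
                size_t p3 = idx(k, base + 3, i, m, n), p4 = idx(k, base + 4, i, m, n);
                a[p1] = b[p1] + c[p1];
                a[p2] = b[p2] + c[p2];
                a[p3] = b[p3] + c[p3];
                a[p4] = b[p4] + c[p4];
            }
        }
    }
}

/* Runs both versions on the same input; returns 1 if all results match,
   0 on a mismatch and -1 if memory could not be allocated. */
int versions_agree(int n, int m) {
    size_t len = (size_t)m * (size_t)n * (size_t)n;
    float *b = malloc(len * sizeof *b), *c = malloc(len * sizeof *c);
    float *a1 = malloc(len * sizeof *a1), *a2 = malloc(len * sizeof *a2);
    int result = -1;
    if (b && c && a1 && a2) {
        for (size_t p = 0; p < len; p++) {
            b[p] = (float)p;
            c[p] = 0.5f * (float)p;
        }
        vec_par_original(a1, b, c, n, m);
        vec_par_transformed(a2, b, c, n, m);
        result = 1;
        for (size_t p = 0; p < len; p++)
            if (a1[p] != a2[p]) { result = 0; break; }
    }
    free(b); free(c); free(a1); free(a2);
    return result;
}
```

For example, `versions_agree(7, 5)` exercises a remainder of three columns followed by one unrolled group of four. The listing's `if (n .gt. 0)` guard has no counterpart here because the C loops simply do not execute when `n` is 0.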

## Formatted Listing

    ...
     4: P------>   do i=1,n
     5: |X----->     do k=1,m
     6: ||+---->       do j=1,n
     7: |||              a(k,j,i)=b(k,j,i) + c(k,j,i)
     8: ||+----        end do
     9: |X-----      end do
    10: P------    end do
    ...

The loops on lines 5-9 are interchanged and vectorised; the loop on lines 4-10 is parallelised.

## Compilation and Building an Executable (2)

- More robust Fortran/C compilers are required.
  - We have been affected by compiler bugs: as soon as we started to use the Fortran compiler for our applications, an optimisation bug was detected, and a number of other compiler problems have since been reported to Intel.
  - Current Intel compiler versions still have too many bugs.
  - According to a recent report (*), 169 bugs were fixed in the latest version 11 Fortran compiler. In my view, some of them are very dangerous.
- Release notes for each compiler revision should include a section stating what has actually been implemented in that revision.

(*) http://software.intel.com/en-us/articles/intel-professional-edition-compilers-111-fixes-list/

## Execution

- At the end of execution, the important performance characteristics should be readily available, so that the user can tell whether the application has run efficiently.
- On the NEC SX, setting an environment variable is enough to have the following characteristics printed for any Fortran application, without any impact on application performance:
  - MFLOPS
  - Vector Operation Ratio and Average Vector Length
  - Instruction/Operand Cache miss time
  - Bank Conflict time
- The NEC `ftrace` tool provides similar performance characteristics for any program unit compiled with the special `-ftrace` option.
  - With code directives, this information can be obtained for any code section, starting and ending anywhere in any program unit.
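The region directives are NEC-specific, but the underlying idea of timing an arbitrary code section can be sketched in standard C. This is a hypothetical minimal analogue, not the ftrace interface: `region_begin`, `region_end`, `region_report` and `MAX_REGIONS` are invented names, and `clock()` reports processor time for the whole process, with none of the hardware counters (MFLOPS, vector ratio, bank conflicts) that ftrace provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_REGIONS 16

/* One user-defined code region (hypothetical; not NEC's ftrace API). */
typedef struct {
    const char *name;    /* must outlive the timer, e.g. a string literal */
    clock_t start;       /* clock() value at the latest region_begin() */
    double cpu_seconds;  /* accumulated processor time over all calls */
    long calls;          /* number of completed begin/end pairs */
} region;

static region regions[MAX_REGIONS];
static int n_regions;

/* Returns the entry for name, creating it on first use. */
static region *find_region(const char *name) {
    for (int r = 0; r < n_regions; r++)
        if (strcmp(regions[r].name, name) == 0)
            return &regions[r];
    assert(n_regions < MAX_REGIONS);
    regions[n_regions].name = name;
    return &regions[n_regions++];
}

void region_begin(const char *name) {
    find_region(name)->start = clock();
}

void region_end(const char *name) {
    region *r = find_region(name);
    r->cpu_seconds += (double)(clock() - r->start) / CLOCKS_PER_SEC;
    r->calls++;
}

/* Prints one summary line per region, e.g. just before the program exits. */
void region_report(FILE *out) {
    for (int r = 0; r < n_regions; r++)
        fprintf(out, "%-20s calls=%ld cpu=%.3f s\n",
                regions[r].name, regions[r].calls, regions[r].cpu_seconds);
}
```

Wrapping a section in `region_begin("advection")` and `region_end("advection")` (a hypothetical region name) and calling `region_report(stdout)` at the end of the run prints the call count and accumulated CPU time. Unlike ftrace, this approach requires editing the source and gives no vector or cache statistics, which is exactly the gap the slide describes.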

## Program Information Output

Global Data of 3 processes:

| Item | Min | [U,R] | Max | [U,R] | Average |
|---|---|---|---|---|---|
| Real Time (sec) | 544.678 | [0,1] | 554.766 | [0,2] | 549.728 |
| User Time (sec) | 3383.378 | [0,0] | 3598.739 | [0,2] | 3479.353 |
| System Time (sec) | 14.129 | [0,1] | 14.478 | [0,2] | 14.305 |
| Vector Time (sec) | 334.675 | [0,1] | 346.198 | [0,0] | 340.617 |
| Instruction Count | 38739990085 | [0,1] | 40868170971 | [0,0] | 40022002145 |
| Vector Instruction Count | 7456076063 | [0,1] | 7942498968 | [0,2] | 7725412162 |
| Vector Element Count | 997371328007 | [0,1] | 1069853462653 | [0,2] | 1031688872790 |
| FLOP Count | 337475560162 | [0,0] | 342575608843 | [0,2] | 339235679815 |
| MOPS | 297.648 | [0,1] | 313.572 | [0,0] | 305.847 |
| MFLOPS | 95.193 | [0,2] | 99.745 | [0,0] | 97.547 |
| Average Vector Length | 132.153 | [0,0] | 134.700 | [0,2] | 133.540 |
| Vector Operation Ratio (%) | 96.881 | [0,0] | 97.050 | [0,2] | 96.963 |
| Memory size used (MB) | 13040.000 | [0,0] | 13056.000 | [0,1] | 13050.667 |
| MIPS | 11.210 | [0,1] | 12.079 | [0,0] | 11.510 |
| Instruction Cache miss (sec) | 23.864 | [0,1] | 24.546 | [0,2] | 24.153 |
| Operand Cache miss (sec) | 25.692 | [0,0] | 26.588 | [0,2] | 26.193 |
| Bank Conflict Time (sec) | 8.762 | [0,0] | 11.791 | [0,2] | 9.997 |
| Max. Concurrent Processes | 8 | [0,0] | 8 | [0,0] | 8 |
| MOPS (concurrent) | 2090.405 | [0,0] | 2141.803 | [0,2] | 2114.935 |
| MFLOPS (concurrent) | 664.944 | [0,0] | 693.459 | [0,1] | 674.666 |
| MIPS (concurrent) | 78.606 | [0,2] | 80.524 | [0,0] | 79.564 |
| Event Busy Count | 0 | [0,0] | 0 | [0,0] | 0 |
| Event Wait (sec) | 0.000 | [0,0] | 0.000 | [0,0] | 0.000 |
| Lock Busy Count | 35636 | [0,2] | 39030 | [0,0] | 36770 |
| Lock Wait (sec) | 2.106 | [0,0] | 2.487 | [0,1] | 2.331 |
| Barrier Busy Count | 0 | [0,0] | 0 | [0,0] | 0 |
| Barrier Wait (sec) | 0.000 | [0,0] | 0.000 | [0,0] | 0.000 |
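The headline rates in this output are derived from the raw counters. The C sketch below shows the relationships as I understand the usual NEC SX definitions; treat the formulas as assumptions, although they reproduce the Average column to within about 0.1%. The struct `proginf_counters` and the function names are hypothetical.

```c
#include <assert.h>

/* Raw counters as printed in the Program Information Output (hypothetical struct). */
typedef struct {
    double user_time_s;      /* User Time (sec) */
    double instr_count;      /* Instruction Count */
    double vec_instr_count;  /* Vector Instruction Count */
    double vec_elem_count;   /* Vector Element Count */
    double flop_count;       /* FLOP Count */
} proginf_counters;

/* Operations = scalar instructions + one operation per vector element processed. */
static double total_ops(const proginf_counters *c) {
    return c->instr_count - c->vec_instr_count + c->vec_elem_count;
}

double mflops(const proginf_counters *c) {
    assert(c->user_time_s > 0.0);
    return c->flop_count / c->user_time_s / 1e6;
}

double mops(const proginf_counters *c) {
    assert(c->user_time_s > 0.0);
    return total_ops(c) / c->user_time_s / 1e6;
}

double mips(const proginf_counters *c) {
    assert(c->user_time_s > 0.0);
    return c->instr_count / c->user_time_s / 1e6;
}

/* Elements processed per vector instruction. */
double average_vector_length(const proginf_counters *c) {
    assert(c->vec_instr_count > 0.0);
    return c->vec_elem_count / c->vec_instr_count;
}

/* Share of all operations executed in vector mode, in percent. */
double vector_operation_ratio(const proginf_counters *c) {
    return 100.0 * c->vec_elem_count / total_ops(c);
}
```

Feeding in the Average column, `proginf_counters avg = {3479.353, 40022002145.0, 7725412162.0, 1031688872790.0, 339235679815.0};`, gives roughly 97.5 MFLOPS, 305.8 MOPS, 11.50 MIPS, an average vector length of about 133.5 and a vector operation ratio of about 96.96%. The small differences from the printed averages are presumably because the tool averages per-process ratios, whereas this sketch takes ratios of averaged counters.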

## I/O Information

File Information:

| Field | Value |
|---|---|
| Unit No. | 20 |
| File Name | BX2005092518 |
| Named | YES |
| Current Directory | /bm/flush3/iliab/gasp/test/2005092612 |
| Format | UNFORMATTED |
| Blank | ---- |
| Access | DIRECT |
| Recl (Byte) | 45056 |
| Max Record No. | 3911 |
| File Size (Byte) | 179818496 |
| File Descriptor | 3 |
| File System Type | NFS |
| Open Mode | READWRITE |
| Terminal Assignment | NO |
| I/O Buffer Size (KByte, F_SETBUF) | 1024 |

I/O Exec. Count:

| READ | WRITE | OPEN | CLOSE | INQUIRE | FIND | DEFINE FILE |
|---|---|---|---|---|---|---|
| 178 | 0 | 1 | 0 | 0 | 0 | 0 |

Data transfer:

| Item | Total (In/Out) | Input | Output |
|---|---|---|---|
| Total Data Size (Byte) | 8010552 | 8010552 | 0 |
| Max Data Size (Byte) | | 45056 | 0 |
| Min Data Size (Byte) | | 35640 | 0 |
| Ave Data Size (Byte) | 45003 | 45003 | 0 |
| Transfer Rate (KByte/sec) | 5746.793 | 5746.793 | 0.000 |

Calls and times:

| Item | Total (In/Out/Aux) | Input | Output |
|---|---|---|---|
| RTP-call Count | 535 | 534 | 0 |
| System-call Count (read/write): Exec. Count | | 68 | 0 |
| System-call Count (read/write): Ave Data Size (Byte) | | 1048576 | 0 |
| Real Time (sec) | 1.367772 | 1.361247 | 0.000000 |
| User Time (sec) | 0.007263 | 0.007138 | 0.000000 |

The options F_INPUT, F_OUTPUT, F_NORCW, F_PARTRCW, F_EXPRCW, F_UFMTFLOAT1, F_UFMTFLOAT2, F_UFMTIEEE, F_UFMTENDIAN, F_UFMTADJUST, F_HSDIR, F_VSPACING and F_PROMOTE are all set to NO.
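The two derived I/O figures can be checked against the raw ones. This hedged C sketch assumes that a KByte is 1024 bytes, that the transfer rate divides the data size by the input real time, and that the average data size divides the total by the READ count; these assumptions are mine, but they reproduce 5746.793 KByte/sec and 45003 bytes for this file. The function names are hypothetical.

```c
#include <assert.h>

/* Transfer rate in KByte/sec, assuming 1 KByte = 1024 bytes. */
double transfer_rate_kb_per_s(double bytes, double real_time_s) {
    assert(real_time_s > 0.0);
    return bytes / 1024.0 / real_time_s;
}

/* Average bytes per Fortran READ/WRITE statement, truncated to whole bytes. */
long long average_data_size(long long total_bytes, long long exec_count) {
    assert(exec_count > 0);
    return total_bytes / exec_count;
}
```

With the values above, `transfer_rate_kb_per_s(8010552.0, 1.361247)` is about 5746.79 and `average_data_size(8010552, 178)` is 45003. At under 6 MByte/sec over NFS, this direct-access file is an obvious candidate for closer inspection.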

## Optimisation (1)

- We need to know:
  - What are the primary performance characteristics for improving an application's performance?
  - How should these characteristics be measured?
  - How should they be addressed?

## Optimisation (2)

- Optimisation manuals and documentation:
  - Manuals need to describe each technique and illustrate it with simple examples.
  - Significant improvement is required in the existing manuals, for example "Intel(R) Fortran Compiler Optimizing Applications", Document Number: 307781-003US.
  - The document "Consistency of Floating-Point Results using the Intel Compiler" was very useful for understanding how to get reproducible results, but its content should be included in the main manuals.
  - Are there Intel websites where further related information can be found?
  - Manuals and release notes should be kept in one place, with good indexing and searching.
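Why reproducibility needs its own documentation can be illustrated with a hedged C sketch. A vectorised reduction typically keeps several partial sums, which changes the order of the floating-point additions, and because floating-point addition is not associative the result can change. The two functions are hypothetical illustrations, not Intel code; with the Intel compilers, options such as `-fp-model precise` are intended to restrict value-unsafe reordering of this kind.

```c
#include <assert.h>

/* Adds x[0..n-1] strictly left to right, the order written in the source. */
double sum_in_order(const double *x, int n) {
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Adds the same values using four partial sums, the kind of reordering a
   vectorising compiler may apply to a reduction when reassociation is allowed. */
double sum_four_partial(const double *x, int n) {
    assert(n >= 0);
    double p[4] = {0.0, 0.0, 0.0, 0.0};
    int i = 0;
    for (; i + 3 < n; i += 4) {
        p[0] += x[i];
        p[1] += x[i + 1];
        p[2] += x[i + 2];
        p[3] += x[i + 3];
    }
    double s = (p[0] + p[1]) + (p[2] + p[3]);
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        s += x[i];
    return s;
}
```

With `double x[] = {1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0};` on an IEEE 754 double-precision platform without extended-precision intermediates, `sum_in_order(x, 8)` returns 3.0, because each 1.0 added to 1e16 is rounded away, whereas `sum_four_partial(x, 8)` returns 6.0, the exact sum. Neither order is "wrong"; the point is that the answer depends on which one the compiler chooses.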

## Summary

- We need a user-friendly software environment at each stage of the performance tuning procedure.
