 Copyright, HiPERiSM Consulting, LLC, George Delic, Ph.D. HiPERiSM Consulting, LLC (919)484-9803 P.O. Box 569, Chapel Hill, NC.

1  Copyright, HiPERiSM Consulting, LLC, http://www.hiperism.com George Delic, Ph.D. HiPERiSM Consulting, LLC (919)484-9803 P.O. Box 569, Chapel Hill, NC 27514 george@hiperism.com http://www.hiperism.com HiPERiSM Consulting, LLC.

2  CHOOSING A COMPILER FOR AQM APPLICATIONS ON LINUX
George Delic, Ph.D.
Models-3 User's Workshop, October 27-29, 2003, RTP, NC

3  Overview
1. Introduction
2. Choice of Hardware
3. Choice of Compilers
4. Choice of Benchmarks
5. Comparing Execution Times
6. Evaluation of SSE Results
7. Tests for AQMs
8. Conclusions

4  Introduction
- Motivation
  - AQMs are migrating to COTS hardware
  - Linux is preferred
  - Rich choice of compilers is now available
  - Need to learn about portability issues
- What is known about compilers for IA-32?
  - CMAQ releases switch compilers without comment
  - Where is the analysis of differences in:
    - Performance?
    - Numerical accuracy and stability?
    - Portability problems?

5  Choice of Hardware & Compilers
- Hardware
  - Intel Pentium III (933 MHz, dual processor) with SSE extensions and 256 KB L2 cache
  - Linux 2.4.20 kernel
- Fortran compilers for IA-32
  - Absoft 8.0
  - Intel 7.1
  - Lahey 5.6
  - Portland CDK 4.0

6  Choice of Benchmarks
- Kallman Integer and Logical Algorithm
  - Uses only integer and logical (I & L) operations with bit intrinsics
  - Negligible I/O and memory operations
  - Six cases with problem size scaling
- Stommel Ocean Model (SOM): single-precision (sp) floating-point algorithm
  - Jacobi iteration sweep over 2-D physical domain
  - Regular loops optimal for testing vectorization
  - Six cases in the range N = 2 x 10^3 to 7 x 10^3, with N^2 = 4 to 49 million data points
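To make "Jacobi iteration sweep over 2-D physical domain" concrete, here is a minimal sketch of one Jacobi sweep for a generic 5-point Poisson stencil. It is not the SOM source code: the actual Stommel stencil has different coefficients, and the names `jacobi_sweep` and `solve` are invented for this illustration. The doubly nested, regular loop is the kind of construct a vectorizing compiler targets.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for the 5-point stencil of laplacian(u) = f.

    u and f are lists of lists (boundary rows/columns of u are held fixed),
    h is the grid spacing. Returns the updated grid and the largest change.
    Generic Poisson stencil, assumed for illustration only.
    """
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]          # Jacobi reads only old values
    max_change = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                - h * h * f[i][j])
            max_change = max(max_change, abs(new[i][j] - u[i][j]))
    return new, max_change


def solve(u, f, h, tol, max_sweeps):
    """Repeat sweeps until the largest change drops below tol."""
    for sweep in range(1, max_sweeps + 1):
        u, change = jacobi_sweep(u, f, h)
        if change < tol:
            return u, sweep
    return u, max_sweeps
```

Each sweep touches every interior point once, so the work per sweep grows with N^2, which is why the six SOM cases span 4 to 49 million data points.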

7  Choice of Benchmarks (cont.)
- Princeton Ocean Model (POM): double-precision (dp) floating-point algorithm
  - Example of "real-world" code that is numerically unstable with sp arithmetic!
  - 500+ vectorizable loops to exercise compilers
  - 9 procedures account for 85% of CPU time
  - 2-day simulation for two cases:
    - Small problem: 65 x 49 x 21
    - Large problem: 100 x 40 x 15
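The slide does not say why POM fails in single precision. As a generic illustration of how sp arithmetic can lose information that dp keeps, the sketch below emulates sp rounding with Python's `struct` module. The helper names `to_sp` and `accumulate` are invented here, and the example is not taken from POM.

```python
import struct


def to_sp(x):
    """Round a Python float (dp) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single):
    """Add step to a running total count times, optionally rounding to sp."""
    if single:
        step = to_sp(step)
    total = 0.0
    for _ in range(count):
        total = total + step
        if single:
            total = to_sp(total)   # emulate storing the sum in an sp variable
    return total


# In sp, adding 1 to 2**24 is absorbed; in dp it is not.
print(to_sp(2.0**24 + 1.0) == 2.0**24)   # True
print(2.0**24 + 1.0 == 2.0**24)          # False

# Long accumulations drift much further from the exact value in sp.
print(accumulate(0.1, 100000, single=True))
print(accumulate(0.1, 100000, single=False))
```

A time-stepping model performs many such accumulations, which is one reason a code may need the dp promotion switches (`-r8`, `-dbl`, `-N113`) listed for POM on later slides.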

8  Comparing Execution Times: Kallman compiler switches
Compiler and version   Compiler command and selected switches
Absoft 8.0             f90 -O3 -ffixed
Intel 7.1              ifc -O3 -tpp6 -FI
Lahey 5.6              lf95 -tpp -fix
Portland 4.0           pgf90 -fast

9  Comparing Execution Times: Kallman (seconds)
N     Absoft      Intel       Lahey       Portland
30        0.21        0.36        0.48        0.60
44       40.38       80.19       98.45      135.29
48        6.44       13.15       16.16       22.52
52       23.03       48.20       59.30       83.28
56      197.78      412.83      509.31      712.42
60    12891.58    26734.09    32833.08    45451.38

10  Comparing Execution Times: Kallman (log10 seconds) [figure]

11  Comparing Execution Times: Kallman (ratio to Absoft time) [figure]
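The ratios plotted on slide 11 can be recomputed from the slide 9 timings. The sketch below assumes the timings as read from the flattened transcript table. The dictionary layout and the name `ratio_to_baseline` are choices made for this illustration, not part of the original talk.

```python
# Kallman execution times in seconds, keyed by problem size N (slide 9).
KALLMAN_SECONDS = {
    30: {"Absoft": 0.21, "Intel": 0.36, "Lahey": 0.48, "Portland": 0.60},
    44: {"Absoft": 40.38, "Intel": 80.19, "Lahey": 98.45, "Portland": 135.29},
    48: {"Absoft": 6.44, "Intel": 13.15, "Lahey": 16.16, "Portland": 22.52},
    52: {"Absoft": 23.03, "Intel": 48.20, "Lahey": 59.30, "Portland": 83.28},
    56: {"Absoft": 197.78, "Intel": 412.83, "Lahey": 509.31, "Portland": 712.42},
    60: {"Absoft": 12891.58, "Intel": 26734.09, "Lahey": 32833.08,
         "Portland": 45451.38},
}


def ratio_to_baseline(times, baseline="Absoft"):
    """Divide every compiler's time by the baseline compiler's time, per N."""
    return {
        n: {compiler: t / row[baseline] for compiler, t in row.items()}
        for n, row in times.items()
    }


ratios = ratio_to_baseline(KALLMAN_SECONDS)
for n in sorted(ratios):
    line = "  ".join(f"{c}={r:.2f}" for c, r in ratios[n].items())
    print(f"N={n}: {line}")
```

On these numbers Absoft is the fastest compiler for every Kallman case, with Intel roughly 2x slower for all but the smallest case.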

12  Comparing Execution Times: SOM (POM) compiler switches (without SSE)
Compiler and version   Compiler command and selected switches
Absoft 8.0             f90 -s -cpu:p6 -O3 (-N113) -ffixed
Intel 7.1              ifc -O3 (-r8) -tpp6 -FI
Lahey 5.6              lf95 -tpp (-dbl) -fix
Portland 4.0           pgf90 -fast (-r8) -Mvect

13  Comparing Execution Times: SOM without SSE (seconds)
N       Absoft    Intel    Lahey    Portland
2000      50.0     38.8     36.4      41.4
3000     110.5     94.4     87.7      92.7
4000     197.7    159.6    150.3     163.3
5000     305.3    224.3    246.8     253.1
6000     443.4    320.0    332.0     388.5
7000     586.5    427.6    477.9     524.4
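One way to summarise the SOM timings above is to find the fastest compiler for each N and the slowest-to-fastest ratio. This is only a sketch: the statistics actually shown on the following slides are not recoverable from the transcript, and the names `fastest_compiler` and `spread` are invented here.

```python
# SOM execution times in seconds without SSE, keyed by N (slide 13).
SOM_SECONDS = {
    2000: {"Absoft": 50.0, "Intel": 38.8, "Lahey": 36.4, "Portland": 41.4},
    3000: {"Absoft": 110.5, "Intel": 94.4, "Lahey": 87.7, "Portland": 92.7},
    4000: {"Absoft": 197.7, "Intel": 159.6, "Lahey": 150.3, "Portland": 163.3},
    5000: {"Absoft": 305.3, "Intel": 224.3, "Lahey": 246.8, "Portland": 253.1},
    6000: {"Absoft": 443.4, "Intel": 320.0, "Lahey": 332.0, "Portland": 388.5},
    7000: {"Absoft": 586.5, "Intel": 427.6, "Lahey": 477.9, "Portland": 524.4},
}


def fastest_compiler(row):
    """Return the compiler name with the smallest time in one table row."""
    return min(row, key=row.get)


def spread(row):
    """Return the ratio of the slowest time to the fastest time in one row."""
    return max(row.values()) / min(row.values())


for n in sorted(SOM_SECONDS):
    row = SOM_SECONDS[n]
    print(f"N={n}: fastest={fastest_compiler(row)}, "
          f"slowest/fastest={spread(row):.2f}")
```

On these numbers Lahey is fastest for N up to 4000 and Intel from N = 5000 upward, so the ranking depends on problem size.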

14  Comparing Execution Times: SOM (without SSE) [figure]

15  Statistics for four compilers: SOM (without SSE) [figure]

16  Comparing Execution Times: POM (without SSE)
Case    Absoft    Intel    Lahey    Portland
1        909.1    826.4    728.8     836.3
2        825.1    786.9    671.2     755.3

17  Statistics for four compilers: Variability vs. problem size [figure]

18  Evaluation of SSE Results
- IA-32 Hardware
  - Intel Pentium III+ supports Streaming Single-Instruction-Multiple-Data Extensions (SSE)
  - Linux 2.4.20 kernel supports SSE
- Fortran compilers that enable SSE
  - Intel 7.1
  - Portland CDK 4.0

19  Comparing Execution Times: SOM (POM) compiler switches (with SSE)
Compiler and version   Compiler command and selected switches
Intel 7.1              ifc -O3 -xK (-r8) -tpp6 -FI
Portland 4.0           pgf90 -fast (-r8) -Mvect=sse

20  Comparing Execution Times: SOM (with SSE) [figure]

21  Comparing Execution Times: POM (with SSE) [figure]

22  Evaluation of SSE Results
- Fortran compilers with SOM (sp)
  - Intel 7.1: average speedup of 1.44
  - Portland CDK 4.0: average speedup of 1.70
- Fortran compilers with POM (dp)
  - Intel 7.1: average speedup of 1.25
  - Portland CDK 4.0: average speedup of 1.19

23  Tests for AQMs
Next steps for CMAQ with four compilers:
- Report on portability issues
- Re-compilation of all libraries
- Performance instrumentation & analysis
- Numerical & stability analysis
- OpenMP performance study
Please propose scenarios worth using for these tests!

24  Conclusions
- Hardware: COTS is the way to go but ...
- Linux: operating system is popular but ...
- Programming environment: rich in choices
- Consequences for AQM: the combination of hardware, Linux, and programming environment needs careful ongoing evaluation.
HiPERiSM is ready for this task!

25  HiPERiSM's URL: http://www.hiperism.com
Talk to us about your requirements

