NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Porting from the Cray T3E to the IBM SP
Jonathan Carter, NERSC User Services

Presentation transcript:

Slide 1: Porting from the Cray T3E to the IBM SP. Jonathan Carter, NERSC User Services

Slide 2: Overview
Focus is on Fortran programs using MPI for communication. Outline of common pitfalls:
– f90 vs. xlf Fortran compiler
– Cray vs. IBM MPI library
– Math libraries
– System libraries
– I/O

Slide 3: f90 vs. xlf - Main Differences
f90:
– compiles for parallel (MPI) automatically
– accepts the file suffixes .f90 and .F90
– default optimization is -O2
– allows access to full memory on a PE by default
xlf:
– the compiler is accessed by several names; each name "packages" a set of options together
– by default, only the file suffixes .f and .F are allowed
– default is no optimization
– a restricted amount of memory is available by default

Slide 4: xlf Compiler Options
The compiler name can have three parts:
– an optional prefix "mp" indicates that the MPI library is automatically linked
– the compiler name, xlf, xlf90, or xlf95, indicates the language mode
– an optional postfix "_r" indicates threads, or OpenMP, capability
Examples:
– mpxlf90: Fortran 90 language compiler with the MPI library available
– mpxlf_r: Fortran 77 language compiler with the MPI library, threads, and OpenMP capability available
If you want to use MPI I/O, the thread-capable compiler must be used.

Slide 5: xlf Compiler Options
To use different file suffixes, e.g. .f90 and .F90:
– -qsuffix=f=f90,F=F90
For optimization we recommend:
– -O3 -qtune=pwr3 -qarch=pwr3 -qstrict
xlf defaults to 32 Kbyte of stack space and 128 Mbyte of heap space. To increase these to the maximums of 256 Mbyte for stack and 2 Gbyte for heap:
– -bmaxstack:0x10000000 -bmaxdata:0x80000000

Slide 6: Default Datatypes
Double complex is a language extension. Assume the -dp flag for f90. The xlf compiler has -qrealsize=8 to promote all default reals and real constants to 8 bytes, and -qintsize=8 to promote all integers and logicals.

Slide 7: Available Datatypes
The Fortran 77 "*" syntax (e.g. real*8) is also available to explicitly define a datatype.

Slide 8: MPI Differences
– Different default datatypes between the T3E and SP
– More error checking of arguments on the SP
– The default amount of buffering is different
– A different subset of MPI I/O is implemented

Slide 9: Available MPI Datatypes

Slide 10: Default MPI Datatypes

Slide 11: MPI - Argument Checking
The T3E MPI library has several collective routines which do not check arguments in accordance with the MPI standard. The SP does check arguments. Examples:
– MPI_Bcast: the "count" argument is not checked for consistency on the T3E
– MPI_Gatherv: the array of "counts" is not checked for consistency on the T3E

Slide 12: MPI - Buffering
If your program depends on the buffering of standard MPI sends and receives, you may see different behavior between the T3E and the SP. Classic case (each process sends before it receives, so both sends must be buffered for the exchange to complete):

      if (mype .eq. 0) then
         call mpi_send(sbuf,count,type,1,tag,MPI_COMM_WORLD,ierr)
         call mpi_recv(rbuf,count,type,1,tag,MPI_COMM_WORLD,status,ierr)
      else if (mype .eq. 1) then
         call mpi_send(sbuf,count,type,0,tag,MPI_COMM_WORLD,ierr)
         call mpi_recv(rbuf,count,type,0,tag,MPI_COMM_WORLD,status,ierr)
      end if
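A way to write the exchange above that does not depend on buffering is MPI_Sendrecv, which lets the MPI library progress both transfers together. A sketch, with illustrative variable names (status must be an integer array of size MPI_STATUS_SIZE):

```fortran
      integer other, status(MPI_STATUS_SIZE), ierr
*     Each of the two processes exchanges with the other one
      other = 1 - mype
      call mpi_sendrecv(sbuf, count, type, other, tag,
     &                  rbuf, count, type, other, tag,
     &                  MPI_COMM_WORLD, status, ierr)
```

This version runs identically on the T3E and the SP regardless of message size or eager-buffer settings.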

Slide 13: MPI - Buffering
On the T3E, messages up to 4 Kbyte are buffered. This can be changed by setting the environment variable MPI_BUFFER_MAX.
On the SP, the default size depends on the number of processors:
Processors      Buffer size (bytes)
1 to 16         4096
17 to 32        2048
33 to 64        1024
65 to 128       512
129 and over    128
This can be changed by setting the environment variable MP_EAGER_LIMIT.

Slide 14: Cray SciLib and IBM ESSL
– Both vendors provide libraries of commonly used linear algebra subroutines
– On the T3E this library is linked by default; on the SP use "-lessl"
– These libraries are faster than the public domain BLAS, LAPACK, etc.

Slide 15: Using BLAS
BLAS levels 1 through 3 are completely compatible between the two machines. Note which precision of BLAS is being called:
On the T3E:
      real*8 a(n), b(n), x
      ...
      x = sdot(n,a,1,b,1)
On the SP:
      real*8 a(n), b(n), x
      ...
      x = ddot(n,a,1,b,1)

Slide 16: Using BLAS
Instead of changing program source, loader options can be used to map one routine onto another.
To resolve a call to sdot with a call to ddot on the SP:
      xlf -o a.out -brename:sdot,ddot b.f
To resolve a call to ddot with a call to sdot on the T3E:
      f90 -o a.out -Wl"-Dequiv(DDOT)=SDOT" b.f

Slide 17: LAPACK Routines
Most other linear algebra routines in Cray SciLib and IBM ESSL are compatible with LAPACK. In ESSL there are a few incompatibilities (x may be C, D, S, or Z):
xGEEV, xSPEV, xSPSV, xHPEV, xHPSV, xGEGV, xSYGV
Use the installed LAPACK library for these.

Slide 18: ScaLAPACK Library
Cray SciLib and IBM PESSL support pieces of the standard ScaLAPACK library. Check the precision of routines:
– For real*8 on the T3E, routine names start with "PS"
– For real*8 on the SP, routine names start with "PD"
On the SP, you must call BLACS_GET followed by either BLACS_GRIDINIT or BLACS_GRIDMAP. On the T3E, only a call to one of the latter two routines is required.
The public domain ScaLAPACK is also installed on both machines.
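The SP call sequence described above can be sketched in Fortran as follows (the 2x2 grid and row-major ordering are illustrative assumptions):

```fortran
      integer ictxt, nprow, npcol, myrow, mycol
      nprow = 2
      npcol = 2
*     On the SP, a context must first be obtained with BLACS_GET
      call blacs_get( -1, 0, ictxt )
*     On the T3E this call alone is sufficient
      call blacs_gridinit( ictxt, 'Row-major', nprow, npcol )
*     Query this process's coordinates in the grid
      call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )
```

Since the extra BLACS_GET call is harmless on the T3E's public domain ScaLAPACK, writing it this way keeps the code portable in both directions.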

Slide 19: System Libraries
Generally, these are routines which interact with the operating system or provide extensions to the Fortran language. Cray provides very many such routines. Some are also available on the SP.

Slide 20: System Libraries
A more comprehensive list is available at:
Some routines have changed names and take slightly different arguments. There are sometimes identically or similarly named routines on the SP which are designed to be called from C only; calling them from Fortran will cause unexpected behavior. For example, calling exit instead of exit_ will cause the program to end without flushing the Fortran I/O buffers.

Slide 21: Fortran I/O
Unformatted I/O:
– The primitive datatypes on the T3E and SP are compatible (provided they are of the same length), but the control words inserted by the Fortran I/O layer prevent transfer of sequential-access files between the machines.
– Direct-access files can be freely transferred between the two machines, as can MPI I/O files.
Namelist input/output:
– Users familiar with assign -f77 on the T3E, which causes old-style namelist input to be written or read, can set the following environment variable on the SP to obtain the same effect:
      setenv XLFRTEOPTS="namelist=old"
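Because direct-access files carry no control words, data that must move between the machines can be written as in this sketch. Note that the units of recl are compiler-dependent (bytes under xlf on the SP); check each system's documentation:

```fortran
      real*8 a(100)
*     recl in bytes under xlf: 100 reals * 8 bytes each
      open(10, file='data.bin', access='direct', recl=800,
     &     form='unformatted')
      write(10, rec=1) a
      close(10)
```

A sequential-access write of the same array would surround the record with length markers, which is what breaks transferability.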

Slide 22: Further Information
The T3E and SP web pages and software web pages contain further information and links to vendor documentation: