Introduction to Scientific Computing


1 Introduction to Scientific Computing
Shubin Liu, Ph.D.
Research Computing Center, University of North Carolina at Chapel Hill
10/9/2008

2 Course Goals
An introduction to high-performance computing and the UNC Research Computing Center
Available Research Computing hardware facilities
Available software packages
Serial/parallel programming tools and libraries
How to efficiently make use of Research Computing facilities on campus

3 Agenda
Introduction to High-Performance Computing
Hardware Available: servers, storage, file systems, etc.
How to Access
Programming Tools Available: compilers & debugger tools, utility libraries, parallel computing
Scientific Packages Available
Job Management
Hands-on Exercises – 2nd hour
The PPT format of this presentation is available here: /afs/isis/depts/its/public_html/divisions/rc/training/scientific/short_courses/

4 Pre-requisites
An account on the Emerald cluster
UNIX Basics: Getting started, Intermediate, vi Editor, Customizing, Shells, ne Editor, Security, Data Management, Scripting, HPC Application

5 About Us ITS – Information Technology Services http://its.unc.edu
Physical locations: 401 West Franklin St. and 211 Manning Drive
10 Divisions/Departments: Information Security; IT Infrastructure and Operations; Research Computing Center; Teaching and Learning; User Support and Engagement; Office of the CIO; Communication Technologies; Enterprise Resource Planning; Enterprise Applications; Finance and Administration

6 Research Computing Center
Where are we, who are we, and what do we do?
ITS Manning: 211 Manning Drive
Website
Groups: Infrastructure (hardware), User Support (software), Engagement (collaboration)

7 About Myself
Ph.D. in Chemistry, UNC-CH
Currently Senior Computational Scientist, Research Computing Center, UNC-CH
Responsibilities:
Support computational chemistry/physics/material science software
Support programming (FORTRAN/C/C++) tools, code porting, parallel computing, etc.
Offer short training courses for campus users
Conduct research and engagement projects in computational chemistry
Development of DFT theory and concept tools
Applications in biological and material science systems

8 What is Scientific Computing?
Short version: to use high-performance computing (HPC) facilities to solve real scientific problems.
Long version, from Wikipedia: Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.

9 What is Scientific Computing?
From the scientific-discipline viewpoint, scientific computing sits at the intersection of the natural sciences, engineering sciences, applied mathematics, and computer science. From the operational viewpoint, it spans an application layer, a theory/model layer, an algorithm layer, and a hardware/software layer. From the computing perspective, scientific computing overlaps with high-performance computing and parallel computing.

10 What is HPC?
Computing resources which provide more than an order of magnitude more computing power than current top-end workstations or desktops – a generic, widely accepted definition.
HPC ingredients:
large capability computers (fast CPUs)
massive memory
enormous (fast & large) data storage
highest-capacity communication networks (Myrinet, 10 GigE, InfiniBand, etc.)
specifically parallelized codes (MPI, OpenMP)
visualization

11 Why HPC?
What are the three-dimensional structures of all of the proteins encoded by an organism's genome, and how does structure influence function, both spatially and temporally?
What patterns of emergent behavior occur in models of very large societies?
How do massive stars explode and produce the heaviest elements in the periodic table?
What sort of abrupt transitions can occur in Earth's climate and ecosystem structure? How do these occur and under what circumstances?
If we could design catalysts atom-by-atom, could we transform industrial synthesis?
What strategies might be developed to optimize management of complex infrastructure systems?
What kind of language processing can occur in large assemblages of neurons?
Can we enable integrated planning and response to natural and man-made disasters that prevent or minimize the loss of life and property?

12 Measure of Performance
Units of floating-point performance: MegaFLOPS (10^6), GigaFLOPS (10^9), TeraFLOPS (10^12), PetaFLOPS (10^15), ExaFLOPS (10^18), ZettaFLOPS (10^21), YottaFLOPS (10^24).
Single-CPU LINPACK performance vs. peak performance, in MFLOPS:
Intel Pentium 4 (2.53 GHz): 2355 / 5060
NEC SX-6/1 (1 proc., 2.0 ns): 7575 / 8000
HP rx5670 Itanium2 (1 GHz): 3528 / 4000
IBM eServer pSeries 690 (1300 MHz): 2894 / 5200
Cray SV1ex-1-32 (500 MHz): 1554 / 2000
Compaq ES45 (1000 MHz): 1542 / 2000
AMD Athlon MP1800+ (1530 MHz): 1705 / 3060
Intel Pentium III (933 MHz): 507 / 933
SGI Origin 2000 (300 MHz): 533 / 600
Intel Pentium II Xeon (450 MHz): 295 / 450
Sun UltraSPARC (167 MHz): 237 / 333
Reference:
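As a quick illustration (not from the original slide): peak performance is roughly clock rate multiplied by floating-point operations per cycle. The 2.53 GHz Pentium 4 above can retire two floating-point operations per cycle, giving 2.53 x 10^9 x 2 ≈ 5060 MFLOPS of peak, while its measured LINPACK rate (2355 MFLOPS) is what the benchmark actually sustains.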

13 How to Quantify Performance? TOP500
A list of the 500 most powerful computer systems in the world
Established in June 1993
Compiled twice a year (June & November)
Uses the LINPACK benchmark code (solving a dense linear system Ax = b)
Organized by world-wide HPC experts, computational scientists, manufacturers, and the Internet community
Homepage:
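For context, here is a minimal C sketch of the kind of problem LINPACK times – solving a dense system Ax = b – written for this transcript using LAPACK's DGESV. It assumes a Fortran LAPACK/BLAS library is linked (e.g. -llapack -lblas) and the common trailing-underscore naming convention.

#include <stdio.h>

/* LAPACK's DGESV solves A x = b by LU factorization with partial pivoting.
   Fortran routines take all arguments by reference. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void)
{
    int n = 2, nrhs = 1, ipiv[2], info;
    /* Column-major A = [3 1; 1 2], right-hand side b = [9, 8] */
    double a[4] = {3.0, 1.0, 1.0, 2.0};
    double b[2] = {9.0, 8.0};

    dgesv_(&n, &nrhs, a, &n, ipiv, b, &n, &info);
    printf("info = %d, x = (%g, %g)\n", info, b[0], b[1]);  /* expect (2, 3) */
    return 0;
}

Running it should report the solution x = (2, 3) for the small system shown; LINPACK does the same thing for a very large matrix and reports the sustained FLOP rate.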

14 TOP500: November 2007
TOP 5, units in GFLOPS (1 GFLOPS = 1,000 MFLOPS). Columns: rank, installation site/year, computer, manufacturer, procs, Rmax, Rpeak.
1. DOE/NNSA/LLNL, United States/2007 – BlueGene/L eServer Blue Gene Solution, IBM, 212,992 procs, Rmax 478,200, Rpeak 596,378
2. Forschungszentrum Juelich (FZJ), Germany/2007 – JUGENE, Blue Gene/P Solution, IBM, 65,536 procs, Rmax 167,300, Rpeak 222,822
3. SGI/New Mexico Computing Applications Center (NMCAC), United States/2007 – SGI Altix ICE 8200, Xeon quad core 3.0 GHz, SGI, 14,336 procs, Rmax 126,900, Rpeak 172,032
4. Computational Research Laboratories, TATA SONS, India/2007 – EKA, Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband, Hewlett-Packard, 14,240 procs
5. Government Agency, Sweden/2007 – Cluster Platform 3000 BL460c, Xeon 53xx 2.66 GHz, Infiniband, Hewlett-Packard, 13,728 procs, Rmax 102,800, Rpeak 146,430
36. University of North Carolina, United States/2007 – Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband, Dell, 4,160 procs, Rmax 28,770, Rpeak 38,820

15 TOP500: June 2008
Rmax and Rpeak values are in TFLOPS; power data are in kW for the entire system. Columns: rank, site, computer/year, vendor, cores, Rmax, Rpeak.
1. DOE/NNSA/LANL, United States – Roadrunner, BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband / 2008, IBM, 122,400 cores
2. DOE/NNSA/LLNL, United States – BlueGene/L, eServer Blue Gene Solution / 2007, IBM, 212,992 cores, Rmax 478.20, Rpeak 596.38
3. Argonne National Laboratory, United States – Blue Gene/P Solution / 2007, IBM, 163,840 cores, Rmax 450.30, Rpeak 557.06
4. Texas Advanced Computing Center, Univ. of Texas, United States – Ranger, SunBlade x6420, Opteron Quad 2 GHz, Infiniband / 2008, Sun Microsystems, 62,976 cores, Rmax 326.00, Rpeak 503.81
5. DOE/Oak Ridge National Laboratory, United States – Jaguar, Cray XT4 QuadCore 2.1 GHz / 2008, Cray Inc., 30,976 cores, Rmax 205.00, Rpeak 260.20
67. University of North Carolina, United States/2007 – Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband, Dell, 4,160 cores, Rmax 28.77, Rpeak 38.82

16 TOP500: June 2009
Rmax and Rpeak values are in TFLOPS. Columns: rank, site, computer/year, vendor, cores, Rmax, Rpeak.
1. DOE/NNSA/LANL, United States – Roadrunner, BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband / 2008, IBM, 129,600 cores
2. Oak Ridge National Laboratory, United States – Jaguar, Cray XT5 QC 2.3 GHz / 2008, Cray Inc., 150,152 cores
3. Forschungszentrum Juelich (FZJ), Germany – JUGENE, Blue Gene/P Solution / 2009, IBM, 294,912 cores, Rmax 825.50
4. NASA/Ames Research Center/NAS, United States – Pleiades, SGI Altix ICE 8200EX, Xeon QC 3.0/2.66 GHz / 2008, SGI, 51,200 cores, Rmax 487.01, Rpeak 608.83
5. DOE/NNSA/LLNL, United States – BlueGene/L, eServer Blue Gene Solution / 2007, IBM, 212,992 cores, Rmax 478.20, Rpeak 596.38
175. University of North Carolina, United States/2007 – Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband, Dell, 4,160 cores, Rmax 28.77, Rpeak 38.82

17 TOP500 History of UNC-CH Entry
Columns: list, number of systems, highest ranking, Sum Rmax (GFlops), Sum Rpeak (GFlops), site efficiency (%):
06/2009 – 1 system, highest ranking 175, efficiency 74.11%
11/2008 – highest ranking 88
06/2008 – highest ranking 67
11/2007 – highest ranking 36
06/2007 – highest ranking 25
11/2006 – highest ranking 104, efficiency 83.49%
06/2006 – highest ranking 74
11/2003 – highest ranking 393, Sum Rmax 439.30, efficiency 36.32%
06/1999 – highest ranking 499, Sum Rmax 24.77, Sum Rpeak 28.80, efficiency 86.01%

18 Shared/Distributed-Memory Architecture
Distributed memory – each processor has its own local memory; message passing must be used to exchange data between processors (examples: Baobab, the new Dell cluster).
Shared memory – single address space; all processors have access to a pool of shared memory (examples: chastity/zephyr, happy/yatta, cedar/cypress, sunny).
Methods of memory access: bus and crossbar.

19 What is a Beowulf Cluster?
A Beowulf system is a collection of personal computers constructed from commodity off-the-shelf hardware components, interconnected with a system-area network and configured to operate as a single, parallel computing platform (e.g., via MPI), using an open-source operating system such as Linux.
Main components:
PCs running the Linux OS
Inter-node connection with Ethernet, Gigabit Ethernet, Myrinet, InfiniBand, etc.
MPI (Message Passing Interface)

20 LINUX Beowulf Clusters

21 What is Parallel Computing ?
Concurrent use of multiple processors to process data:
Running the same program on many processors
Running many programs on each processor

22 Advantages of Parallelization
Cheaper, in terms of price/performance ratio
Faster than equivalently expensive uniprocessor machines
Handles bigger problems
More scalable: the performance of a particular program may be improved by execution on a larger machine
More reliable: in theory, if processors fail we can simply use others

23 Catch: Amdahl's Law
Speedup = 1/(s + p/n), where s is the fraction of the work that is serial, p = 1 - s is the fraction that can be parallelized, and n is the number of processors. Even for very large n, the speedup is bounded by 1/s. A worked example follows.
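As an illustration (numbers chosen for this transcript, not from the original slide): if 90% of a program parallelizes perfectly (p = 0.9, s = 0.1), then on n = 16 processors the speedup is 1/(0.1 + 0.9/16) = 1/0.15625 ≈ 6.4, and no matter how many processors are added the speedup can never exceed 1/0.1 = 10.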

24 Parallel Programming Tools
Shared-memory architecture: OpenMP
Distributed-memory architecture: MPI, PVM, etc.

25 OpenMP
An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism.
What does OpenMP stand for? Open specifications for Multi Processing, via collaborative work between interested parties from the hardware and software industry, government and academia.
Comprised of three primary API components: compiler directives, runtime library routines, and environment variables.
Portable: the API is specified for C/C++ and Fortran; it has been implemented on multiple platforms, including most Unix platforms and Windows NT.
Standardized: jointly defined and endorsed by a group of major computer hardware and software vendors (it was once expected to become an ANSI standard, but has not).
Many compilers can automatically parallelize a code with OpenMP!

26 OpenMP Example (FORTRAN)
      PROGRAM HELLO
      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +        OMP_GET_THREAD_NUM
C     Fork a team of threads, giving them their own copies of variables
!$OMP PARALLEL PRIVATE(TID)
C     Obtain and print the thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID
C     Only the master thread does this
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
      END IF
C     All threads join the master thread and disband
!$OMP END PARALLEL
      END
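For comparison – not part of the original slides – a minimal C analogue of the same hello-world, assuming an OpenMP-capable C compiler (built with the compiler's OpenMP flag):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tid, nthreads;

    /* Fork a team of threads, each with its own copy of tid and nthreads */
    #pragma omp parallel private(tid, nthreads)
    {
        /* Obtain and print the thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only the master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}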

27 The Message Passing Model
Parallelization scheme for distributed memory.
Parallel programs consist of cooperating processes, each with its own memory.
Processes send data to one another as messages; messages are passed among the compute processes.
Messages may have tags that may be used to sort messages.
Messages may be received in any order. A small illustration follows.
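A minimal sketch of this model in C, written for this transcript (not code from the slides): rank 0 sends one tagged integer to rank 1, which receives it and inspects the tag.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, value = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Send one integer to rank 1 with message tag 99 */
        MPI_Send(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive the message; the tag is available in the status object */
        MPI_Recv(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d (tag %d)\n", value, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes, e.g. mpirun -np 2 a.out.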

28 MPI: Message Passing Interface
Message-passing model
A standard (specification) with many implementations (almost every vendor has one)
MPICH and LAM/MPI, from the public domain, are the most widely used
GLOBUS MPI for grid computing
Two phases: MPI-1 (traditional message passing) and MPI-2 (remote memory, parallel I/O, and dynamic processes)
Online resources

29 A Simple MPI Code
C version:
#include "mpi.h"
#include <stdio.h>

int main( int argc, char **argv )
{
    MPI_Init( &argc, &argv );
    printf( "Hello world\n" );
    MPI_Finalize();
    return 0;
}

FORTRAN version:
      include 'mpif.h'
      integer myid, ierr, numprocs
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
      write(*,*) 'Hello from ', myid
      write(*,*) 'Numprocs is ', numprocs
      call MPI_FINALIZE( ierr )
      end

30 Other Parallelization Models
VIA: Virtual Interface Architecture – standards-based cluster communications
PVM: a portable message-passing programming system, designed to link separate host machines to form a "virtual machine" that is a single, manageable computing resource. It was largely an academic effort, and there has been little development since the 1990s.
BSP: Bulk Synchronous Parallel model, a generalization of the widely researched PRAM (Parallel Random Access Machine) model
Linda: a concurrent programming model from Yale, with the primary concept of "tuple space"
HPF: High Performance Fortran, a standard parallel programming language for shared- and distributed-memory systems (PGI's pghpf is one implementation)

31 RC Servers @ UNC-CH
SGI Altix 3700 – shared memory (SMP), 128 CPUs, cedar/cypress
Emerald Linux cluster – distributed memory, ~500 CPUs, emerald
IBM P690/P575 AIX SMP nodes – yatta/p575
Dell Linux cluster – distributed memory, 4,160 CPUs, topsail

32 IBM P690/P575 SMP
IBM pSeries 690/P575, Model 6C4, Power4+ Turbo processors
Access to 4 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /netscr
OS: IBM AIX 5.3, Maintenance Level 04
Login node: emerald.isis.unc.edu
Compute nodes:
yatta.isis.unc.edu – 32 CPUs
p575-n00.isis.unc.edu – 16 CPUs
p575-n01.isis.unc.edu – 16 CPUs
p575-n02.isis.unc.edu – 16 CPUs
p575-n03.isis.unc.edu – 16 CPUs

33 SGI Altix 3700 SMP
Servers for scientific applications such as Gaussian, Amber, and custom code
Login node: cedar.isis.unc.edu
Compute node: cypress.isis.unc.edu
Cypress: SGI Altix 3700bx2 with 128 Intel Itanium2 processors (1600 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 6 MB L3 cache, and 4 GB of shared memory (512 GB total memory)
Two 70 GB SCSI system disks as /scr

34 SGI Altix 3700 SMP
Cedar: SGI Altix with Intel Itanium2 processors (1500 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 4 MB L3 cache, and 1 GB of shared memory (8 GB total memory); two 70 GB SATA system disks
OS: RHEL 3 with Propack 3, Service Pack 3
No AFS (HOME & pkg space) access
Scratch disks: /netscr, /nas, /scr

35 Emerald Cluster
General-purpose Linux cluster for scientific and statistical applications
Machine name: emerald.isis.unc.edu
2 login nodes: IBM BladeCenter, one Xeon 2.4 GHz with 2.5 GB RAM and one Xeon 2.8 GHz with 2.5 GB RAM
18 compute nodes: dual AMD Athlon MP processors, Tyan Thunder MP motherboard, 2 GB DDR RAM each
6 compute nodes: dual AMD Athlon MP processors, Tyan Thunder MP motherboard, 2 GB DDR RAM each
25 compute nodes: IBM BladeCenter, dual Intel Xeon 2.4 GHz, 2.5 GB RAM each
96 compute nodes: IBM BladeCenter, dual Intel Xeon 2.8 GHz, 2.5 GB RAM each
15 compute nodes: IBM BladeCenter, dual Intel Xeon 3.2 GHz, 4.0 GB RAM each
Access to 10 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
OS: RedHat Enterprise Linux 3.0
TOP500: 395th place in the June 2003 release

36

37 Dell LINUX Cluster, Topsail
520 dual nodes (4,160 CPUs)
Xeon (EM64T) 3.6 GHz, 2 MB cache; 2 GB memory per CPU
InfiniBand inter-node connection
Not AFS mounted; not open to the general public
Access based on peer-reviewed proposal
HPL: 74th in the June 2006 TOP500 list, 104th in the November 2006 list, and 25th in the June 2007 list (28.77 teraflops after upgrade)

38 Topsail
Login node: topsail.unc.edu – 2.33 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory
Compute nodes: 4,160 cores, 2.33 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory per node
Shared disk: (/ifs1) 39 TB IBRIX parallel file system
Interconnect: InfiniBand 4x SDR
Resource management is handled by LSF v7.2, through which all computational jobs are submitted for processing

39 File Systems
AFS (Andrew File System): a distributed network file system that enables files from any AFS machine across the campus to be accessed as easily as files stored locally.
Serves as the ISIS HOME for all users with an ONYEN – the Only Name You'll Ever Need
Limited quota: 250 MB for most users [type "fs lq" to view]
Current production version: OpenAFS
Files backed up daily [~/OldFiles]
Directory/file tree: /afs/isis/home/o/n/onyen – for example, /afs/isis/home/m/a/mason, where "mason" is the ONYEN of the user
Accessible from emerald and happy/yatta, but not from cedar/cypress or topsail
Not suitable for research computing tasks! It is recommended to compile and run I/O-intensive jobs on /scr or /netscr
More info:

40 Basic AFS Commands
To add or remove packages: ipm add pkg_name, ipm remove pkg_name
To find out space quota/usage: fs lq
To see and renew AFS tokens (read/write-able), which expire in 25 hours: tokens, klog
Over 300 packages installed in AFS pkg space: /afs/isis/pkg/
More info available at

41 Data Storage
Local scratch: /scr – local to a machine
Cedar/cypress: 2x500 GB SCSI system disks
Topsail: /ifs1/scr, 39 TB IBRIX parallel file system
Happy/yatta: 2x500 GB disk drives
For running jobs and temporary data storage; not backed up
Network Attached Storage (NAS) – for temporary storage
/nas/uncch, /netscr
>20 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
Shared by all login and compute nodes (cedar/cypress, happy/yatta, emerald)
Mass Storage (MS) – for permanent storage
Mounted for long-term data storage on all scientific computing servers' login nodes as ~/ms ($HOME/ms)
Never run jobs in ~/ms (compute nodes do not have ~/ms access)

42 Subscription of Services
Have an ONYEN ID – the Only Name You'll Ever Need
Eligibility: faculty, staff, postdocs, and graduate students
Go to

43 Access to Servers
To Emerald: ssh emerald.isis.unc.edu
To Cedar: ssh cedar.isis.unc.edu
To Topsail: ssh topsail.unc.edu

44 Programming Tools
Compilers: FORTRAN 77/90/95, C/C++
Utility libraries: BLAS, LAPACK, FFTW, SCALAPACK, IMSL, NAG, NetCDF, GSL, PETSc
Parallel computing: OpenMP, PVM, MPI (MPICH, LAM/MPI, OpenMPI, MPICH2)

45 Compilers: SMP Machines
Cedar/Cypress – SGI Altix 3700, 128 CPUs
64-bit Intel compilers, versions 9.1 and 10.1, in /opt/intel
FORTRAN 77/90/95: ifort/ifc/efc
C/C++: icc/ecc
64-bit GNU compilers
FORTRAN 77: f77/g77
C and C++: gcc/cc and g++/c++
Yatta/P575 – IBM P690/P575, 32/64 CPUs
XL FORTRAN 77/90: xlf, xlf90
C and C++ (AIX): xlc, xlC

46 Compilers: LINUX Cluster
Absoft ProFortran compilers – package name: profortran, current version: 7.0
FORTRAN 77 (f77): Absoft FORTRAN 77 compiler version 5.0
FORTRAN 90/95 (f90/f95): Absoft FORTRAN 90/95 compiler version 3.0
GNU compilers – package name: gcc, current version: 4.1.2
FORTRAN 77 (g77/f77): 3.4.3, 4.1.2
C (gcc): 3.4.3, 4.1.2
C++ (g++/c++): 3.4.3, 4.1.2
Intel compilers – package names: intel_fortran, intel_CC, current version: 10.1
FORTRAN 77/90 (ifc): Intel Linux compiler versions 8.1, 9.0, 10.1
C/C++ (icc): Intel Linux compiler versions 8.1, 9.0, 10.1
Portland Group compilers – package name: pgi, current version: 7.1.6
FORTRAN 77 (pgf77): pgf77 v6.0, 7.0.4, 7.1.3
FORTRAN 90 (pgf90): pgf90 v6.0, 7.0.4, 7.1.3
High Performance FORTRAN (pghpf): pghpf v6.0, 7.0.4, 7.1.3
C (pgcc): pgcc v6.0, 7.0.4, 7.1.3
C++ (pgCC): pgCC v6.0, 7.0.4, 7.1.3

47 LINUX Compiler Benchmark
Results (rank in parentheses) for Absoft ProFortran 90, Intel FORTRAN 90, Portland Group FORTRAN 90, and GNU FORTRAN 77, in that order:
Molecular Dynamics (CPU time): 4.19 (4), 2.83 (2), 2.80 (1), 2.89 (3)
Kepler (CPU time): 0.49 (1), 0.93 (2), 1.10 (3), 1.24 (4)
Linpack (CPU time): 98.6 (4), 95.6 (1), 96.7 (2), 97.6 (3)
Linpack (MFLOPS): 182.6 (4), 183.8 (1), 183.2 (3), 183.3 (2)
LFK (CPU time): 89.5 (4), 70.0 (3), 68.7 (2), 68.0 (1)
LFK (MFLOPS): 309.7 (3), 403.0 (2), 468.9 (1), 250.9 (4)
Total rank: 20, 11, 12, 17
For reference only. Note that performance is code- and compilation-flag dependent. For each benchmark, three identical runs were performed and the best CPU timing among the three is listed. Optimization flags: Absoft -O, Portland Group -O4 -fast, Intel -O3, GNU -O

48 Profilers & Debuggers
SMP machines
Happy/yatta: dbx, prof, gprof
Cedar/cypress: gprof
LINUX cluster
PGI: pgdebug, pgprof, gprof
Absoft: fx, xfx, gprof
Intel: idb, gprof
GNU: gdb, gprof

49 Utility Libraries
Mathematical libraries: IMSL, NAG, etc.
Scientific computing:
Linear algebra: BLAS, ATLAS, EISPACK, LAPACK, SCALAPACK
Fast Fourier transform: FFTW
The GNU Scientific Library, GSL
Utility libraries: netCDF, PETSc, etc.

50 Utility Libraries SMP Machines
Yatta/P575: ESSL (Engineering and Scientific Subroutine Library), -lessl
BLAS
LAPACK
EISPACK
Fourier transforms, convolutions and correlations, and related computations
Sorting and searching
Interpolation
Numerical quadrature
Random number generation
Utilities

51 Utility Libraries: SMP Machines
Cedar/Cypress: MKL (Intel Math Kernel Library) 8.0, -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide
BLAS
LAPACK
Sparse solvers
FFT
VML (Vector Math Library)
Random-number generators

52 Utility Libraries for Emerald Cluster
Mathematical libraries: IMSL
The IMSL libraries are a comprehensive set of mathematical and statistical functions, from Visual Numerics
Functions include optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, and many more
Available in FORTRAN and C
Package name: imsl
Required compiler: Portland Group compiler, pgi
Installed in AFS ISIS package space, /afs/isis/pkg/imsl
Current default version 4.0; latest version 5.0
To subscribe to IMSL, type "ipm add pgi imsl"
To compile a C code, code.c, using IMSL: pgcc -O $CFLAGS code.c -o code.x $LINK_CNL_STATIC

53 Utility Libraries for Emerald Cluster
Mathematical libraries: NAG
NAG produces and distributes numerical, symbolic, statistical, visualisation and simulation software for the solution of problems in a wide range of applications in such areas as science, engineering, financial analysis and research; from the Numerical Algorithms Group
Functions include optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, multivariate factor analysis, linear algebra, and random number generators
Available in FORTRAN and C
Package name: nag
Available platforms: SGI IRIX, Sun Solaris, IBM AIX, Linux
Installed in AFS ISIS package space, /afs/isis/pkg/nag
Current default version 6.0
To subscribe to NAG, type "ipm add nag"

54 Utility Libraries for Emerald Cluster
Scientific libraries: linear algebra
BLAS, LAPACK, LAPACK90, LAPACK++, ATLAS, SPARSE-BLAS, SCALAPACK, EISPACK, FFTPACK, LANCZOS, HOMPACK, etc.
Source code downloadable from the website
Compiler dependent: BLAS and LAPACK are available for all four compilers in AFS ISIS package space (gcc, profortran, intel and pgi)
SCALAPACK is available for the pgi and intel compilers
Assistance is available if other versions are needed. A small linking example follows.
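As an illustration of how C code typically calls these libraries – a sketch written for this transcript, assuming a Fortran-style BLAS library (linked with something like -lblas) that exports names with a trailing underscore:

#include <stdio.h>

/* Fortran BLAS routines take all arguments by reference; the reference
   BLAS usually exports them with a trailing underscore. */
extern double ddot_(const int *n, const double *x, const int *incx,
                    const double *y, const int *incy);

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    int n = 3, inc = 1;

    /* Level-1 BLAS dot product: expect 1*4 + 2*5 + 3*6 = 32 */
    printf("x . y = %g\n", ddot_(&n, x, &inc, y, &inc));
    return 0;
}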

55 Utility Libraries for Emerald Cluster
Scientific libraries: other libraries – not fully implemented yet, so please be cautious and patient when using them
FFTW
GSL
NetCDF
NCO
HDF
OCTAVE
PETSc
……
If you think more libraries are of broad interest, please recommend them to us

56 Parallel Computing: SMP Machines
OpenMP compilation:
Use the "-qsmp=omp" flag on happy
Use the "-openmp" flag on cedar
Environment variable setup: setenv OMP_NUM_THREADS n
MPI:
Use the "-lmpi" flag on cedar
Use MPI-capable compilers, e.g., mpxlf, mpxlf90, mpcc, mpCC
Hybrid (OpenMP and MPI): do both! (A small sketch follows.)
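A minimal hybrid MPI+OpenMP sketch in C, written for this transcript (not from the slides): each MPI process forks an OpenMP team whose size is controlled by OMP_NUM_THREADS. It assumes compilation with an MPI wrapper compiler plus the appropriate OpenMP flag for the compiler in use.

#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each MPI process spawns OMP_NUM_THREADS threads */
    #pragma omp parallel
    printf("MPI rank %d, OpenMP thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}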

57 Parallel Computing With Emerald Cluster
Setup: MPI implementations
MPICH/MPICH2 – package to be "ipm add"-ed: mpich
MPI-LAM – package to be "ipm add"-ed: mpi-lam
Either implementation can be combined with the GNU, Absoft ProFortran, Portland Group, or Intel compilers for F77, F90, C, and C++.

58 Parallel Computing With Emerald Cluster
Setup: compiler commands by vendor (package name; FORTRAN 77, FORTRAN 90, C, C++)
GNU (gcc): g77, –, gcc, g++
Absoft ProFortran (profortran): f77, f95, –, –
Portland Group (pgi): pgf77, pgf90, pgcc, pgCC
Intel (intel_fortran, intel_CC): ifc, ifc, icc, icc
Commands for parallel MPI compilation (mpich or mpi-lam): mpif77, mpif90, mpicc, mpiCC

59 Parallel Computing With Emerald Cluster
Setup: AFS packages to be "ipm add"-ed. Note the order: the compiler is always added first. Add ONLY ONE compiler into your environment.
GNU: ipm add gcc mpich, or ipm add gcc mpi-lam
Absoft ProFortran: ipm add profortran mpich, or ipm add profortran mpi-lam
Portland Group: ipm add pgi mpich, or ipm add pgi mpi-lam
Intel: ipm add intel_fortran intel_CC mpich, or ipm add intel_fortran intel_CC mpi-lam

60 Parallel Computing With Emerald Cluster
Compilation
To compile an MPI FORTRAN 77 code, code.f, into an executable, exec: %mpif77 -O -o exec code.f
For a FORTRAN 90/95 code, code.f90: %mpif90 -O -o exec code.f90
For a C code, code.c: %mpicc -O -o exec code.c
For a C++ code, code.cc: %mpiCC -O -o exec code.cc

61 Scientific Packages Available in AFS package space
To subscribe to a package, type "ipm add pkg_name", where "pkg_name" is the name of the package; for example, "ipm add gaussian"
To remove it, type "ipm remove pkg_name"
All packages are installed under the /afs/isis/pkg/ directory; for example, /afs/isis/pkg/gaussian
Categories of scientific packages include: quantum chemistry, molecular dynamics, material science, visualization, NMR spectroscopy, X-ray crystallography, bioinformatics, and others

62 Scientific Package: Quantum Chemistry
Software (package name) – platform, current version, parallel:
ABINIT (abinit) – LINUX, 4.3.3, yes (MPI)
ADF (adf) – LINUX, yes (PVM)
Cerius2 (cerius2) – LINUX, 4.10, yes (MPI)
GAMESS (gamess) – LINUX, yes (MPI)
Gaussian (gaussian) – LINUX, 03E01, yes (OpenMP)
MacroModel (macromodel) – IRIX, 7.1, no
MOLFDIR (molfdir) – LINUX, 2001, no
Molpro (molpro) – LINUX, 2006.6, yes (MPI)
NWChem (nwchem) – LINUX, 5.1, yes (MPI)
MaterialStudio (materisalstudio) – LINUX, 4.2, yes (MPI)
CPMD (cpmd) – LINUX, 3.9, yes (MPI)
ACES2 (aces2) – LINUX, 4.1.2, no

63 Scientific Package: Molecular Dynamics
Software (package name) – platform, current version, parallel:
Amber (amber) – LINUX, 9.1, MPI
NAMD/VMD (namd, vmd) – LINUX, 2.5, MPI
Gromacs (gromcs) – LINUX, 3.2.1, MPI
InsightII (insightII) – IRIX, 2000.3, –
MacroModel (macromodel) – IRIX, 7.1, –
PMEMD (pmemd) – LINUX, 3.0.0, MPI
Quanta (quanta) – IRIX, 2005, MPI
Sybyl (sybyl) – LINUX, 7.1, –
CHARMM (charmm) – LINUX, 3.0B1, MPI
TINKER (tinker) – LINUX, 4.2, –
O (o) – LINUX, 9.0.7, –

64 Molecular & Scientific Visualization
Software (package name) – platform, current version:
AVS (avs) – LINUX, 5.6
AVS Express (avs-express) – LINUX, 6.2
Cerius2 (cerius2) – IRIX/LINUX, 4.9
DINO (dino) – IRIX, 0.8.4
ECCE (ecce) – LINUX, 2.1
GaussView (gaussian) – LINUX/AIX, 4.0
GRASP (grasp) – IRIX, 1.3.6
InsightII (insightII) – LINUX, 2000.3
MOIL-VIEW (moil-view) – IRIX, 9.1
MOLDEN (molden) – LINUX, 4.0
MOLKEL (molkel) – IRIX, 4.3
MOLMOL (molmol) – LINUX, 2K.1
MOLSCRIPT (molscript) – IRIX, 2.1.2
MOLSTAR (molstar) – IRIX/LINUX, 1.0

65 Molecular & Scientific Visualization
Software (package name) – platform, current version:
MOVIEMOL (moviemol) – LINUX, 1.3.1
NBOView (nbo) – LINUX, 5.0
QUANTA (quanta) – IRIX/LINUX, 2005
RASMOL (rasmol) – IRIX/LINUX/AIX, 2.7.3
RASTER3D (raster3d) – IRIX/LINUX, 2.7c
SPARTAN (spartan) – IRIX, 5.1.3
SPOCK (spock) – IRIX, 1.7.0p1
SYBYL (sybyl) – LINUX, 7.1
VMD (vmd) – LINUX, 1.8.2
XtalView (xtalview) – IRIX, 4.0
XMGR (xmgr) – LINUX, 4.1.2
GRACE (grace) – LINUX, 5.1.2
IMAGEMAGICK (imagemagick) – IRIX/LINUX/AIX
GIMP (gimp) – IRIX/LINUX/AIX, 1.0.2
XV (xv) – IRIX/LINUX/AIX, 3.1.0a

66 NMR & X-Ray Crystallography
Software (package name) – platform, current version:
CNSsolve (cnssolve) – IRIX/LINUX, 1.1
AQUA (aqua) – IRIX/LINUX, 3.2
BLENDER (blender) – IRIX, 2.28a
BNP (bnp) – IRIX/LINUX, 0.99
CAMBRIDGE (cambridge) – IRIX, 5.26
CCP4 (ccp4) – IRIX/LINUX, 4.2.2
CNX (cns) – IRIX/LINUX, 2002
FELIX (felix) – IRIX/LINUX, 2004
GAMMA (gamma) – IRIX, 4.1.0
MOGUL (mogul) – IRIX/LINUX, 1.0
Phoelix (phoelix) – IRIX, 1.2
TURBO (turbo) – IRIX, 5.5
XPLOR-NIH (xplor_nih) – IRIX/LINUX, 2.11.2
XtalView (xtalview) – IRIX, 4.0

67 Scientific Package: Bioinformatics
Software (package name) – platform, current version:
BIOPERL (bioperl) – IRIX, 1.4.0
BLAST (blast) – IRIX/LINUX, 2.2.6
CLUSTALX (clustalx) – IRIX, 8.1
EMBOSS (emboss) – IRIX, 2.8.0
GCG (gcg) – LINUX, 11.0
Insightful Miner (iminer) – IRIX, 3.0
Modeller (modeller) – IRIX/LINUX, 7.0
PISE (pise) – LINUX, 5.0a
SEAVIEW (seaview) – IRIX/LINUX, 1.0
AUTODOCK (autodock) – IRIX, 3.05
DOCK (dock) – IRIX/LINUX, 5.1.1
FTDOCK (ftdock) – IRIX, 1.0
HEX (hex) – IRIX, 2.4

68 Why do We Need Job Management Systems?
"Whose job you run, in addition to when and where it is run, may be as important as how many jobs you run!"
Effectively optimizes the utilization of resources
Effectively optimizes the sharing of resources
Often referred to as Resource Management Software, Queuing Systems, Job Management Systems, etc.

69 Job Management Tools
PBS – Portable Batch System: open source product developed at NASA Ames Research Center
DQS – Distributed Queuing System: open source product developed by SCRI at Florida State University
LSF – Load Sharing Facility: commercial product from Platform Computing, already deployed on UNC-CH ITS computing servers
Codine/Sun Grid Engine: commercial version of DQS from Gridware, Inc., now owned by Sun
Condor: a restricted-source "cycle stealing" product from the University of Wisconsin
Others too numerous to mention

70 Operations of LSF
Diagram: a user job submitted with bsub flows from the submission host (Batch API) to the master host, where the MBD places it in a queue using load information gathered by the LIMs, and then to an execution host, where the SBD and RES run the user job.
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server

71 Common LSF Commands
lsid – a good choice of LSF command to start with
lshosts/bhosts – show all of the nodes that the LSF system is aware of
bsub – submits a job interactively or in batch using LSF batch scheduling and the queue layer of the LSF suite
bjobs – displays information about a recently run job; use the -l option to view a more detailed accounting
bqueues – displays information about the batch queues; again, the -l option gives a more thorough description
bkill <job ID#> – kills the job with job ID number #
bhist -l <job ID#> – displays historical information about jobs; the -a flag displays information about both finished and unfinished jobs
bpeek -f <job ID#> – displays the stdout and stderr output of an unfinished job with job ID #
bhpart – displays information about host partitions
bstop – suspends an unfinished job
bswitch – switches unfinished jobs from one queue to another

72 More about LSF
Type "jle" to check job efficiency
Type "bqueues" for all queues on one cluster/machine (-m); type "bqueues -l queue_name" for more info about the queue named "queue_name"
Type "busers" for user job slot limits
Specific to Baobab:
cpufree – to check how many free/idle CPUs are available
pending – to check how many jobs are still pending
bfree – to check how many free slots are available ("bfree -h")

73 LSF Queues on the Emerald Cluster
Queue – description:
int – interactive jobs
now – preemptive debugging queue, 10-minute wall-clock limit, 2 CPUs
week – default queue, one-week wall-clock limit, up to 32 CPUs per user
month – long-running serial-job queue, one-month wall-clock limit, up to 4 jobs per user
staff – ITS Research Computing staff queue
manager – for use by LSF administrators

74 How to Submit Jobs via LSF on Emerald Clusters
Jobs to the interactive queue: bsub -q int -m cedar -Ip my_interactive_job
Serial jobs: bsub -q week -m cypress my_batch_job
Parallel OpenMP jobs:
setenv OMP_NUM_THREADS 4
bsub -q week -n 4 -m cypress my_parallel_job
Parallel MPI jobs: bsub -q week -n 4 -m cypress mpirun -np 4 my_parallel_job

75 Peculiars of Emerald Cluster
CPU type – resources (-R) – parallel job submission wrapper (esub, -a):
Xeon 2.4 GHz – xeon24, blade, … – lammpi / mpichp4 (lammpirun_wrapper / mpichp4_wrapper)
Xeon 2.8 GHz – xeon28, blade, … – lammpi / mpichp4 (lammpirun_wrapper / mpichp4_wrapper)
Xeon 3.2 GHz – xeon32, blade, … – lammpi / mpichp4 (lammpirun_wrapper / mpichp4_wrapper)
16-way IBM P575 – p5aix, …
Note that the -R and -a flags are mutually exclusive in one command line.

76 Run Jobs on Emerald LINUX Cluster
Interactive jobs: bsub -q int -R xeon28 -Ip my_interactive_job
Syntax for submitting a serial job: bsub -q queuename -R resources executable
For example: bsub -q week -R blade my_executable
To run an MPICH parallel job on AMD Athlon machines with, say, 4 CPUs: bsub -q idle -n 4 -a mpichp4 mpirun.lsf my_par_job
To run LAM/MPI parallel jobs on IBM BladeCenter machines with, say, 4 CPUs: bsub -q week -n 4 -a lammpi mpirun.lsf my_par_job

77 Final Friendly Reminders
Never run jobs on the login nodes – they are for file management, coding, compilation, etc. only
Never run jobs outside LSF – fair sharing
Never run jobs in your AFS ISIS home or ~/ms; run them on /scr, /netscr, or /nas instead – AFS home has slow I/O response and limited disk space
Move your data to mass storage after jobs are finished and remove all temporary files on the scratch disks – scratch disks are not backed up; make efficient use of limited resources
Old files will automatically be deleted without notification

78 Online Resources Get started with Research Computing:
Programming Tools
Scientific Packages
Job Management
Benchmarks
High Performance Computing

79 Short Courses Introduction to Scientific Computing
Introduction to Emerald
Introduction to Topsail
LINUX: Introduction
LINUX: Intermediate
MPI for Parallel Computing
OpenMP for Parallel Computing
MATLAB: Introduction
STATA: Introduction
Gaussian and GaussView
Introduction to Computational Chemistry
Shell Scripting
Introduction to Perl
Click "Current Schedule of ITS Workshops"

80 Intro to Scientific Computing
Please direct comments/questions about research computing to
Please direct comments/questions pertaining to this presentation to
Research Computing Center, ITS

81 Hands-on Exercises
If you haven't done so yet:
Subscribe to the Research Computing services
Access emerald, topsail, etc. via SecureCRT or X-Win32
Create a working directory for yourself on /netscr or /scr
Get to know basic AFS and UNIX commands
Get to know the Baobab Beowulf cluster
Compile OpenMP codes on Emerald
Compile serial and parallel (MPI) codes on Emerald
Get familiar with basic LSF commands
Get to know the packages available in AFS space
Submit jobs via LSF using the serial or (OpenMP/MPI) parallel queues
The Word .doc format of the hands-on exercises is available here: /afs/isis/depts/its/public_html/divisions/rc/training/scientific/short_courses/labDirections_SciComp_2009.doc

