Introduction to Scientific Computing on BU’s Linux Cluster Doug Sondak Linux Clusters and Tiled Display Walls Boston University July 30 – August 1, 2002

Outline
– hardware
– parallelization
– compilers
– batch system
– profilers

Hardware

BU’s Cluster
– 52 two-processor nodes
– specifications:
  – 2 Pentium III processors per node, 1 GHz
  – 1 GB memory per node
  – 32 KB L1 cache per CPU
  – 256 KB L2 cache per CPU

BU’s Cluster (2)
– Myrinet 2000 interconnect: sustained 1.96 Gb/s
– runs Linux

Some Timings
CFD code, MPI, 4 procs.

  Machine                     Sec.
  Origin                       –
  SP                          329
  Cluster, 2 procs. per box   174
  Cluster, 1 proc. per box    153
  Regatta                      78

Parallelization

Parallelization
– MPI is the recommended method
– PVM may also be used
– some MPI tutorials: Boston University, NCSA
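A minimal compile sketch, assuming the MPICH compiler wrappers mpif90 and mpicc are on your path (the wrapper names and the program name are assumptions, not taken from the slides); running the resulting executable through PBS is covered in the batch-system section:

  mpif90 -o mycode mycode.f90    # Fortran MPI code
  mpicc  -o mycode mycode.c      # C MPI code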

Parallelization (2)
– OpenMP is available for SMP within a node
– mixed MPI/OpenMP not presently available: we’re working on it!
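A hedged run sketch for an OpenMP executable (the executable name is hypothetical; the compiler flags for building it appear in the compiler slides below):

  setenv OMP_NUM_THREADS 2    # at most 2 threads, since each node has 2 CPUs
  ./mycode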

Compilers

Compilers
– Portland Group: pgf77, pgf90, pgcc, pgCC

Compilers (2)
– gnu: g77, gcc, g++

Compilers (3)
– Intel: Fortran ifc, C/C++ icc

Compilers (4)
[Polyhedron F77 benchmark chart comparing the PG, gnu, and Intel compilers on AC, ADI, AIR, CHESS, DODUC, LP, MDB, MOLENR, PI, PNPOLY, RO, and TFFT]

Compilers (5)
– Portland Group: pgf77 generally faster than g77
– Intel: ifc generally faster than pgf77

Compilers (6)
– Linux C/C++ compilers: gcc/g++ seems to be the standard, and is usually described as a good compiler

Portland Group
– -O2: highest level of optimization
– -fast: same as -O2 -Munroll -Mnoframe
– -Minline: function inlining
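A hedged example of an optimized Portland Group build using these flags (the source and executable names are placeholders):

  pgf90 -fast -Minline -o mycode mycode.f90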

Portland Group (2)
– -Mbyteswapio: swaps between big endian and little endian; useful for files created on our SP, Regatta, or Origin2000
– -Ktrap=fp: trap floating-point invalid operation, divide by zero, or overflow; slows code down, only use for debugging
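Hedged examples of how these flags might be used (file and program names are placeholders; the -g debugging flag is a standard compiler option not shown on the slide):

  pgf90 -Mbyteswapio -o reader reader.f90    # read big-endian files from the SP/Regatta/Origin2000
  pgf90 -g -Ktrap=fp -o mycode mycode.f90    # debugging build that traps floating-point errors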

Portland Group (3)
– -Mbounds: array bounds checking; slows code down, only use for debugging
– -mp: process OpenMP directives
– -Mconcur: automatic SMP parallelization
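Hedged build sketches for these options (program names are placeholders):

  pgf90 -Mbounds -o mycode mycode.f90    # debugging build with bounds checking
  pgf90 -mp -o omp_code omp_code.f90     # build a code containing OpenMP directives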

Intel
– need to set some environment variables, contained in /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh
– source this file, copy its contents into your .cshrc file, or source it from .cshrc
– there’s an identical file called ifcvars.csh, to avoid (create?) confusion
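One way to pick up the Intel environment, using the path from the slide (csh syntax; add the line to your .cshrc if you want it set in every session):

  source /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh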

Intel (2)
– -O3: highest level of optimization
– -ipo: interprocedural optimization
– -unroll: loop unrolling
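A hedged example of an optimized Intel Fortran build using these flags (program name is a placeholder):

  ifc -O3 -ipo -unroll -o mycode mycode.f90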

Intel (3)
– -openmp -fpp: process OpenMP directives
– -parallel: automatic SMP parallelization
– -CB: array bounds checking
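A hedged example of building an OpenMP code with the Intel compiler (program name is a placeholder; set OMP_NUM_THREADS as shown in the parallelization section before running):

  ifc -openmp -fpp -o omp_code omp_code.f90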

Intel (4)
– -CU: check for use of uninitialized variables
– endian conversion by way of environment variables: setenv F_UFMTENDIAN big
  – all reads will be converted from big to little endian, all writes from little to big endian

Intel (5)
– can specify units for endian conversion: setenv F_UFMTENDIAN big:10,20
– can mix endian conversions: setenv F_UFMTENDIAN little;big:10,20
  – all units are little endian except for 10 and 20, which will be converted

Batch System

Batch System
– PBS: different from LSF on the O2k’s, SP’s, and Regattas
– there’s only one queue: dque

qsub
– job submission is done through a script (script details will follow): qsub scriptname
– qsub returns the job ID; output files appear in the working directory
  – std. out: scriptname.ojobid
  – std. err: scriptname.ejobid

  run]$ qsub corrun
  808.hn003.nerf.bu.edu

qstat
– check the status of all your jobs
– qstat lies about run time: often (always?) zero

  run]$ qstat
  Job id     Name    User    Time Use  S  Queue
  hn003      corrun  sondak  0         R  dque

qstat (2)
– S: job status
  – Q: queued
  – R: running
  – E: exiting (finishing up)
– qstat -f gives detailed status, e.g.
  exec_host = nodem019/0+nodem018/0+nodem017/0+nodem016/0
– to check a specific job: qstat jobid

Other PBS Commands
– kill a job: qdel jobid
– some less-important PBS commands: qalter, qhold, qrls, qmsg, qrerun
– man pages are available for all commands

PBS Script
For serial runs:

  #!/bin/bash
  # Set the default queue
  #PBS -q dque
  # ppn is cpu's per node
  #PBS -l nodes=1:ppn=1,walltime=00:30:00
  cd $PBS_O_WORKDIR
  myrun

PBS/MPI
For MPI, set up the GM configuration file in the PBS script:

  test -d ~/.gmpi || mkdir ~/.gmpi
  GMCONF=~/.gmpi/conf.$PBS_JOBID
  /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
  cd $PBS_O_WORKDIR
  NP=$(head -1 $GMCONF)

PBS/MPI (2)
To run MPI, end the PBS script with (all on one line):

  mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
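Putting the pieces from the last two slides together, a complete MPI job script might look like this sketch (the node count, walltime, and program name myprog are placeholders):

  #!/bin/bash
  #PBS -q dque
  #PBS -l nodes=4:ppn=1,walltime=00:30:00
  # build the GM configuration file for this job
  test -d ~/.gmpi || mkdir ~/.gmpi
  GMCONF=~/.gmpi/conf.$PBS_JOBID
  /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
  cd $PBS_O_WORKDIR
  NP=$(head -1 $GMCONF)
  # launch the MPI run over Myrinet
  mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog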

PBS/MPI (3)
– mpirun.ch_gm: version of mpirun that uses Myrinet
– --gm-f $GMCONF: use the configuration file constructed above
– --gm-recv polling: poll continually to check for completion of sends and receives; most efficient for dedicated procs. That’s us!

PBS/MPI (4)
– --gm-use-shmem: enable shared-memory support; may improve or degrade performance, so try your code with and without it
– --gm-kill 5: if one MPI process aborts, kill the others after 5 sec.

PBS/MPI (5)
– -np $NP: run on NP procs as computed earlier in the script; equals “nodes x ppn” from the PBS -l option
– PBS_JOBID=$PBS_JOBID: seems redundant, but do it anyway
– myprog: run the darn code already!

Profiling

Portland Group Compiler
Profiling flags:
– function level: -Mprof=func
– line level: -Mprof=lines (much larger output file)
Creates a pgprof.out file in the working directory.
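A hedged function-level profiling sketch (program name is a placeholder; the pgprof viewer is described on the next slides):

  pgf90 -Mprof=func -o mycode mycode.f90
  ./mycode                 # the run writes pgprof.out in the working directory
  pgprof pgprof.out        # view the timing results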

PG (2)
– at the Unix prompt, type pgprof
– the command will pop up a window with a bar chart of timing results
– it can take a file name argument in case you’ve renamed the pgprof.out file: pgprof pgprof.lines

PG (3)
– option to specify the source directory: pgprof -I sourcedir pgprof.lines
– can specify multiple directories with multiple -I flags
– can also use the GUI menu: Options > Source Directory...

PG (4)
[pgprof GUI screenshot: function-level timing results]

PG (5)
– Calls: number of times the routine was called
– Time: time spent in the specified routine
– Cost: time spent in the specified routine plus time spent in called routines

PG (6)
Line profiling:
– with optimization, it may not be able to identify many (most?) lines in the source code; results are reported for blocks of code, e.g., loops
– without optimization, it doesn’t measure what you really want
– the initial screen looks like the “func” screen
– double-click a function/subroutine name to get the line-level listing

PG (7)
[pgprof GUI screenshot: line-level listing]

Questions/Comments
Feel free to contact us directly with questions about the cluster or parallelization/optimization issues: Doug, Kadin