Running Parallel Jobs
Cray XE6 Workshop
February 7, 2011
David Turner
NERSC User Services Group

Hopper Compute Nodes
6,384 nodes (153,216 cores)
–6,000 nodes have 32 GB; 384 have 64 GB
Small, fast Linux OS
–Limited number of system calls and Linux commands
–No shared objects by default
  –Can support “.so” files with appropriate environment variable settings (see the sketch below)
Smallest allocatable unit
–Not shared
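
The slide does not name the environment variable. As an illustrative sketch only, Cray systems of this generation typically enabled dynamically linked executables on compute nodes by selecting the shared-root (DSL) image in the batch script; the variable name below is an assumption, so check NERSC documentation for the current recommendation:

  #PBS -l walltime=00:10:00
  #PBS -l mppwidth=24
  # Assumption: CRAY_ROOTFS=DSL selects the shared-root image so a dynamically
  # linked executable can resolve its ".so" files on the compute nodes.
  export CRAY_ROOTFS=DSL
  cd $PBS_O_WORKDIR
  aprun -n 24 ./dynamic_app    # hypothetical dynamically linked executable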

Hopper Login Nodes
8 nodes (128 cores)
–4 quad-core AMD 2.4 GHz processors
–128 GB
–Full Linux OS
Arbitrary placement upon login
–Load balanced via number of connections
Edit, compile, submit
–No MPI
Shared among many users
–CPU and memory limits

Hopper MOM Nodes
24 nodes
–4 quad-core AMD 2.4 GHz processors
–32 GB
Launch and manage parallel applications on compute nodes
Commands in batch script are executed on MOM nodes
No user (ssh) logins

File Systems
$HOME
–Tuned for small files
$SCRATCH
–Tuned for large streaming I/O
$PROJECT
–Sharing between people/systems
–By request only
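
A pattern that follows from this layout is to keep source and small files in $HOME and run jobs from $SCRATCH. A minimal sketch, assuming $SCRATCH is defined in your login environment; the directory and file names are placeholders, not values from the slides:

  # Stage a run directory on the scratch file system (tuned for large streaming I/O)
  mkdir -p $SCRATCH/myrun
  cp $HOME/mycode/a.out $HOME/mycode/indata $SCRATCH/myrun/
  cd $SCRATCH/myrun
  # ...submit the batch job from here so output lands on $SCRATCH...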

Running on Login Nodes
% cc hello.c
% ./a.out
Hello, world!
Login nodes are not intended for computation!
–No MPI!

How to Access Compute Nodes
Requires two components
–Batch System
  Based on PBS
    –Moab scheduler
    –Torque resource manager
  qsub command
  Many monitoring methods
    –qs, qstat, showq, NERSC website, …
–Application Launcher
  aprun command
    –Similar to mpirun/mpiexec

Basic qsub Usage
% cat myjob.pbs
#PBS -l walltime=00:10:00
#PBS -l mppwidth=48
#PBS -q debug
cd $PBS_O_WORKDIR
aprun -n 48 ./a.out
% qsub myjob.pbs
sdb
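
The same script, with comments spelling out what each line requests (48 cores on 24-core Hopper nodes means the job is allocated 2 whole nodes):

  #PBS -l walltime=00:10:00    # wall-clock limit: 10 minutes
  #PBS -l mppwidth=48          # 48 cores = 2 Hopper nodes (24 cores per node)
  #PBS -q debug                # submit to the debug queue
  cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
  aprun -n 48 ./a.out          # launch 48 copies of a.out on the compute nodes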

Batch Queues

Submit Queue   Execute Queue   Max Nodes   Max Cores   Max Walltime
interactive                        256        6,144     30 mins
debug                              512       12,288     30 mins
regular        reg_short           512       12,288      6 hrs
               reg_small           512       12,288     12 hrs
               reg_med           4,096       98,304     12 hrs
               reg_big           6,384      153,216     12 hrs
low                                512       12,288      6 hrs

Batch Options
-l walltime=hh:mm:ss
-l mppwidth=num_cores
–Determines number of nodes to allocate; should be a multiple of 24
-l mpplabels=bigmem
–Will probably have to wait for bigmem nodes to become available
-q queue_name

Batch Options
-N job_name
-o output_file
-e error_file
-j oe
–Join output and error files

Batch Options
-V
–Propagate environment to batch job
-A repo_name
–Specify non-default repository
-m [a|b|e|n]
–Email notification
  –abort/begin/end/never
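
Putting the options from these three Batch Options slides together, a job script might look like the sketch below; the job name, output file, and repository name are placeholders, not values from the slides:

  #PBS -l walltime=00:30:00
  #PBS -l mppwidth=48
  #PBS -q regular
  #PBS -N myjob                # job name (placeholder)
  #PBS -j oe                   # join stdout and stderr into one file
  #PBS -o myjob.out            # output file (placeholder)
  #PBS -V                      # propagate the submission environment
  #PBS -A m0000                # repository name (placeholder)
  #PBS -m be                   # email at job begin and end
  cd $PBS_O_WORKDIR
  aprun -n 48 ./a.out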

Running Interactively
% qsub -I -V -l walltime=00:10:00 -l mppwidth=48 -q interactive
qsub: waiting for job sdb to start
qsub: job sdb ready
% cd $PBS_O_WORKDIR
% aprun -n 48 ./a.out

Packed vs Unpacked
Packed
–User process on every core of each node
–One node might have unused cores
–Each process can safely access ~1.25 GB
Unpacked
–Increase per-process available memory
–Allow multi-threaded processes

Packed
#PBS -l mppwidth=1024
aprun -n 1024 ./a.out
Requires 43 nodes
–42 nodes with 24 processes
–1 node with 16 processes
  –8 cores unused
–Could have specified mppwidth=1032

Unpacked
#PBS -l mppwidth=2048
aprun -n 1024 -N 12 ./a.out
Requires 86 nodes
–85 nodes with 12 processes
–1 node with 4 processes
  –20 cores unused
–Could have specified mppwidth=2064
–Each process can safely access ~2.5 GB
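
The unpacked layout leaves cores free for threads, which is the "multi-threaded processes" point from the Packed vs Unpacked slide. A hedged sketch of one common way to use those idle cores on Cray XE6 systems of this era (not shown in the slides; the -d "depth" flag reserves cores per process for OpenMP threads, so confirm the exact flags against NERSC documentation):

  #PBS -l mppwidth=2048
  export OMP_NUM_THREADS=2              # two OpenMP threads per MPI process
  cd $PBS_O_WORKDIR
  aprun -n 1024 -N 12 -d 2 ./a.out      # 12 processes per node, 2 cores reserved per process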

Manipulating Batch Jobs
qsub job_script
qdel job_id
qhold job_id
qrls job_id
qalter new_options job_id
qmove new_queue job_id
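
A typical sequence, using <job_id> as a stand-in for whatever identifier qsub printed at submission time:

  % qsub myjob.pbs                          # submit; prints the job identifier
  % qhold <job_id>                          # put the queued job on hold
  % qalter -l walltime=00:20:00 <job_id>    # change a resource request while the job is queued
  % qrls <job_id>                           # release the hold
  % qdel <job_id>                           # or remove the job entirely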

Monitoring Batch Jobs
qstat -a [-u username]
–All jobs, in submit order
qstat -f job_id
–Full report, many details
showq
–All jobs, in priority order
qs [-w] [-u username]
–NERSC wrapper, priority order
apstat, showstart, checkjob, xtnodestat
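
A short monitoring session built from the commands listed above; the username and job ID are placeholders:

  % qstat -a -u jsmith        # my jobs, in submit order
  % qstat -f <job_id>         # full details for one job
  % showq                     # everything in the queue, in priority order
  % showstart <job_id>        # scheduler's estimate of when the job will start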

Hands-On
/project/projectdirs/training/XE6-feb-2011/RunningParallel
–jacobi_mpi.f90
–jacobi.pbs
–indata
–mmsyst.f
–mmsyst.pbs
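
The slides do not spell out the steps. A plausible workflow, assuming the standard Cray Fortran wrapper ftn (which links MPI automatically) and assuming jacobi.pbs launches the executable you build; the executable name is a placeholder:

  % cp -r /project/projectdirs/training/XE6-feb-2011/RunningParallel $SCRATCH
  % cd $SCRATCH/RunningParallel
  % ftn -o jacobi_mpi jacobi_mpi.f90    # compile the MPI Fortran example
  % qsub jacobi.pbs                     # submit the supplied batch script
  % qstat -a -u $USER                   # watch for the job to start and finish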