Flux for PBS Users HPC 105 Dr. Charles J Antonelli LSAIT ARS August, 2013

Flux
Flux is a university-wide shared computational discovery / high-performance computing service.
- Interdisciplinary
- Provided by Advanced Research Computing at U-M (ARC)
- Operated by CAEN HPC
- Hardware procurement, software licensing, and billing support by U-M ITS
- Used across campus
- Collaborative since 2010:
  - Advanced Research Computing at U-M (ARC)
  - College of Engineering's IT Group (CAEN)
  - Information and Technology Services
  - Medical School
  - College of Literature, Science, and the Arts
  - School of Information

The Flux cluster …

Flux node
- 12 Intel cores
- 48 GB RAM
- Local disk
- Ethernet, InfiniBand

Flux Large Memory node
- 40 Intel cores (200 Large Memory cores across 5 nodes)
- 1 TB RAM
- Local disk
- Ethernet, InfiniBand

Flux hardware
- Standard Flux: 8,016 Intel cores, 632 nodes, 48/64 GB RAM per node, 4 GB RAM per core (allocated)
- Large Memory Flux: 200 Intel cores, 5 nodes, 1 TB RAM per node, 25 GB RAM per core (allocated)
- 4X InfiniBand network interconnects all nodes: 40 Gbps, <2 us latency (an order of magnitude less than Ethernet)
- Lustre filesystem: scalable, high-performance, open; supports MPI-IO for MPI jobs; mounted on all login and compute nodes

Flux software
- Licensed software
- Compilers & libraries: Intel, PGI, GNU
- OpenMP, OpenMPI

Using Flux
Three basic requirements to use Flux:
1. A Flux account
2. An MToken (or a Software Token)
3. A Flux allocation

Using Flux
1. A Flux account
- Allows login to the Flux login nodes
- Develop, compile, and test code
- Available to members of the U-M community, free
- Get an account by visiting the Flux account request page

Flux Account Policies
To qualify for a Flux account:
- You must have an active institutional role on the Ann Arbor campus (not a Retiree or Alumni role)
- Your uniqname must have a strong identity type (not a friend account)
- You must be able to receive email sent to your U-M address
- You must have run a job in the last 13 months

Using Flux
2. An MToken (or a Software Token)
- Required for access to the login nodes
- Improves cluster security by requiring a second means of proving your identity
- You can use either an MToken or an application for your mobile device (called a Software Token)
- Information on obtaining and using these tokens is available from ITS

Using Flux
3. A Flux allocation
- Allows you to run jobs on the compute nodes
- Current rates (through June 30, 2016):
  - $18 per core-month for Standard Flux
  - $24.35 per core-month for Large Memory Flux
  - $8 cost-share per core-month for LSA, Engineering, and Medical School
- Details on the Flux pricing page (resources-services/flux/flux-pricing/)
- To inquire about Flux allocations, please email the Flux support address

Flux Allocations
To request an allocation, send email to the Flux support address with:
- the type of allocation desired (Regular or Large-Memory)
- the number of cores needed
- the start date and number of months for the allocation
- the shortcode for the funding source
- the list of people who should have access to the allocation
- the list of people who can change the user list and augment or end the allocation
Details on the Flux project management page (resources-services/flux/managing-a-flux-project/)

Flux Allocations
- An allocation specifies resources that are consumed by running jobs:
  - Explicit core count
  - Implicit memory usage (4 or 25 GB per core)
- When any resource is fully in use, new jobs are blocked
- An allocation may be ended early, on the monthly anniversary
- You may have multiple active allocations; jobs draw resources from all active allocations

lsa_flux Allocation
- LSA funds a shared allocation named lsa_flux, usable by anyone in the College
- 60 cores
- For testing, experimentation, and exploration; not for production runs
- Each user limited to 30 concurrent jobs
Details on the lsa_flux support page (support/support-for-users/lsa_flux)

Monitoring Allocations
- Visit the Flux allocation monitoring page and select your allocation from the list at upper left; you'll see all allocations you can submit jobs against
- Four sets of outputs:
  - Allocation details (start & end date, cores, shortcode)
  - Financial overview (cores allocated vs. used, by month)
  - Usage summary table (core-months by user and month); drill down for individual job run data
  - Usage charts (by user)
Details & screenshots on the Flux allocation check page (resources-services/flux/check-my-flux-allocation/)

Storing data on Flux
- Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes
  - 640 TB of short-term storage for batch jobs
  - Pathname depends on your allocation and uniqname, e.g., /scratch/lsa_flux/cja
  - Can share through UNIX groups
  - Large, fast, short-term; data deleted 60 days after allocation expires
- NFS filesystems mounted on /home and /home2 on all nodes
  - 80 GB of storage per user for development & testing
  - Small, slow, long-term

Storing data on Flux
Flux does not provide large, long-term storage. Alternatives:
- LSA Research Storage
- ITS Value Storage
- Departmental server
CAEN HPC can mount your storage on the login nodes. Issue the df -kh command on a login node to see what other groups have mounted.

Storing data on Flux
LSA Research Storage
- 2 TB of secure, replicated data storage
- Available to each LSA faculty member at no cost
- Additional storage available at $30/TB/yr
- Turn in existing storage hardware for additional storage
- Request by visiting the LSA Research Storage request form
  - Authenticate with Kerberos login and password
  - Select NFS as the method for connecting to your storage

Copying data to Flux
Using the transfer host:
rsync -avz /your/cluster1/directory flux-xfer.engin.umich.edu:newdirname
rsync -avz /your/cluster1/directory flux-xfer.engin.umich.edu:/scratch/youralloc/youruniqname
Or use scp, sftp, WinSCP, Cyberduck, or FileZilla.
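For one-off copies, scp works the same way against the transfer host; a minimal sketch, assuming the same placeholder source directory, allocation, and uniqname used above:

# Copy a directory tree to your /scratch space via the transfer host
scp -r /your/cluster1/directory uniqname@flux-xfer.engin.umich.edu:/scratch/youralloc/youruniqname/

# Pull results back to your local machine when the job is done
scp -r uniqname@flux-xfer.engin.umich.edu:/scratch/youralloc/youruniqname/results ./results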

Globus Online
- Features
  - High-speed data transfer, much faster than SCP or SFTP
  - Reliable & persistent
  - Minimal client software: Mac OS X, Linux, Windows
- GridFTP endpoints
  - Gateways through which data flow
  - Exist for XSEDE, OSG, …
  - UMich: umich#flux, umich#nyx
  - Add your own server endpoint: contact the Flux support team
  - Add your own client endpoint!
- More information: see the Globus Online documentation

Connecting to Flux
ssh flux-login.engin.umich.edu
- Login with token code, uniqname, and Kerberos password
- You will be randomly connected to a Flux login node (currently flux-login1 or flux-login2)
  - Do not run compute- or I/O-intensive jobs here; processes are killed automatically after 30 minutes
- Firewalls restrict access to flux-login. To connect successfully, either:
  - Physically connect your ssh client platform to the U-M campus wired or MWireless network, or
  - Use VPN software on your client platform, or
  - Use ssh to login to an ITS login node (login.itd.umich.edu), and ssh to flux-login from there (see the sketch below)
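A minimal sketch of the last option, hopping through the ITS login node; uniqname is a placeholder for your own uniqname:

# First hop: ITS login node (reachable from off campus)
ssh uniqname@login.itd.umich.edu

# Second hop, from the ITS login node: the Flux login nodes
ssh uniqname@flux-login.engin.umich.edu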

Lab 1
Task: Use the multicore package
The multicore package allows you to use multiple cores on the same node.
module load R
Copy sample code to your login directory:
cd
cp ~cja/hpc-sample-code.tar.gz .
tar -zxvf hpc-sample-code.tar.gz
cd ./hpc-sample-code
Examine Rmulti.pbs and Rmulti.R.
Edit Rmulti.pbs with your favorite Linux editor; change the #PBS -M address to your own. (A hypothetical sketch of such a PBS script follows.)
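Not the actual course file, but a minimal sketch of what a PBS script like Rmulti.pbs might contain, assuming the FluxTraining_flux allocation and flux queue used elsewhere in these slides; the directive names are standard PBS, everything else is a placeholder:

#!/bin/bash
#PBS -N Rmulti                      # job name
#PBS -M uniqname@umich.edu          # where mail about the job is sent (change this)
#PBS -m abe                         # mail on abort, begin, end
#PBS -A FluxTraining_flux           # allocation to charge
#PBS -l qos=flux                    # quality of service
#PBS -q flux                        # queue
#PBS -l procs=12,pmem=3800mb        # one node's worth of cores and memory
#PBS -l walltime=00:30:00           # wallclock limit
#PBS -V                             # export current environment to the job
#PBS -j oe                          # merge stdout and stderr

cd $PBS_O_WORKDIR                   # run from the directory qsub was invoked in
module load R
R CMD BATCH --no-save Rmulti.R Rmulti.out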

Lab 1
Task: Use the multicore package
Submit your job to Flux:
qsub Rmulti.pbs
Watch the progress of your job:
qstat -u uniqname
where uniqname is your own uniqname.
When complete, look at the job's output:
less Rmulti.out

Lab 2
Task: Run an MPI job on 8 cores
Compile c_ex05:
cd ~/cac-intro-code
make c_ex05
Edit the file run with your favorite Linux editor:
- Change the #PBS -M address to your own (I don't want Brock to get your email!)
- Change the #PBS -A allocation to FluxTraining_flux, or to your own allocation, if desired
- Change the #PBS -l allocation to flux
Submit your job:
qsub run
(A hypothetical sketch of the edited header lines follows.)
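Not the actual course file, but a hypothetical sketch of what the edited PBS header of run might look like after these changes; everything outside the three edited directives is assumed:

#!/bin/bash
#PBS -N c_ex05                      # job name
#PBS -M uniqname@umich.edu          # edited: your own email address
#PBS -m abe
#PBS -A FluxTraining_flux           # edited: the training allocation (or your own)
#PBS -l qos=flux                    # edited: flux quality of service
#PBS -q flux                        # flux queue
#PBS -l procs=8,walltime=00:30:00   # 8 cores, as the lab requests
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np 8 ./c_ex05               # run the MPI example on 8 cores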

PBS resources (1)
A resource (-l) can specify:
- Request wallclock (that is, running) time: -l walltime=HH:MM:SS
- Request C MB of memory per core: -l pmem=Cmb
- Request T MB of memory for the entire job: -l mem=Tmb
- Request M cores on arbitrary node(s): -l procs=M
- Request a token to use licensed software:
  -l gres=stata:1
  -l gres=matlab
  -l gres=matlab%Communication_toolbox

PBS resources (2)
A resource (-l) can specify:
- For multithreaded code: request M nodes with at least N cores per node: -l nodes=M:ppn=N
- Request M cores with exactly N cores per node (note the difference in syntax and semantics vs. ppn!): -l nodes=M,tpn=N (you'll only use this for specific algorithms)
(A combined example appears below.)
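A minimal sketch combining several of these resource requests in one batch script header; the allocation name, email address, executable, and time/memory values are placeholders:

#!/bin/bash
#PBS -N resource_demo
#PBS -M uniqname@umich.edu          # placeholder email
#PBS -A youralloc_flux              # placeholder allocation
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=2:ppn=12              # 2 nodes with at least 12 cores each
#PBS -l pmem=3800mb                 # memory per core
#PBS -l walltime=04:00:00           # 4 hours of wallclock time
#PBS -l gres=matlab                 # a Matlab license token, if the job needs one
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun ./your_program               # placeholder executable; runs across all allocated cores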

Interactive jobs
You can submit jobs interactively:
qsub -I -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux
- This queues a job as usual
- Your terminal session will be blocked until the job runs
- When it runs, you will be connected to one of your nodes
  - Invoked serial commands will run on that node
  - Invoked parallel commands (e.g., via mpirun) will run on all of your nodes
- When you exit the terminal session, your job is deleted
Interactive jobs allow you to:
- Test your code on cluster node(s)
- Execute GUI tools on a cluster node with output on your local platform's X server
- Utilize a parallel debugger interactively

Lab 3
Task: Compile and execute an MPI program on a compute node
Copy sample code to your login directory:
cd
cp ~brockp/cac-intro-code.tar.gz .
tar -xvzf cac-intro-code.tar.gz
cd ./cac-intro-code
Start an interactive PBS session:
qsub -I -V -l procs=2 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux
On the compute node, compile & execute MPI parallel code:
cd $PBS_O_WORKDIR
mpicc -O3 -ipo -no-prec-div -xHost -o c_ex01 c_ex01.c
mpirun -np 2 ./c_ex01

Lab 4
Task: Run Matlab interactively
module load matlab
Start an interactive PBS session:
qsub -I -V -l procs=2 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux
Run Matlab in the interactive PBS session:
matlab -nodisplay

The Scheduler (1/3)
Flux scheduling policies:
- The job's queue determines the set of nodes you run on (flux, fluxm)
- The job's account determines the allocation to be charged
  - If you specify an inactive allocation, your job will never run
- The job's resource requirements help determine when the job becomes eligible to run
  - If you ask for unavailable resources, your job will wait until they become free
- There is no pre-emption

The Scheduler (2/3)
Flux scheduling policies:
- If there is competition for resources among eligible jobs in the allocation or in the cluster, two things help determine when you run:
  - How long you have waited for the resource
  - How much of the resource you have used so far
  - This is called "fairshare"
- The scheduler will reserve nodes for a job with sufficient priority
  - This is intended to prevent starving jobs with large resource requirements

The Scheduler (3/3)
Flux scheduling policies:
- If there is room for shorter jobs in the gaps of the schedule, the scheduler will fit smaller jobs in those gaps
  - This is called "backfill"
(Diagram: cores vs. time, showing small jobs backfilled into gaps in the schedule)

Job monitoring
There are several commands you can run to get some insight into your jobs' execution:
- freenodes: shows the number of free nodes and cores currently available
- mdiag -a youralloc_name: shows resources defined for your allocation and who can run against it
- showq -w acct=yourallocname: shows jobs using your allocation (running/idle/blocked)
- checkjob jobid: can show why your job might not be starting
- showstart -e all jobid: gives you a coarse estimate of job start time; use the smallest value returned
(A short usage sketch follows.)
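A minimal monitoring sequence using these commands; the allocation name default_flux and job id 12345678 are hypothetical placeholders:

freenodes                            # how many nodes and cores are free right now?
mdiag -a default_flux                # what does my allocation provide, and who can use it?
showq -w acct=default_flux           # which of our jobs are running, idle, or blocked?
checkjob 12345678                    # why isn't my job starting?
showstart -e all 12345678            # roughly when might it start?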

Job Arrays
- Submit copies of identical jobs
- Invoked via qsub -t:
  qsub -t array-spec pbsbatch.txt
  where array-spec can be
    m-n
    a,b,c
    m-n%slotlimit
  e.g., qsub -t 1-50%10 submits fifty jobs, numbered 1 through 50, of which only ten can run simultaneously
- $PBS_ARRAYID records the array identifier
(A hypothetical array script sketch follows.)
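A minimal sketch of a job-array batch script that uses $PBS_ARRAYID to pick its input; the script name, input/output file names, allocation, and program are hypothetical placeholders:

#!/bin/bash
#PBS -N array_demo
#PBS -A youralloc_flux              # placeholder allocation
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,walltime=01:00:00
#PBS -V
#PBS -j oe

cd $PBS_O_WORKDIR
# Each array element processes its own input file, e.g. input.1 ... input.50
./your_program input.$PBS_ARRAYID > output.$PBS_ARRAYID

Submitted with, e.g., qsub -t 1-50%10 array_demo.pbs, this runs fifty copies of the script, at most ten at a time.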

Dependent scheduling
- Submit jobs whose execution scheduling depends on other jobs
- Invoked via qsub -W:
  qsub -W depend=type:jobid[:jobid]…
  where type can be
    after: schedule after the jobids have started
    afterok: schedule after the jobids have finished, only if no errors
    afternotok: schedule after the jobids have finished, only if errors
    afterany: schedule after the jobids have finished, regardless of status
  Inverted semantics for before, beforeok, beforenotok, beforeany
(A short chaining example follows.)
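A minimal sketch of chaining two jobs this way, capturing the first job id from qsub; the script names preprocess.pbs and analyze.pbs are hypothetical placeholders:

# qsub prints the new job's id on stdout; capture it in a shell variable
FIRST=$(qsub preprocess.pbs)

# Run the analysis only if the preprocessing job finishes without errors
qsub -W depend=afterok:$FIRST analyze.pbs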

Some Flux Resources
- U-M Advanced Research Computing Flux pages
- CAEN HPC Flux pages
- CAEN HPC YouTube channel
- For assistance: email the Flux support address
  - Read by a team of people including unit support staff
  - Cannot help with programming questions, but can help with operational Flux and basic usage questions

Any Questions?
Charles J. Antonelli
LSAIT Advocacy and Research Support