PARADOX Cluster job management


PARADOX Cluster job management
Nikola Grkic, Institute of Physics Belgrade, Serbia (ngrkic@ipb.ac.rs)
The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499

PARADOXical Training, 14 October 2011, Belgrade, Serbia

Overview
- Access to the login node
- Preparing job submission scripts
- Submitting jobs: serial, OpenMP, MPI, hybrid, and application-specific (NAMD) jobs
- Batch system job control and monitoring

Access to login node

The PARADOX login node is ui.ipb.ac.rs. Access it via ssh (the remote login program):

$ ssh username@ui.ipb.ac.rs

If you need a graphical environment, use the -X option:

$ ssh username@ui.ipb.ac.rs -X

Access to login node: file transfer

To transfer files between ui.ipb.ac.rs and your local machine, use the scp command.

Create an archive of the directories you want to copy (a single archive transfers faster than many small files):

$ tar -cvzf archivename.tgz directoryname1 directoryname2

or, for a single file:

$ tar -cvzf archivename.tgz filename

Transfer the archive to your home directory on ui.ipb.ac.rs:

$ scp archivename.tgz username@ui.ipb.ac.rs:

Log in to ui.ipb.ac.rs:

$ ssh username@ui.ipb.ac.rs

Unpack the archive into your target directory:

$ tar -xvzf archivename.tgz -C destinationdirectory
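The archive-and-unpack steps above can be sketched end to end; the scp and ssh steps are shown only as a comment, since they need the remote account, and the directory and file names here are placeholders:

```shell
# Placeholder names: "results" is the data directory, "dest" the unpack target.
mkdir -p results dest
echo "sample data" > results/output.txt

# Pack the directory into a compressed archive.
tar -czf archivename.tgz results

# ... here you would run: scp archivename.tgz username@ui.ipb.ac.rs:
# ... then log in with:  ssh username@ui.ipb.ac.rs

# Unpack into the target directory with -C (the directory must already exist).
tar -xzf archivename.tgz -C dest
cat dest/results/output.txt
```

Note the -C flag: without it, tar would unpack into the current directory rather than the intended destination.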

Access to login node: file systems

Two file systems are available to users on ui.ipb.ac.rs:
- /home
- /nfs

Each holds per-user directories such as /home/<USERNAME> or /nfs/<USERNAME>. /nfs is shared between all nodes of the cluster, so all executables and data used by jobs must be placed there.

Each worker node also has a local file system, /scratch, which is used for temporary storage by running jobs.

Preparing job submission scripts

The Portable Batch System (PBS) handles job submission, resource allocation, and job launching. The PBS server is ce64.ipb.ac.rs; jobs are submitted with the qsub command from ui.ipb.ac.rs. The queue hpsee is available for users' job submission.

To submit a job, you first have to write a shell script which contains:
- a set of directives (beginning with #PBS) describing the resources your job needs
- the lines necessary to execute your code

Preparing job submission scripts

PBS script (job.pbs):

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
cd $PBS_O_WORKDIR
chmod +x job.sh
./job.sh

Job script (job.sh):

#!/bin/bash
date
hostname
pwd
sleep 10

Preparing job submission scripts

#!/bin/bash - Specifies the shell to be used when executing the command portion of the script.
#PBS -q <queue> - Directs the job to the specified queue. Queue hpsee should be used.
#PBS -o <name> - Writes standard output to <name>. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
#PBS -e <name> - Writes standard error to <name>.

Preparing job submission scripts

#PBS -l walltime=<time> - Maximum wall-clock time; <time> is in the format HH:MM:SS.
#PBS -l nodes=1:ppn=1 - Number of nodes to be reserved for exclusive use by the job and number of virtual processors per node (ppn) requested for this job.
cd $PBS_O_WORKDIR - Change to the initial working directory.
chmod +x job.sh - Set execute permission on the job.sh file.
./job.sh - Execute the job.

Preparing job submission scripts

Job submission is done by executing qsub in /nfs/username/somefolder:

$ qsub job.pbs

The qsub command will return the job identifier:

<JOB_ID>.ce64.ipb.ac.rs
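Since qsub prints the full identifier including the server suffix, scripts often capture it and keep only the numeric part; a small sketch (the qsub call itself is shown only as a comment, and the sample identifier is illustrative):

```shell
# In a real session: RAW_ID=$(qsub job.pbs)
# Here we use an illustrative value of the form qsub returns.
RAW_ID="12345.ce64.ipb.ac.rs"

# Strip everything from the first dot onward to get the bare job ID.
JOB_ID="${RAW_ID%%.*}"
echo "$JOB_ID"
```

The `%%.*` parameter expansion removes the longest suffix matching `.*`, which works regardless of the server name after the dot.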

Submitting jobs: serial jobs

http://wiki.ipb.ac.rs/index.php/Serial_job_example

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
cd $PBS_O_WORKDIR
chmod +x job.sh
./job.sh

Submitting jobs: OpenMP jobs

http://wiki.ipb.ac.rs/index.php/OpenMP_job_example

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=1:ppn=6
#PBS -l walltime=10:00:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
cd $PBS_O_WORKDIR
chmod +x job
export OMP_NUM_THREADS=6
./job
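Rather than hard-coding the thread count to match ppn, a job script can derive it from $PBS_NODEFILE, which contains one line per allocated processor; a sketch assuming a single-node job, with the node file simulated here since PBS sets the variable only inside a running job:

```shell
# Simulate $PBS_NODEFILE for a nodes=1:ppn=6 request;
# in a real job, PBS creates this file itself.
PBS_NODEFILE=$(mktemp)
for i in 1 2 3 4 5 6; do echo "worker-node" >> "$PBS_NODEFILE"; done

# One line per allocated processor, so on a single node the
# line count equals the requested ppn.
export OMP_NUM_THREADS=$(wc -l < "$PBS_NODEFILE")
echo "$OMP_NUM_THREADS"
rm -f "$PBS_NODEFILE"
```

This keeps the thread count and the #PBS -l request from drifting apart when the script is edited.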

Submitting jobs: MPI jobs

http://wiki.ipb.ac.rs/index.php/MPI_job_examples

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=3:ppn=2
#PBS -l walltime=10:00:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
cd $PBS_O_WORKDIR
chmod +x job
cat $PBS_NODEFILE
${MPI_MPICH_MPIEXEC} ./job                  # If mpich-1.2.7p1 is used
#${MPI_MPICH2_MPIEXEC} --comm=pmi ./job     # If mpich2-1.1.1p1 is used
#${MPI_OPENMPI_MPIEXEC} ./job               # If openmpi-1.2.5 is used
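The commented alternatives can also be collapsed into a single line that picks whichever launcher variable the environment defines; this is a convenience sketch, not part of the original example, falling back to a plain mpiexec from $PATH:

```shell
# Prefer Open MPI, then MPICH2, then MPICH; fall back to mpiexec from $PATH.
LAUNCHER="${MPI_OPENMPI_MPIEXEC:-${MPI_MPICH2_MPIEXEC:-${MPI_MPICH_MPIEXEC:-mpiexec}}}"
echo "$LAUNCHER"
# In the job script this would be followed by: $LAUNCHER ./job
```

One caveat: the mpich2 launcher in the example above also needs the --comm=pmi flag, so this shortcut only applies cleanly when the chosen launchers accept the same arguments.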

Submitting jobs: hybrid jobs

http://wiki.ipb.ac.rs/index.php/Hybrid_job_example

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=4:ppn=8
#PBS -l walltime=10:00:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
export OMP_NUM_THREADS=8
cd $PBS_O_WORKDIR
chmod +x job
${MPI_OPENMPI_MPIEXEC} -np 4 -npernode 1 ./job

Submitting jobs: application-specific jobs (NAMD)

http://wiki.ipb.ac.rs/index.php/Application_specific_job_example_(NAMD)

#!/bin/bash
#PBS -q hpsee
#PBS -l nodes=4:ppn=8
#PBS -l walltime=10:00:00
#PBS -e ${PBS_JOBID}.err
#PBS -o ${PBS_JOBID}.out
cd $PBS_O_WORKDIR
chmod +x script.sh
./script.sh

Batch system job control and monitoring

To check the status of a job, use:

$ qstat <JOB_ID>

Alternatively, check the status of all your jobs:

$ qstat -u <user_name>

To get detailed information about your job:

$ qstat -f <JOB_ID>

If, for some reason, you want to cancel a job, execute:

$ qdel <JOB_ID>
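A script that submits work and then post-processes the results often needs to block until the job leaves the queue. A minimal polling sketch: wait_for_job is a hypothetical helper, and the status command is injectable so the loop can be exercised without a live PBS server (qstat exits nonzero once the job is no longer known):

```shell
# Poll until the status command no longer recognises the job.
# $1 = job ID, $2 = status command (defaults to qstat).
wait_for_job() {
  local jobid="$1" statcmd="${2:-qstat}"
  while "$statcmd" "$jobid" >/dev/null 2>&1; do
    sleep 5
  done
}

# Real usage after submitting: wait_for_job "$JOB_ID"
# Demonstration with a stand-in status command that reports the job as gone:
job_gone() { return 1; }
wait_for_job "12345" job_gone && echo "job finished"
```

Longer polling intervals are kinder to the PBS server; a few tens of seconds is typical for real jobs.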

Batch system job control and monitoring

qstat - list information about queues and jobs
qstat -q - list all queues on the system
qstat -Q - list queue limits for all queues
qstat -a - list all jobs on the system
qstat -u userID - list all jobs owned by user userID
qstat -s - list all jobs with status comments
qstat -r - list all running jobs

Batch system job control and monitoring

qstat -f jobID - list all information known about the specified job
qstat -n - in addition to the basic information, list the nodes allocated to a job
qstat -Qf <queue> - list all information about the specified queue
qstat -B - list summary information about the PBS server
qdel jobID - delete the batch job with the given jobID
qalter - alter a batch job
qsub - submit a job

Questions?