
Introduction to Flux (Statistics)




1 Introduction to Flux (Statistics)
Dr. Charles Antonelli, Mark Champe, LSAIT ARS, August 2016

2 Roadmap
Introduction to Flux
The command line
Flux mechanics
cja 2016 8/16

3 Introduction to Flux cja 2014 10/14

4 Flux
Flux is a university-wide shared computational discovery / high-performance computing service.
Provided by Advanced Research Computing at U-M
Procurement, licensing, and billing by U-M ITS
Interdisciplinary since 2010
ARC: Advanced Research Computing (née ORCI)

5 The Flux cluster
Login nodes
Compute nodes
Data transfer node
Storage

6 A Standard Flux node
12-24 Intel cores
48-128 GB RAM (4 GB/core)
Local disk
Network
As of 6/6/16: about 16,000 shared Flux cores (including 12,000 Standard Flux), 6,000 unshared, 22,000 total.
630 standard-memory Flux nodes with 12, 16, 20, or 24 cores and 48, 64, 96, or 128 GB RAM.
14 higher-memory Flux nodes with 32, 40, or 56 cores and 1, 1, or 1.5 TB RAM.
5 GPU Flux nodes with 8 NVIDIA K20X GPU cards each; each GPU card has 2,688 CUDA cores.
6 GPU Flux nodes with 4 NVIDIA K40X GPU cards each; each GPU card has 2,880 CUDA cores.

7 Flux storage
80 GB of storage per user in /home/uniqname
Not backed up
NFS filesystem mounted on /home on all nodes
Small, slow, long-term
For development & testing

8 Flux scratch
1.5 PB of high-speed temporary storage
Not backed up
/scratch/alloc_name/uniqname
Lustre filesystem mounted on /scratch on all nodes
Big, fast, short-term
Files stored in /scratch will be deleted when they have not been accessed in 90 days
Moving data to/from /scratch:
< ~100 GB: scp, sftp, WinSCP
> ~100 GB: Globus Online
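The ~100 GB rule of thumb above can be sketched in a few lines of shell. The suggest_tool helper and sample.dat file are hypothetical, invented for this illustration:

```shell
#!/bin/sh
# Hypothetical helper illustrating the ~100 GB rule of thumb:
# small transfers -> scp/sftp/WinSCP, large transfers -> Globus Online.
suggest_tool() {
  size_bytes=$(wc -c < "$1")
  limit=$((100 * 1024 * 1024 * 1024))   # ~100 GB in bytes
  if [ "$size_bytes" -lt "$limit" ]; then
    echo "scp/sftp/WinSCP"
  else
    echo "Globus Online"
  fi
}

printf 'small test data' > sample.dat   # a 15-byte file, far below the threshold
suggest_tool sample.dat
```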

9 Other Flux services
Higher-Memory Flux
  14 nodes: 32/40/56-core, 1-1.5 TB RAM
GPU Flux
  5 nodes: Standard Flux, plus 8 NVIDIA K20X GPUs, 2,688 GPU cores each
  6 nodes: Standard Flux, plus 4 NVIDIA K40X GPUs, 2,880 GPU cores each
Flux on Demand
  Pay only for CPU wallclock consumed, at a higher cost rate
  You do pay for cores and memory requested
Flux Operating Environment
  Purchase your own Flux hardware, via research grant
Updated 6 Jun 2016: For Flux on Demand the researcher is charged for processor-equivalents requested * wallclock consumed, where processor-equivalents = max(core-processor-equivalents, memory-processor-equivalents).
Examples:
1. If a researcher requests walltime=4:00:00 but the job ends after only 1 hour, they pay for only 1 hour rather than the 4 hours requested. This is because one or more other jobs can be scheduled on the node(s) at that point.
2. If a researcher requests nodes=1:ppn=4 but utilizes only a single core (the other three cores are idle the entire time), they are still charged for 4 cores, since they reserved the 4 cores so that no other researcher could use them. Note that 1 core == 1 processor-equivalent.
3. If a researcher requests mem=16000mb but the job actually utilizes only 4000mb, they are still charged for 4 processor-equivalents rather than 1, since they reserved the 16,000 MB so that no other researcher could use it.
Currently, for Standard Flux, 4096 mb == 1 processor-equivalent, even on nodes which have more than 4 GB per core. This means that a job which requests all 128 GB on a Haswell node is charged for 32 processor-equivalents, even though the job will have somewhere between 1 and 24 cores available to it (depending on the number of cores the researcher requested).
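The charging examples above can be checked with a little shell arithmetic. The pe function is a hypothetical sketch; the 4096 MB per processor-equivalent figure is taken from the slide:

```shell
#!/bin/sh
# Sketch of the Flux on Demand charging rule:
# processor-equivalents = max(cores requested, ceil(mem_mb / 4096)).
pe() {
  cores=$1
  mem_mb=$2
  mem_pe=$(( (mem_mb + 4095) / 4096 ))   # ceiling division
  if [ "$cores" -gt "$mem_pe" ]; then echo "$cores"; else echo "$mem_pe"; fi
}

pe 4 4000     # example 2: four reserved-but-idle cores still cost 4
pe 1 16000    # example 3: 16,000 MB reserved costs 4, not 1
```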

10 The command line

11 Command Line Reference
William E. Shotts, Jr., "The Linux Command Line: A Complete Introduction," No Starch Press, January 2012.
Download the Creative Commons licensed version at http://downloads.sourceforge.net/project/linuxcommand/TLCL/13.07/TLCL-13.07.pdf

12 The command line
A basic way of interacting with a Linux system
Execute commands
Create files and directories
Edit file content
Access the web
Copy files to and from other hosts
Run HPC jobs
… do things you can't do from the conventional point-and-click Graphical User Interface (GUI)

13 Why command line?
Linux was designed for the command line
You can create new Linux commands using the command line, without programming
Many systems provide only the command line, or poorly support a GUI interface
  Such as many HPC systems
Many things can be accomplished only through the command line
  Such as systems administration & troubleshooting
You want to be cool
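The "new commands without programming" point can be sketched like this. The biggest command name and the $HOME/bin location are illustrative choices, not from the slides:

```shell
#!/bin/sh
# Save a pipeline as an executable file on your PATH and it becomes
# a new command -- no compiler involved.
mkdir -p "$HOME/bin"
cat > "$HOME/bin/biggest" <<'EOF'
#!/bin/sh
# List the 5 largest files in the current directory.
ls -S | head -5
EOF
chmod +x "$HOME/bin/biggest"
PATH="$HOME/bin:$PATH"

biggest   # now runs like any other command
```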

14 The command shell
The command shell is an application that reads command lines from the keyboard and passes them to the Linux operating system to be executed.
On your desktop, laptop, or tablet, you may have to find and execute a terminal emulator application to bring up a local shell in a window.
When you log in to a remote Linux system using a tool like ssh, you will automatically be connected to a remote shell.
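The read-and-execute cycle described above can be caricatured in a few lines of shell. A toy illustration only; real shells add line editing, history, and job control:

```shell
#!/bin/sh
# Toy "shell": read one command line at a time and execute it.
printf 'echo hello\necho done\n' | while IFS= read -r line; do
  echo "+ $line"   # trace the command, as sh -x would
  eval "$line"
done
```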

15 Connecting via ssh
Terminal emulators:
Linux and Mac OS X
  Start Terminal
  Use the ssh command
Windows
  U-M PuTTY/WinSCP (U-M Blue Disc)
  PuTTY
  SSH Secure Shell

16 Lab Task: Start a local shell on Mac OS X
If there is a Terminal icon in the Dock, double-click it.
Otherwise:
Bring up a Finder window: click on the desktop, type Command-N.
Start the Terminal application: in the Finder window, click on Applications on the left, scroll down on the right until you find the Utilities folder, double-click the Utilities folder, scroll down until you find the Terminal application, and double-click it.
This creates a Terminal window with a shell running inside it.
From a Terminal window, Command-N will start a new Terminal.

17 Lab Task: Start a local shell on Windows
Get the PuTTY/WinSCP installer for your flavor of Windows from U-M Blue Disc:
Execute the installer (UM_PuTTY_WinSCP.exe), accepting all defaults.
Double-click the "UM Internet Access Kit" icon on the Desktop, then double-click the PuTTY application within.
In the PuTTY window that appears:
Enter "flux-login.arc-ts.umich.edu" in the Host Name box.
Under Connection | SSH | X11, ensure Enable X11 Forwarding is checked.
Click "Open" at the bottom.
In the terminal window that appears, log in with your uniqname, Kerberos password, and Duo.
This creates a Terminal window on Flux with a shell running inside it.
From a Terminal window, "New Session…" under the terminal icon at upper left will start a new Terminal.

18 Lab

19 The shell prompt
The "~$" is the shell prompt
This means the shell is waiting for you to type something
Format can vary, but usually ends with "$", "%", or "#"
If $ or %, you have a normal shell
  This shell has your privileges
If #, you have a so-called "root shell"
  This shell has administrator privileges
  You can do a great deal of irreversible damage
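The $-versus-# distinction can be checked directly. A small sketch, relying on the convention that id -u reports 0 only for root:

```shell
#!/bin/sh
# Which prompt character would this shell conventionally show?
uid=$(id -u)
if [ "$uid" -eq 0 ]; then
  prompt_char='#'   # root shell: administrator privileges
else
  prompt_char='$'   # normal shell: your own privileges
fi
echo "uid=$uid, prompt typically ends with: $prompt_char"
```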

20 Typing at the shell prompt
Basic input line editing commands:
Backspace erases the previous character
Left and right arrows move the insertion point on the line
Control-U erases the line to the insertion point, so you can start over
Enter executes the line you typed
Control-C interrupts whatever command you started and returns you to the shell prompt (usually)
Up and down arrows access your command history
Type "exit" (without the quotes) to exit the shell

21 Lab Task: Enter some basic commands
~$ date
~$ id
~$ df -kh
~$ who
~$ top   # type Control-C to exit
Terse, aren't they?

22 Flux mechanics

23 Using Flux
Three basic requirements:
A Flux login account
A Flux allocation (e.g., hpc201_flux, hpc201_fluxg)
A Duo app on your smartphone or tablet
Logging in to Flux:
  ssh -X flux-login.arc-ts.umich.edu
  Requires campus wired or MWireless; otherwise use the VPN, or ssh to login.itd.umich.edu first
Duo 2FA replaced MTokens on July 20, 2016

24 Cluster batch workflow
You create a batch script and submit it to PBS
PBS schedules your job, and it enters the flux queue
When its turn arrives, your job will execute the batch script
Your script has access to all Flux applications and data
When your script completes, anything it sent to standard output and error is saved in files stored in your submission directory
You can ask that email be sent to you when your job starts, ends, or aborts
You can check on the status of your job at any time, or delete it if it's not doing what you want
A short time after your job completes, it disappears from PBS

25 Cluster Parallel Programming Models
Multi-threaded
  The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
  Used when the data can fit into a single process
  "Fine-grained parallelism" or "shared-memory parallelism"
  Implemented using OpenMP (Open Multi-Processing) compilers and libraries
  Examples: MATLAB, R parallel library (multicore), …
Message-passing
  The application consists of several processes running on different nodes and communicating with each other over the network
  Used when the data are too large to fit on a single node
  "Coarse parallelism" or "SPMD"
  Implemented using MPI (Message Passing Interface) libraries
  Examples: MATLAB parallel pools, …
Both
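The coarse-parallelism idea above -- independent processes each working on their own chunk, then synchronizing -- can be sketched in miniature with plain shell background jobs (no MPI needed for the illustration; the chunk naming is invented):

```shell
#!/bin/sh
# Coarse-grained parallelism in miniature: four independent worker
# processes, each handling its own chunk, then a barrier (wait).
for i in 1 2 3 4; do
  ( echo "worker $i: processed chunk $i" > "chunk_$i.out" ) &
done
wait   # block until every worker has finished
cat chunk_1.out chunk_2.out chunk_3.out chunk_4.out
```

Real MPI jobs follow the same shape: independent ranks compute on their own data, and collective operations play the role of the wait.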

26 Tightly-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=47gb,walltime=00:05:00
#PBS -M your_email_address
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r script

27 Loosely-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M your_email_address
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01

28 Copying data
From Linux or Mac OS X, use scp, sftp, or Cyberduck
Non-interactive (scp):
  scp localfile …
  scp -r localdir …
  Use "." as the destination path to copy to your Flux home directory
  … or a path under /scratch to copy to your Flux scratch directory
Interactive (sftp or Cyberduck)
From Windows, use WinSCP (U-M Blue Disc)

29 Globus Online
Features:
  High-speed data transfer, much faster than scp or WinSCP
  Reliable & persistent
  Minimal, polished client software: Mac OS X, Linux, Windows
Globus Endpoints:
  GridFTP gateways through which data flow
  XSEDE, OSG, national labs, …
  U-M Flux: umich#flux
  Add your own server endpoint: contact …
  Add your own client endpoint!
Share folders via Globus+

30 Basic batch commands
Once you have a script, submit it: qsub scriptfile
  $ qsub singlenode.pbs
  ….nyx.engin.umich.edu
You can check on the job status: qstat jobid, or qstat -u user
  $ qstat -u cja
  nyx.engin.umich.edu:
                                                           Req'd   Req'd  Elap
  Job ID       Username  Queue  Jobname  SessID  NDS  TSK  Memory  Time   S  Time
  ….nyx.engi   cja       flux   hpc101i  …       …    …    …       0:05   Q  --
To delete your job: qdel jobid
  $ qdel …

31 Flux software
Licensed and open software:
  Abaqus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, MATLAB, R, RSEM, Stata SE, …
  See …
C, C++, Fortran compilers:
  GNU, Intel, PGI toolchains
You can choose software using the module command

32 Modules
The module command allows you to specify what versions of software you want to use:
  module list -- Show loaded modules
  module load name -- Load module name for use
  module avail -- Show all available modules
  module avail name -- Show versions of module name
  module unload name -- Unload module name
  module -- List all options
Enter these commands at any time during your session.
You can make default modules available at every login:
  Load the modules you wish to be loaded by default, then
  module save

33 Flux environment
The Flux login nodes have the standard GNU/Linux toolkit: make, perl, python, java, emacs, vi, nano, …
Watch out for source code or data files written on non-Linux systems
Use these tools to analyze and convert source files to Linux format:
  file
  dos2unix
NOTE: do nano intro
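A quick sketch of the file / dos2unix workflow above, using tr as a portable stand-in in case dos2unix is not installed; the filenames are invented:

```shell
#!/bin/sh
# Create a file with DOS (CRLF) line endings, detect it, convert it.
printf 'line one\r\nline two\r\n' > dos.txt
file dos.txt                      # file(1) reports CRLF line terminators
tr -d '\r' < dos.txt > unix.txt   # strip the carriage returns, as dos2unix does
file unix.txt                     # now plain ASCII text
```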

34 Lab Task: Use the R multicore package
module load R
Copy sample code to your login directory:
  cd
  cp /scratch/data/workshops/stats/stats-sample-code.tar.gz .
  tar -zxvf stats-sample-code.tar.gz
  cd ./stats-sample-code
Examine lab3.pbs and lab3.R
Edit lab3.pbs with your favorite Linux editor:
  Change the #PBS -M address to your own
NOTE: module load should now be superfluous, but it doesn't hurt

35 Lab Task: Use the R multicore package
Submit your job to Flux:
  qsub lab3.pbs
Watch the progress of your job:
  qstat -u uniqname
  where uniqname is your own uniqname
When complete, look at the job's output:
  less lab3.out

36 Interactive jobs
You can submit jobs interactively:
  qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux
This queues a job as usual
Your terminal session will be blocked until the job runs
When your job runs, you'll get an interactive shell on one of your nodes
Invoked commands will have access to all of your nodes
When you exit the shell, your job is deleted
Interactive jobs allow you to:
  Develop and test on cluster node(s) without repeated queuing delays
  Execute GUI tools on a cluster node
  Utilize a parallel debugger interactively

37 Interactive jobs
Start an interactive PBS session:
  qsub -I -V -l procs=2 -l walltime=30:00 -A stats_flux -l qos=flux -q flux
Run R in the interactive shell:
  module load R
  cd $PBS_O_WORKDIR
  R

38 Troubleshooting
module load flux-utils
System-level:
  freenodes  # aggregate node/core busy/free
  pbsnodes [-l]  # nodes, states, properties; with -l, list only nodes marked down
Allocation-level:
  mdiag -a alloc  # cores & users for allocation alloc
  showq [-r][-i][-b][-w acct=alloc]  # running/idle/blocked jobs for alloc; with -r|-i|-b, show more info for that job state
  freealloc [--jobs] alloc  # free resources in allocation alloc
User-level:
  mdiag -u uniq  # allocations for user uniq
  showq [-r][-i][-b][-w user=uniq]  # running/idle/blocked jobs for uniq
Job-level:
  qstat -f jobno  # full info for job jobno
  qstat -n jobno  # show nodes/cores where jobno is running
  checkjob [-v] jobno  # show why jobno is not running

39 Any Questions?
Charles J. Antonelli, LSAIT ARS

