CSIRO ESG WORKSHOP, September 20/21 2011
Ben Evans, Joseph Antony, Muhammad Atif, Margaret Kahn
Accessing the dcc
  Logging on to the dcc:
    ssh -Y -l aaa777 dcc.nci.org.au
  From Windows, use PuTTY; use VNC for X forwarding.
  Web page: http://nf.nci.org.au
  User Guide: http://nf.nci.org.au/facilities/userguide
  FAQ: http://nf.nci.org.au/facilities/faq
  Software web page: http://nf.nci.org.au/facilities/software
Applying for accounts
  National Merit Allocation Scheme (MAS)
  Partner allocations
  Startup allocation
  Flagship projects
  Distribution of compute allocation across VU and XE:
    MAS                     29%  (Flagship projects 10%)
    ANU                     24%
    CSIRO                   24%
    Director's share         5%
    INTERSECT                4%
    Monash E-Research      0.5%
    Geoscience Australia   0.4%
    iVec                   0.4%
    QCIF                   0.4%
Project code
  To access the ESG data you need to be in the group ua6. This group has no compute time.
  For computation you will need to be connected to a compute grant, e.g. r87 (r87 is a CSIRO grant).
  If you are new to the NCI NF, first fill out the registration form at http://nf.nci.org.au and then fill out the form to connect to an existing project. The connection has to be approved by the Lead Chief Investigator of the project.
  You will be notified by email when your account is set up, and the password is sent by SMS if you have provided a mobile number.
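A quick way to confirm group membership after the account is set up is sketched below. It assumes that project membership shows up as an ordinary Unix group, which is how the slide describes ua6; the helper name in_group is made up for this illustration.

```shell
#!/bin/sh
# in_group GROUP: succeed if the current user belongs to the Unix group GROUP.
# (in_group is a hypothetical helper, not an NF-provided command.)
in_group() {
    id -Gn | tr ' ' '\n' | grep -qx -- "$1"
}

# ua6 (ESG data) and r87 (a CSIRO compute grant) are the groups named on the slide.
for g in ua6 r87; do
    if in_group "$g"; then
        echo "$g: member"
    else
        echo "$g: not a member"
    fi
done
```

If ua6 is reported as "not a member", the ESG data directories will not be readable until the project connection has been approved.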
Project accounting
  Time is allocated per quarter. No transfer between quarters.
  Choose the project for accounting when you log in.
  Change the default project in the .rashrc file.
  PROJECT environment variable.
  quotasu -P project -h displays the usage in the current quarter and some recent history.
  dcc usage is being tracked but is not yet included in overall usage.
  dcc will be charged at 0.7 SU (service units), i.e. 70% of the VU rate.
Unix Environment
  The working environment under UNIX is controlled by a shell (command-line interpreter). The shell interprets the commands the user types in and carries them out.
  The default shell is tcsh (bash is also popular).
  The shell can be changed by modifying .rashrc.
  Shell commands can be grouped together into scripts.
  The shell provides environment variables that can be accessed by all processes started from the original shell, e.g. the login environment:
    .cshrc and .login (csh/tcsh)
    .bashrc and .profile (sh/bash)
  tcsh syntax: setenv VARIABLE value
  bash syntax: export VARIABLE=value
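The difference between an exported environment variable and a plain shell variable can be seen in a few lines. This is a minimal bash sketch; c23 is used only as a placeholder project code, and the variable name note is arbitrary.

```shell
#!/bin/bash
# A plain shell variable stays in the current shell only.
note="not exported"

# An exported variable is passed on to every process started from this shell.
# The tcsh equivalent would be: setenv PROJECT c23
export PROJECT=c23

# Start a child shell and print what it can see.
bash -c 'echo "PROJECT=${PROJECT:-unset} note=${note:-unset}"'
# prints: PROJECT=c23 note=unset
```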
Environment Modules
  Modules provide a convenient way to customise your shell environment, especially on the fly. The module command syntax is the same no matter which command shell you are using.
  Various modules are loaded into your environment at login to provide a workable environment.
    module list      # To see the modules loaded
    module avail     # To see the list of software for which environments have been set up via modules
    module load      # To load the environment settings required by a software package
    module unload    # To remove extras added to the environment for a previously loaded software package
  module unload is extremely useful in situations where different package settings clash.
  Note: To automate environment customisation at login, module load commands can be added to the .login (tcsh) or .profile (bash) files. However, BEWARE: different applications can have incompatible environment requirements, so loading multiple application modules in your dot file is likely to lead to problems. We recommend that modules are loaded in scripts as needed at runtime and that modules loaded in the dot files are kept to a minimum.
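One way to follow the recommendation of loading modules inside scripts rather than in dot files is sketched here. The wrapper load_modules is a hypothetical helper written for this illustration; only module load itself comes from the slide. The wrapper also lets the same script run on a machine without the module command.

```shell
#!/bin/sh
# load_modules NAME...: load each named module at run time.
# If the module command is not available, report that and carry on.
load_modules() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not available; skipped: $*"
        return 0
    fi
    for m in "$@"; do
        module load "$m" || { echo "could not load $m" >&2; return 1; }
    done
}

# Load only what this particular script needs.
load_modules intel-fc intel-cc || echo "module load failed" >&2
```

Keeping the load next to the commands that need it means two scripts with clashing requirements never share an environment.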
Editors
  Several editors are available: vi, emacs, nano, nedit.
  If you are not familiar with any of these you will find that nano has a simple interface. Just type nano.
Exercise: Getting started
  Log on to the dcc, for example with course account aaa777. The project code is c23.
    ssh -Y dcc.nci.org.au -l aaa777
  Remember to read the Message of the Day (MOTD) as you log in.
  Commands to try:
    hostname       # to see the node you are logged into
    quotasu        # to see the current state of the project
    printenv       # to look at your environment settings
    module list    # to check which modules are loaded on login
    module avail   # to see which software packages are installed and accessible in this way
  Load the Intel compilers:
    module load intel-fc intel-cc
  Look at the environment settings again to see how they have changed.
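For the last step of the exercise, comparing two long printenv listings by eye is tedious. The sketch below saves the environment before and after a change and prints only the new or altered lines. So that it runs anywhere, a plain export of a made-up variable (DEMO_COMPILER_HOME) stands in for the module load; on the dcc you would put the module load command in its place.

```shell
#!/bin/sh
# Snapshot the environment, change it, and show what differs.
before=$(mktemp)
after=$(mktemp)
printenv | sort > "$before"

# On the dcc this is where you would run: module load intel-fc intel-cc
# DEMO_COMPILER_HOME and its value are hypothetical stand-ins.
export DEMO_COMPILER_HOME=/hypothetical/path

printenv | sort > "$after"

# comm -13 prints the lines that appear only in the second file.
changes=$(comm -13 "$before" "$after")
echo "$changes"
rm -f "$before" "$after"
```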
Filesystems
  The Filesystems section of the User Guide has this table in greater detail:

  Filesystem  Size Limit                 Backup    Location                                 Time Limit
  /home       1000MB default             Yes       Global                                   No
  /short      80GB default per project   No        Global                                   120 days
  /jobfs      20GB per cpu default       No        Local to node                            Duration of job
  MDSS        20GB                       2 copies  External, access using special commands  No

  Note that these limits can be changed on request if necessary.
  dcc also has /projects for persistent data.
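To see how close a directory is to one of the limits in the table, a small du-based check such as the following can help. It is only a sketch: check_usage is a made-up helper, the facility may provide its own quota tools that should be preferred, and the demonstration runs on a small temporary directory so that it finishes quickly.

```shell
#!/bin/sh
# check_usage DIR LIMIT_MB: report disk usage of DIR and
# return non-zero if it exceeds LIMIT_MB megabytes.
check_usage() {
    used_kb=$(du -sk "$1" 2>/dev/null | awk '{print $1}')
    [ -n "$used_kb" ] || { echo "cannot read $1" >&2; return 2; }
    limit_kb=$(( $2 * 1024 ))
    echo "$1: $(( used_kb / 1024 )) MB used, limit $2 MB"
    [ "$used_kb" -le "$limit_kb" ]
}

# Demonstration on a small scratch directory.
demo=$(mktemp -d)
echo "hello" > "$demo/file.txt"
check_usage "$demo" 1000 || echo "over the limit"
```

On the dcc, `check_usage "$HOME" 1000` would compare a home directory against the 1000MB default from the table.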
Batch Queueing System
  Most work is done as batch jobs (interactive process limits are small).
  The queueing system:
    distributes work evenly over the system
    ensures that jobs cannot impact each other (e.g. exhaust memory or other resources)
    provides equitable access to the system
  NF uses a modified version of OpenPBS.
  Scheduling policy is discussed in the User Guide.
Batch Queue Structure
  normal
    Default queue designed for production use
    Charging rate of 1 SU per processor-hour (walltime)
    Largest allowed resources
    If your grant is exhausted you still get access at a lower priority
  express
    High priority for testing, debugging etc.
    Charging rate of 3 SUs per processor-hour (walltime)
    Smaller limits to discourage "production use" by projects with too much grant left
  copyq
    Used for file manipulation, e.g. copying files to MDSS
    Only queue to run on the file server node for /short
  Job charging is based on wall clock time used, number of cpus requested, queue choice and machine choice.
  One hour of time on the DCC is worth 70% of one hour on the VU.
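The charging rules on this slide can be turned into a rough estimate. The sketch below assumes the factors simply multiply (cpus x walltime hours x queue rate x machine factor), which the slide implies but does not state outright. The function name su_cost is invented here, and the copyq and the XE are left out because no rate is given for them.

```shell
#!/bin/sh
# su_cost NCPUS HOURS QUEUE MACHINE
# Estimate service units, ASSUMING the factors multiply.
su_cost() {
    case "$3" in
        normal)  rate=1 ;;      # 1 SU per processor-hour
        express) rate=3 ;;      # 3 SUs per processor-hour
        *) echo "unknown queue: $3" >&2; return 1 ;;
    esac
    case "$4" in
        vu)  factor=1.0 ;;      # reference machine
        dcc) factor=0.7 ;;      # one dcc hour is worth 70% of a VU hour
        *) echo "unknown machine: $4" >&2; return 1 ;;
    esac
    awk -v n="$1" -v h="$2" -v r="$rate" -v f="$factor" \
        'BEGIN { printf "%.2f\n", n * h * r * f }'
}

su_cost 8 2 normal vu      # prints 16.00
su_cost 8 2 express dcc    # prints 33.60
```

The second call shows why express is meant for short tests: the same job costs roughly twice as much there, even with the dcc discount.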
Using the Queueing System
  Read the "PBS Batch Use" and "Queues and Scheduling" sections of the User Guide. See: nf_limits
  Request resources for your job (using qsub). See man pbs_resources:
    walltime
    (v)memory
    disk (jobfs)
    number of cpus
    software
  PBS will then:
    schedule the job when the resources become available
    prevent other jobs from infringing on the allocated resources
    if necessary, delay starting the job until a software licence is available
    display progress of the jobs (nqstat)
    terminate the job when it exceeds its requested resources
    return stdout and stderr in batch output files
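A small batch script that requests the resources listed above might look like the sketch below. The directive spellings (-P, -q, -S, and the walltime, vmem, jobfs and ncpus resource names) are assumptions based on common PBS usage and on the resource list on this slide; check man pbs_resources and the User Guide for the exact forms accepted on the NF. The project code c23 and the file name batchscript are illustrative. The snippet writes the script to a file and submits it only if qsub is present.

```shell
#!/bin/sh
# Write a minimal PBS batch script to a file.
script="${TMPDIR:-/tmp}/batchscript"
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -P c23
#PBS -q express
#PBS -S /bin/bash
#PBS -l walltime=00:10:00
#PBS -l vmem=50MB
#PBS -l jobfs=100MB
#PBS -l ncpus=1

# PBS_O_WORKDIR is the directory the job was submitted from.
cd "${PBS_O_WORKDIR:-.}"
echo "Running on $(hostname) in $(pwd)"
EOF

# Submit only where a queueing system is actually installed.
if command -v qsub >/dev/null 2>&1; then
    qsub "$script"
else
    echo "qsub not found; script written to $script"
fi
```

Requesting only ten minutes, 50MB and one cpu keeps a test job like this cheap and quick to schedule.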
Scheduling and Job Suspension
  Jobs won't be started until sufficient resources are free.
  Resources allocated to a job are unavailable to other jobs.
  Jobs can be suspended to run parallel jobs, but the fraction of time suspended is limited (it depends on how many jobs you have running, number of cpus, etc.).
  Only ask for the resources your job really needs! This:
    avoids your job being delayed in the queue or suspended unnecessarily
    avoids other users' jobs being delayed unnecessarily by wasted resources
  Experiment in express and look at the bottom of the PBS stdout file to see what resources were used by jobs.
Stdout and stderr files
  The PBS queueing system returns the standard output and standard error arising from the script in .o and .e files respectively.
  The .o file contains the output arising from the script (if not redirected in the script) and additional information from PBS.
    cat batchscript.o70450
    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
    =================================================================
    Resource usage:
    CPU time: 00:00:04
    JobId: 70450.vu-pbs
    Elapsed time: 00:00:06
    Project: c23
    Requested time: 00:10:00
    Service Units: 0.01
    Max physical memory: 7MB
    Max virtual memory: 8MB
    Requested memory: 50MB
    Max jobfs disk use: 0.0GB
    Requested jobfs: 0.1GB
    ================================================================
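The resource summary at the bottom of the .o file is the easiest place to see whether a job asked for more than it used. The sketch below pulls single values out of such a file. The helper pbs_field is hypothetical, and the sample file is rebuilt from the values on this slide with one field per line, which may not match the real column layout; the extraction works on either layout because it only looks for the label followed by a value.

```shell
#!/bin/sh
# pbs_field FILE LABEL: print the value that follows "LABEL:" in FILE.
pbs_field() {
    sed -n "s/.*$2: *\([^ ]*\).*/\1/p" "$1" | head -n 1
}

# Sample .o content, using the values shown on the slide.
ofile=$(mktemp)
cat > "$ofile" <<'EOF'
Resource usage:
CPU time: 00:00:04
Elapsed time: 00:00:06
Requested time: 00:10:00
Service Units: 0.01
Max virtual memory: 8MB
Requested memory: 50MB
EOF

echo "walltime: used $(pbs_field "$ofile" 'Elapsed time') of $(pbs_field "$ofile" 'Requested time')"
echo "memory:   used $(pbs_field "$ofile" 'Max virtual memory') of $(pbs_field "$ofile" 'Requested memory')"
echo "charged:  $(pbs_field "$ofile" 'Service Units') SU"
```

Here the job used 8MB of the 50MB it requested, so the next submission could safely ask for less.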
Stdout and stderr
  The .e file contains any error output arising from the script (if not redirected in the script) and additional information from PBS. For a successful job it should be empty.
  Common errors to look for in the .e file:
    Command not found.
    =>> PBS: job terminated: walltime 172818sec exceeded limit 172800sec
    =>> PBS: job terminated: per node vmem 2227620kb exceeded limit 2097152kb
    Segmentation fault.
  See:
    man qsub
    man nf_limits
    man qdel
    nf_limits
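Checking a .e file for the messages listed above is easy to script. The following is a minimal sketch; scan_stderr is a made-up helper, and the search patterns are just the error strings quoted on this slide, so other failure messages will not be recognised.

```shell
#!/bin/sh
# scan_stderr FILE: report whether a PBS .e file is empty or
# contains one of the common error messages.
scan_stderr() {
    if [ ! -s "$1" ]; then
        echo "empty: no errors reported"
        return 0
    fi
    grep -n -E 'Command not found|PBS: job terminated|Segmentation fault' "$1" \
        || echo "not empty, but none of the common messages found"
}

# Demonstration with one of the messages from the slide.
efile=$(mktemp)
echo '=>> PBS: job terminated: walltime 172818sec exceeded limit 172800sec' > "$efile"
scan_stderr "$efile"
```

A walltime message like this one means the job needs a larger walltime request or has to do less work per run.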
Using the Mass Data Store
  MDSS is used for long-term storage of large datasets.
  If you have numerous small files to archive, bundle them into a tar file FIRST.
  Every project has a directory on the MDSS at /massdata/$PROJECT. All members of the project group have read and write access to the top project directory.
  The mdss command can be used to "get" and "put" data between the interactive nodes of the vu or xe and the MDSS, as well as to list files and directories on the MDSS.
  netcp and netmv can be used from within batch jobs to:
    generate a batch script for copying/moving files to the MDSS
    submit the generated batch script to the special copyq, which runs the copy/move job on an interactive node
  netcp and netmv can also be used interactively to save you the work of creating tar files and generating mdss commands.
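Bundling many small files before archiving, as advised above, takes a single tar command. The sketch below builds a small example directory, bundles it, and lists the archive to confirm its contents. The directory and file names are invented for the illustration, and the final transfer is left as a comment because the exact mdss arguments should be taken from the User Guide.

```shell
#!/bin/sh
# Create a scratch area with a few small files to stand in for real output.
work=$(mktemp -d)
mkdir "$work/results"
for i in 1 2 3; do
    echo "sample output $i" > "$work/results/run$i.dat"
done

# Bundle the whole directory into one compressed archive.
archive="$work/results.tar.gz"
tar -czf "$archive" -C "$work" results

# List the archive to confirm what went in before deleting anything.
tar -tzf "$archive"

# On the NF the single archive would then be stored with the mdss
# command ("put"); see the User Guide for the exact arguments.
```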
Compilers
  Both GNU and Intel compilers are available. Note that the Intel C/C++ compiler is compatible with gcc/g++.
    module list
    module avail intel-fc
    module avail intel-cc
    module avail gcc
    which gcc
  Note that there is /usr/bin/gcc, but more recent versions of gcc are available if you load the relevant module.
  In general Python applications build best with gcc, so you would do:
    echo $CC
    module rm intel-fc intel-cc
    echo $CC
NCL Example
    cp -r /short/c23/NCL_EXAMPLE .
    cd NCL_EXAMPLE
    ls
    more evans_2.ncl
    more batchjob
    qsub batchjob
    nqstat
    more batchjob.o****
    more batchjob.e*****
    ghostscript evans.ps
ESG data Example
    cd
    cp -r /short/c23/ESG_EXAMPLES .
    cd ESG_EXAMPLES
    ls
    more fig1.ncl
    more fig2.ncl
    more batchjob
    qsub batchjob
    nqstat
    qps ******
    more batchjob.o****
    more batchjob.e*****
    ghostscript fig1.ps
    ghostscript fig2.ps
ESG data Example
    [mhk900@dcc ~]$ more /short/c23/ESG_EXAMPLES/fig1.ncl
    load "all_scripts.ncl"
    begin
    ; path to the monthly near-surface air temperature (tas) file
    tf="/projects/ESG/authoritative/IPCC/CMIP5/CSIRO-QCCCE/CSIRO-Mk3-6-0/piControl/mon/atmos/Amon/r1i1p1/v20110518/tas/tas_Amon_CSIRO-Mk3-6-0_piControl_r1i1p1_000101-050012.nc"
    tfh=addfile(tf,"r")
    ; convert from Kelvin to degrees Celsius
    t=tfh->tas-273.15
    lat=tfh->lat
    ; degrees-to-radians factor (pi/180)
    rad = 4.0*atan(1.0)/180.0
    ; cosine of latitude, with its dimension named and given coordinates
    clat = doubletofloat(cos(lat*rad))
    clat!0 = "lat"
    clat&lat = lat
    ……….
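The two numeric steps visible in the excerpt, the Kelvin offset and the degrees-to-radians factor, can be checked from the shell with awk before running the full NCL script. This is only a sanity check under stated assumptions: the function name check_ncl_constants and the test values (60 degrees, 288.15 K) are chosen for the illustration and do not come from the workshop files.

```shell
#!/bin/sh
# Reproduce the constants used in the NCL excerpt.
check_ncl_constants() {
    awk 'BEGIN {
        # Same expression as the NCL line rad = 4.0*atan(1.0)/180.0
        rad = 4.0 * atan2(1.0, 1.0) / 180.0
        printf "cos(60 deg) = %.4f\n", cos(60 * rad)
        # Same offset as the NCL line t = tas - 273.15
        printf "288.15 K = %.2f C\n", 288.15 - 273.15
    }'
}

check_ncl_constants
```

The cosine of 60 degrees should come out as 0.5000, which confirms that the factor converts degrees to radians as intended.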