Research Computing Environment at the University of Alberta Diego Novillo Research Computing Support Group University of Alberta April 1999.


1 Research Computing Environment at the University of Alberta
Diego Novillo
Research Computing Support Group, University of Alberta
April 1999

2 Computing Environment (29 April 1999)
SGI Origin 2000, 42 CPUs, 10 GB RAM
Mix of interactive and batch jobs
2 CPUs for interactive activity
40 CPUs used by batch jobs
Batch jobs managed by LSF (Platform)

3 How is the system being used?

4 Monthly System Utilization (CPU days)
[Chart: monthly system utilization in CPU days, plotted against the theoretical max]
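A rough sketch of where a "theoretical max" line could come from: CPUs multiplied by days in the month. The slides do not say whether the chart counts only the 40 batch CPUs or all 42, so both figures are computed here. The 30-day month is also an assumption.

```shell
# Theoretical ceiling on monthly usage, in CPU days.
# Assumptions (not stated on the slides): a 30-day month, and that the
# ceiling is simply CPUs x days with every CPU busy around the clock.
days=30
batch_cpus=40     # CPUs used by batch jobs
total_cpus=42     # batch CPUs plus the 2 interactive CPUs

batch_max=$((batch_cpus * days))
total_max=$((total_cpus * days))

echo "Batch CPUs only: $batch_max CPU days"
echo "All CPUs:        $total_max CPU days"
```

Under these assumptions, monthly utilization approaching 1200 CPU days would mean the batch side of the machine is essentially full.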

5 Average wait time in queue (hours)
[Chart: average wait time in queue, in hours. Annotations: "Started using load thresholds", "Need to balance parallel jobs"]

6 System usage by job type

7 Some thoughts on usage
Scalar use is predominant (so far)
We are starting to push the system
Jobs are waiting too long in the queue
Need to modify queue policies
  – Lower runtime limits
  – Checkpoint/restart
  – Limit on number of jobs submitted

8 Using LSF

9 Job queues
Parallel queue: par
  – High priority
  – Slot-based: up to 32 processors
  – Jobs are never suspended
Sequential queue: nic
  – Low priority
  – Threshold-based: up to 95% system utilization
  – Jobs can be preempted by parallel jobs

10 Job queues II
Two special queues
  – npseq
      For sequential jobs that should not be preempted
      Very low priority
      Only 4 slots available
  – special
      For jobs that need to run longer than the system limit
      Only 1 slot available
      Must be approved by committee

11 Fairshare system
Jobs are scheduled according to priorities
Priorities are dynamic and based on
  – Number of shares
  – Past usage (currently 2 weeks of history)
  – Type of job (parallel jobs get higher priority)
Resource availability is also important
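To make the idea of a dynamic priority concrete, here is a toy model in Bourne shell. It is not the formula LSF uses. The function name, the formula, and the numbers are all hypothetical, chosen only to show that with equal shares, the group with more recent usage ends up with the lower priority.

```shell
# Toy fairshare priority: more shares raise it, recent usage lowers it.
# This is an illustrative formula only, not the one LSF uses.
# Arguments: shares, CPU hours used in the history window (both hypothetical).
toy_priority() {
    awk -v shares="$1" -v used="$2" 'BEGIN { printf "%.3f\n", shares / (1 + used) }'
}

idle_group=$(toy_priority 10 0)    # 10 shares, no recent usage
busy_group=$(toy_priority 10 4)    # 10 shares, 4 CPU hours of recent usage

echo "idle group: $idle_group"
echo "busy group: $busy_group"
```

As usage ages out of the history window (two weeks, per the slide), the busy group's priority would climb back toward the idle group's.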

12 Getting started
Complete LSF documentation online: http://www.ualberta.ca/CNS/RESEARCH/LSF/
Man pages also available
Add one line to your login files:
  source /usr/local/lsf/cshrc.lsf    (C shell)
or
  . /usr/local/lsf/profile.lsf       (Bourne shell)

13 Submitting jobs
% bsub [options] pgm args
  -q name   Which queue to use
  -n num    How many processors
  -o file   Output file
Queue defaults to ‘nic’. If no output file is given, results are mailed to you.
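Two sample submissions using only the options listed above. The program and file names (./myprog, input.dat, results.out) are hypothetical. The commands are printed rather than executed, so the sketch runs on a machine without LSF.

```shell
# Print example bsub command lines without running them.
# ./myprog, input.dat and results.out are hypothetical names.
seq_cmd="bsub -o results.out ./myprog input.dat"
par_cmd="bsub -q par -n 8 -o results.out ./myprog input.dat"

echo "$seq_cmd"    # no -q, so the job goes to the default queue, nic
echo "$par_cmd"    # 8 processors in the parallel queue
```

On the actual system, each printed line would be typed at the prompt as is.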

14 Watching jobs
% bjobs [options]
  -l      All the details
  -p      Only pending jobs (and why)
  -a      All jobs (even finished ones)
  -uall   All the jobs in the system
  jobid   Just the job with this id

15 Manipulating jobs
% bkill jobid     Kills the job (can also send a signal)
% bstop jobid     Suspends the job (even if not running)
% bresume jobid   Resumes the job
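The commands above all take a job ID. A minimal sketch of capturing it, assuming the usual bsub confirmation line with the ID in angle brackets. The sample line and the ID 1234 are made up, and the follow-up commands are printed rather than executed.

```shell
# Sample confirmation line in the form bsub prints; job ID 1234 is made up.
submit_msg='Job <1234> is submitted to queue <par>.'

# Pull the number out of the first pair of angle brackets.
jobid=$(printf '%s\n' "$submit_msg" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p')

# Commands one might run next, printed rather than executed.
echo "bjobs -l $jobid"
echo "bstop $jobid"
echo "bresume $jobid"
echo "bkill $jobid"
```

In a real script, submit_msg would hold the captured output of bsub instead of a fixed string.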

16 Getting usage statistics
We keep monthly stats on our web page: http://www.ualberta.ca/CNS/RESEARCH/
For current information:
% bacct [opts]
  Total usage for your jobs. Can specify dates and jobs
% priorities (or bhpart -r)
  Lists all the priorities for different groups

17 Monitoring load on the system
% bqueues   Shows queues and how loaded they are
% lsload    Quick glance at the load on the system
Also GUI tools (xlsbatch, xlsmon). Please use sparingly, as they add to interactive load on the system.

18 Contact Information
Visit our home page: http://www.ualberta.ca/CNS/RESEARCH/
Questions and comments: Research.Support@Ualberta.CA

