

1 High-Performance Grid Computing and Research Networking
How to Use the Cluster?
Presented by Javier Delgado
Slides prepared by David Villegas
Instructor: S. Masoud Sadjadi (sadjadi At cs Dot fiu Dot edu)

2 Acknowledgements The content of many of the slides in these lecture notes has been adapted from online resources prepared previously by the people listed below. Many thanks!  Henri Casanova, Principles of High Performance Computing

3 Is MPI enough? MPI launches jobs directly over rsh/ssh, so there is no control over who runs what! With multiple users on the cluster, we want privileges, authentication, fair-share scheduling...

4 Introducing Batch Schedulers A job scheduler provides more features to control job execution:
 Interfaces to define workflows and/or job dependencies
 Automatic submission of executions
 Interfaces to monitor the executions
 Priorities and/or queues to control the execution order of unrelated jobs

5 Batch Schedulers Most production clusters are managed via a batch scheduler:
 You ask the batch scheduler to give you X nodes for Y hours to run program Z
 At some point, the program will be started
 Later on you can look at the program output
This is really different from what you're used to, and honestly it is sort of painful
 No interactive execution
Necessary because:
 Since most applications are in this for high performance, they'd better be alone on their compute nodes
 There are not enough compute nodes for everybody at all times
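As a concrete sketch of that request, here is what "X nodes for Y hours to run program Z" might look like as an SGE job script. The mpich parallel environment and program_z are placeholder names, not necessarily what exists on this cluster:

```shell
#!/bin/sh
#$ -pe mpich 8        # "X nodes": request 8 slots from the mpich parallel environment
#$ -l h_rt=02:00:00   # "Y hours": hard wall-clock limit of 2 hours
#$ -cwd               # run from the submission directory
./program_z           # "program Z": started whenever the scheduler finds room
```

You hand this file to the scheduler with qsub and come back later for the output files; there is no interactive session.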

6 Scheduling criteria
 Job priority
 Compute resource availability
 License key, if the job uses licensed software
 Execution time allocated to the user
 Number of simultaneous jobs allowed per user
 Estimated execution time
 Elapsed execution time
 Availability of peripheral devices
 Occurrence of prescribed events
 …

7 The case of GCB Rocks allows us to install different job schedulers: SGE, PBS, LSF, Condor… Currently we have SGE installed. Sun Grid Engine is an open-source DRM (Distributed Resource Manager) sponsored by Sun Microsystems and CollabNet. It can be downloaded from the project's website.

8 Our Cluster You have (or soon will get) an account on the cluster. Question: once I am logged in, what do I do? Clusters are always organized as:
 A front-end node
To compile code (and do minimal testing)
To submit jobs
 Compute nodes
To run the code
You don't ssh to these directly
In our case they are dual-processor Pentiums

9 How to use SGE as a user? You need to learn how to do three basic things:
 Check the status of the platform
 Submit a job
 Check on job status
All can be done from the command line:
 Read the man pages
 Google "SGE commands"
Checking on platform and job status:
 qhost Information about nodes
 qstat -f Information about queues
 qstat -F [resource] Detailed information about resources
 qstat Lists pending/running/done jobs
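The status commands above can be strung together into a small check script. This is only a sketch: it assumes the SGE client tools (qhost, qstat) are on your PATH, and any command that is missing is skipped with a note instead of aborting, so the script can be read and run anywhere:

```shell
#!/bin/sh
# Basic SGE status checks. Each command is run through a guard so a
# machine without the SGE tools installed still gets a readable report.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped (no $1 on PATH): $*"
    fi
}

run qhost                # one line per execution host: arch, load, memory
run qstat -f             # full listing: every queue instance and its jobs
run qstat -F             # like -f, plus per-queue resource availability
run qstat                # just your own pending/running jobs
echo "status check done"
```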

10 How to use SGE as a user? (contd.)
Submitting and controlling jobs:
 qsub We can pass the path to a binary or a script
qsub -b yes Submits a binary
qsub -q queue_list Specifies which queue the job will be sent to
qsub -pe parallel-env n Submits a parallel job with n slots
 qdel Attempts to terminate a job (or a range of jobs)
But for those of you who don't like the command line…
 qmon Make sure you are forwarding X11 and that you have an X server on your client machine!

11 How to use SGE as a user? (contd.)
But sending a single command is not very interesting…
 Submitting scripts
Scripts can submit many jobs
We can pass options to SGE and consult environment variables. Example:
 #$ -cwd Use the current directory as the working directory
 #$ -j y Join errors and output in the same file
 #$ -N get_date Give a name to the job
 #$ -o output.$JOB_ID Use a given file for output
 $JOB_ID: The job number assigned by the scheduler to your job
 $JOBDIR: The directory your job is currently running in
 $USER: The username of the person running the job
 $JOB_NAME: The job name specified by the -N option
 $QUEUE: The queue the job is currently running in
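Putting the directives and variables from this slide together, a complete get_date-style job script might look like the following sketch. Under SGE the #$ lines become qsub options and the variables are filled in by the scheduler; run outside SGE they are ordinary comments and empty variables:

```shell
#!/bin/sh
#$ -cwd              # use the current directory as the working directory
#$ -j y              # merge the error stream into the output stream
#$ -N get_date       # job name, visible in qstat and in $JOB_NAME
#$ -o output.$JOB_ID # per-job output file

echo "Job $JOB_NAME ($JOB_ID) run by $USER in queue $QUEUE"
date
```

Submitted with `qsub get_date.sh`, the echo and date output would land in output.<job number>.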

12 How to use SGE as a user? (contd.)
 Submitting parallel jobs
We can define parallel environments to execute this kind of job. Parallel environments define startup procedures, the maximum number of slots, the users allowed to submit parallel jobs… Examples: mpich, lam…
SGE allows "tight integration" with MPICH by intercepting the calls MPICH makes to run your job on other machines and replacing them with SGE calls, so that it can better monitor and manage your parallel jobs. (Source)
It is also possible to integrate other MPI flavors with SGE
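A parallel job script following the classic SGE/MPICH tight-integration pattern might look like this sketch. The PE name mpich and the binary hello_mpi are assumptions; the exact mpirun flags depend on your MPI flavor and how the parallel environment is configured:

```shell
#!/bin/sh
#$ -cwd
#$ -N mpi_hello
#$ -pe mpich 4    # request 4 slots from the "mpich" parallel environment

# With tight integration, the PE's startup procedure exports $NSLOTS and
# writes a machine file under $TMPDIR, and the remote starts MPICH makes
# are intercepted so SGE can monitor and manage the slave processes.
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello_mpi
```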

13 How to use SGE as an admin? Scheduler configuration. These values are found in /opt/gridengine/default/common/sched_configuration and can only be altered using qconf or qmon:
 algorithm default
 schedule_interval 0:0:15
 maxujobs 0
 queue_sort_method load
 job_load_adjustments np_load_avg=0.50
 load_adjustment_decay_time 0:7:30
 load_formula np_load_avg
 schedd_job_info true
 flush_submit_sec 0
 flush_finish_sec 0
 params none
 reprioritize_interval 0:0:0
 halftime 168
 usage_weight_list cpu=1,mem=0,io=0
 compensation_factor 5
 …

14 How to use SGE as an admin? (contd.)
Queue configuration
 Queues are created with qmon or qconf:
qconf -shgrpl Show all host groups
qconf -ahgrp group Add a new host group
qconf -shgrp group Show details for one group
qconf -sq queue Show a queue's configuration
qconf -Aq file Create a queue from a file
 We'll output a queue configuration to a file and modify it.
 Exercise: create a short/test job queue. What are the best policies for this kind of queue?

15 Queue parameters
 qname
 hostlist
 seq_no
 load_thresholds
 suspend_thresholds
 nsuspend
 suspend_interval
 priority
 min_cpu_interval
 processors
 qtype
 ckpt_list
 pe_list
 rerun
 slots
 tmpdir
 shell
 …
For the rest, type man queue_conf
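As a sketch of an answer to the exercise on the previous slide, a short/test queue could combine a few of these parameters as follows. The values are illustrative only; a real file produced by qconf -sq has no inline comments, and you would load the edited file with qconf -Aq:

```
qname        short.q
hostlist     @allhosts
seq_no       0                  # tried before queues with higher seq_no
qtype        BATCH INTERACTIVE
slots        2                  # at most 2 concurrent jobs per host
priority     0
s_rt         0:14:30            # soft runtime limit: the job gets a warning signal
h_rt         0:15:00            # hard runtime limit: the job is killed
```

Short runtime limits and a low seq_no are what make a test queue useful: quick jobs get dispatched promptly, and runaway jobs cannot hold slots for long.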