IBM Systems & Technology Group LoadLeveler 3.3 Dr. Roland Kunz, IT Specialist l.

LoadLeveler V3.3 (GA 4/05)
- LoadLeveler deliverables
- Highlights
- Scheduler Improvements
- Preemption using Backfill Scheduler
- Advance Reservation
- Job Launch Performance Improvements
- Additional Policy Control and Usability Enhancements
- Grid interaction

Linux Deliverables in 2004

Release  Hardware                                  OS            GA
LL 3.2   Intel 32 bit (xSeries nodes and blades)   RHEL 3.0      5/2004
LL 3.2   pSeries (pSeries nodes and JS20s)         SLES 8        5/2004
LL       Intel 32 bit                              SLES 9        12/2004
LL       pSeries                                   SLES 9        12/2004
LL       Opterons                                  SLES 9, RHEL  /2004

LL V3.3 on Linux - GA 8/05

Hardware                                 OS
xSeries - Intel (32 bit and EM64T)       RHEL 3.0, RHEL 4.0, SLES 9
xSeries - AMD                            RHEL 3.0, RHEL 4.0, SLES 9
pSeries                                  SLES 9, RHEL 4.0

Backfill scheduler

Advance Reservation
- Overview of Advance Reservation
- Administrator Externals
- User Externals
- Misc.

Overview
- Advance Reservation == Reservation (terms used interchangeably)
- Satisfy Grid Computing Customer Requirements to Reserve Resources for Jobs
- Can Reserve Computing Resources (Nodes) for
  - Workload
  - Maintenance
  - …
- For BACKFILL Scheduler Only

Reservation Request
- Reservation: A set of nodes reserved for a period of time
- Unique Reservation ID: c94n04.2.r
  - Add creation time to make it truly unique
- Start Time: 03/23/ :00
- Duration: 120 Minutes
- A List of Reserved Nodes: c94n04 c94n03

IBM Systems & Technology Group Reservation Owner and Group  Owner: Peter  usually the user who made the reservation  can be changed by LoadLeveler administrators  can run jobs in the reservation  can modify or cancel the reservation  can allow other users to use the reservation  Group Owner: Research_Group  LoadLeveler group, not AIX/Linux group  For quota checking only

IBM Systems & Technology Group Reservation Owner and Group  Other Users: Dave Peter Rich  Other Groups of Users: test_group development_group  They are  optional  can be added/changed by the owner and LoadLeveler administrators  can only run jobs in the reservation  cannot modify or cancel the reservation  cannot allow other users to use the reservation

Reservations
[Figure: reservations c94n04.3.r, c94n03.5.r and c94n04.5.r drawn as blocks of nodes against time]
- Reservations Do Not Overlap
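
The non-overlap rule can be pictured as a check on two reservations: they conflict only if they share a node and their time windows intersect. The sketch below is an illustration, not LoadLeveler's implementation; the `Reservation` class, the `conflicts` function, and all dates and node assignments are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    res_id: str
    start: datetime
    duration_min: int
    nodes: frozenset

    @property
    def end(self) -> datetime:
        return self.start + timedelta(minutes=self.duration_min)


def conflicts(a: Reservation, b: Reservation) -> bool:
    """True if a and b share at least one node during intersecting time windows."""
    shares_node = bool(a.nodes & b.nodes)
    overlaps_in_time = a.start < b.end and b.start < a.end
    return shares_node and overlaps_in_time


# Hypothetical reservations; only the IDs are taken from the slide.
r1 = Reservation("c94n04.3.r", datetime(2030, 1, 1, 14, 0), 120, frozenset({"c94n04", "c94n03"}))
r2 = Reservation("c94n03.5.r", datetime(2030, 1, 1, 15, 0), 60, frozenset({"c94n03"}))
r3 = Reservation("c94n04.5.r", datetime(2030, 1, 1, 16, 0), 60, frozenset({"c94n04"}))

print(conflicts(r1, r2))  # True: both use c94n03 between 15:00 and 16:00
print(conflicts(r1, r3))  # False: r3 starts exactly when r1 ends
```

The half-open interval comparison treats back-to-back reservations on the same node as compatible, which is an assumption of this sketch.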

Reserved Nodes
- For exclusive use by default
- Option to share resources - SHARED option
  - after all jobs (which can run) start to run
  - to share extra reserved resources
- Option to end reservation earlier - REMOVE_ON_IDLE option
  - after all jobs (which can run) finish running
  - to avoid letting reserved nodes stay idle

State of the Reservation Request
- Reservation State and Life Cycle:
  - WAITING (W)
  - SETUP (S)
  - ACTIVE (A)
  - ACTIVE_SHARED (AS)
  - COMPLETE (C)
  - CANCEL (CA)
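
When processing reservation states in a script, it can help to map the abbreviations back to the full state names. A minimal sketch follows; the `ReservationState` enum and `parse_state` helper are hypothetical names, and only the state names and abbreviations come from the slide.

```python
from enum import Enum


class ReservationState(Enum):
    """Reservation states keyed by their abbreviations."""
    WAITING = "W"
    SETUP = "S"
    ACTIVE = "A"
    ACTIVE_SHARED = "AS"
    COMPLETE = "C"
    CANCEL = "CA"


def parse_state(abbreviation: str) -> ReservationState:
    """Look up a state by abbreviation; raises ValueError for unknown input."""
    return ReservationState(abbreviation.strip().upper())


print(parse_state("AS").name)  # ACTIVE_SHARED
print(parse_state("ca").name)  # CANCEL
```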

Setup State of the Reservation
- To get reserved nodes ready:
  - Preempt running jobs, if any, on the reserved nodes
    - Use DEFAULT_PREEMPT_METHOD
  - Check status of reserved nodes
  - Send a notification to the owner and LoadLeveler administrators if
    - A node is not usable
    - A non-preemptable job is running

Active State of the Reservation
- Schedule Jobs to Run
- No System Preemption in Reservation
- Manual Preemption by llpreempt is allowed
- Jobs should not require resources other than those on reserved machines

Administrator Externals

IBM Systems & Technology Group Admin Control – User/Group  Must Setup Quotas:  max_reservations for User and Group stanzas  Example 1 : Every user can make a reservation:  default: type = user  max_reservations = 1  Example 2: A group of users can make 10 reservations total  res_group: type = group  max_reservations = 10  include_users = carol dave alex rich

IBM Systems & Technology Group Admin Control – User/Group (Cont.)  Example 3a: One group of users can make 4 reservations  res_group: type = group  max_reservations = 4  include_users = carol dave alex rich  Example 3b: One of the users can only make 1 reservation  carol: type = user  max_reservations = 1 Note: Each user in group res_group can make up to 4 reservations except carol who can make at most 1

IBM Systems & Technology Group Admin Control – User/Group (Cont.)  Use max_reservation_duration in User and Group stanzas to limit reservation duration  default: unlimited  Each user in the group can have any duration  res_group: type = group  max_reservations = 4  include_users = Carol Dave Tom Rich  Except Carol who can have duration up to 2hrs  carol: type = user  max_reservation_duration = 120

Admin Control - Node
- Use reservation_permitted in machine stanzas to exclude nodes from being reserved
  - default: true (all nodes can be reserved)
- Example: Do not allow c94n05 to be reserved
    c94n05: type = machine
            reservation_permitted = false
- Existing reservations will not be affected

Config Control - RESERVATION_MIN_ADVANCE_TIME
- Default: Reservation Can Start As Soon As Possible
    RESERVATION_MIN_ADVANCE_TIME = 0
  - No advance time is required
- To Require a Reservation Be Made One Day In Advance:
    RESERVATION_MIN_ADVANCE_TIME = 1440

Config Control - RESERVATION_SETUP_TIME
- Default: Setup process will be initiated 60 seconds ahead of the reservation start time
    RESERVATION_SETUP_TIME = 60
- Not to Set Aside Time for Reservation Setup:
    RESERVATION_SETUP_TIME = 0
  - Reservation goes into SETUP state at start time and ACTIVE state right after the setup is complete
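
The two timing keywords use different units in the examples: RESERVATION_MIN_ADVANCE_TIME is in minutes (1440 for one day) and RESERVATION_SETUP_TIME is in seconds. The following sketch shows the arithmetic they imply. The function names and dates are hypothetical, and treating a request made exactly at the limit as acceptable is an assumption.

```python
from datetime import datetime, timedelta


def meets_min_advance(now: datetime, start: datetime, min_advance_minutes: int) -> bool:
    """True if `start` lies at least `min_advance_minutes` after `now`."""
    return start - now >= timedelta(minutes=min_advance_minutes)


def setup_begins_at(start: datetime, setup_time_seconds: int) -> datetime:
    """Time at which the setup phase would begin for a reservation starting at `start`."""
    return start - timedelta(seconds=setup_time_seconds)


now = datetime(2030, 1, 1, 9, 0)    # hypothetical time of the request
start = datetime(2030, 1, 2, 8, 0)  # requested start, 23 hours later

print(meets_min_advance(now, start, 1440))  # False: less than one day ahead
print(meets_min_advance(now, start, 0))     # True: no advance time required
print(setup_begins_at(start, 60))           # 2030-01-02 07:59:00
```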

Config Control - RESERVATION_CAN_BE_EXCEEDED
- Default: Jobs Allowed to Run Beyond Reservation End Time Subject to Resource Availability
  - After reservation ends, jobs can be preempted
- To Not Allow Jobs to Run Beyond Reservation End Time
    RESERVATION_CAN_BE_EXCEEDED = FALSE
  - May waste resources if jobs cannot finish before reservation ends
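
The effect of RESERVATION_CAN_BE_EXCEEDED on whether a job may start inside a reservation can be sketched as a single predicate. This is an illustrative reading of the slide, with a hypothetical function name and hypothetical times; the real scheduler also considers resource availability.

```python
from datetime import datetime


def may_start_in_reservation(
    expected_end: datetime,
    reservation_end: datetime,
    can_be_exceeded: bool = True,
) -> bool:
    """True if a job expected to finish at `expected_end` may start in the reservation."""
    return can_be_exceeded or expected_end <= reservation_end


reservation_end = datetime(2030, 1, 1, 16, 0)  # hypothetical
long_job_end = datetime(2030, 1, 1, 17, 30)    # hypothetical

print(may_start_in_reservation(long_job_end, reservation_end))                         # True
print(may_start_in_reservation(long_job_end, reservation_end, can_be_exceeded=False))  # False
```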

Config Control - RESERVATION_PRIORITY
- Default: Reservations Can Not Be Created Before Running Jobs' Expected End Time
    RESERVATION_PRIORITY = NONE
- To Allow LoadLeveler Administrators to Suspend Running Jobs
    RESERVATION_PRIORITY = HIGH
  - May need to preempt many jobs
  - Should use this option infrequently

Config Control - MAX_RESERVATIONS
- Default: 10 Reservations Maximum Can Coexist
    MAX_RESERVATIONS = 10
- Can increase in a future version, as necessary

IBM Systems & Technology Group Accounting  To Collect Reservation History:  Set in LoadL_config to record reservation usage data  Set ACCT = A_RES in LoadL_config to record reservation usage data  The keyword in LoadL_config can be used to define the name of a file containing the local history of reservations  The RESERVATION_HISTORY keyword in LoadL_config can be used to define the name of a file containing the local history of reservations command can be used to merge reservation history files, similar to job history  llacctmrg -R command can be used to merge reservation history files, similar to job history

User Externals

Common Questions
- How Many Reservations One Can Make and in Which Groups
- How Long the Reservation Duration Can Be
- Can Jobs Expected to Run Beyond Reservation End Time be Allowed to Run
- How Far In Advance a Reservation Needs to be Made
- Floating Consumable Resources Should Not Be Used by Jobs in a Reservation

User Commands
- Use llmkres to Make a New Reservation
- Use llchres to Modify an Existing Reservation
- Use llrmres to Remove an Existing Reservation
- Use llbind to Bind Jobs to an Existing Reservation
- Use LL_RES_ID=<reservation ID> llsubmit to Submit Jobs to an Existing Reservation
- Use llqres to Query Reservations
- Use llq -l to Check Whether a Job Step is Bound to a Reservation

IBM Systems & Technology Group llmkres Examples  Reserve 2 nodes at 2pm today for 60 minutes  llmkres -t 14:00 -d 60 -n 2  Reserve a Specific Node c94n04  llmkres -t 03/23/ :00 -d 120 -h c94n04  Reserve All Available Nodes  llmkres -t 09:00 -d 60 -h all  Reserve Nodes to Run Job Step c94n  llmkres -t 14:00 -d 60 -j c94n04.3.0

llmkres Examples (Cont.)
- Reserve Nodes to Run Job Submitted Through Job Command File weather.cmd
    llmkres -t 14:00 -d 60 -f weather.cmd -g vip
- Select REMOVE_ON_IDLE Option
    llmkres -t 14:00 -d 60 -f nic.cmd -i yes
- Select SHARED Option
    llmkres -t 23:00 -d 360 -f unattended.cmd -s yes
- Allow User Tom and All Users in LoadLeveler Group fvt to be Able to Run Jobs in the Reservation to be Created:
    llmkres -t 14:00 -d 60 -n 2 -U Tom -G fvt
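
For scripted use, the llmkres invocations above can be assembled programmatically. The helper below is a hypothetical sketch that only uses flags appearing in the examples (-t, -d, -n, -h, -j, -f, -g, -i, -s, -U, -G). It builds the argument list without running anything, and its rule that exactly one node selector must be given is an assumption drawn from the examples, not from the command reference.

```python
from typing import List, Optional


def llmkres_args(
    start: str,
    duration_min: int,
    *,
    num_nodes: Optional[int] = None,
    host: Optional[str] = None,
    job_step: Optional[str] = None,
    job_file: Optional[str] = None,
    group: Optional[str] = None,
    remove_on_idle: bool = False,
    shared: bool = False,
    user: Optional[str] = None,
    user_group: Optional[str] = None,
) -> List[str]:
    """Build an llmkres argument list from the options shown in the examples."""
    selectors = {"-n": num_nodes, "-h": host, "-j": job_step, "-f": job_file}
    chosen = [(flag, value) for flag, value in selectors.items() if value is not None]
    if len(chosen) != 1:
        # Assumption: every example uses exactly one way of selecting nodes.
        raise ValueError("give exactly one of num_nodes, host, job_step, job_file")

    args = ["llmkres", "-t", start, "-d", str(duration_min)]
    flag, value = chosen[0]
    args += [flag, str(value)]
    if group is not None:
        args += ["-g", group]
    if remove_on_idle:
        args += ["-i", "yes"]
    if shared:
        args += ["-s", "yes"]
    if user is not None:
        args += ["-U", user]
    if user_group is not None:
        args += ["-G", user_group]
    return args


print(" ".join(llmkres_args("14:00", 60, num_nodes=2, user="Tom", user_group="fvt")))
# llmkres -t 14:00 -d 60 -n 2 -U Tom -G fvt
```

The returned list can be passed to `subprocess.run` on a machine where LoadLeveler is installed.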

llchres Examples
- To Move the Start Time of Reservation c94n04.2.r Earlier by 60 Minutes
    llchres -t -60 -R c94n04.2.r
- To Add a Reserved Node:
    llchres -n +1 -R c94n04.2.r
- To Remove a Specific Reserved Node:
    llchres -h -c94n04 -R c94n04.2.r

llrmres Examples
- To Remove Reservation c94n04.2.r
    llrmres -R c94n04.2.r
- To Remove All Reservations:
    llrmres -R all
- Note: with -R all, regular users remove only their own reservations; LoadLeveler administrators remove all reservations

llbind and llsubmit Examples
- To Submit a Job to Reservation c94n04.2.r
    LL_RES_ID=c94n04.2.r llsubmit weather.cmd
- To Bind an Idle Job Step to Reservation c94n04.2.r
    llbind -R c94n04.2.r c94n
- To Unbind Job Steps from Reservation c94n03.1.r
    llbind -r c94n03.1.r c94n04.1
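
The LL_RES_ID form of submission sets an environment variable for a single llsubmit call. From a script, the same effect can be sketched by passing a modified environment to the child process. The helper names here are hypothetical, and `submit_to_reservation` is defined but not called because it needs a LoadLeveler installation.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def reservation_env(res_id: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with LL_RES_ID set to the given reservation ID."""
    env = dict(os.environ if base is None else base)
    env["LL_RES_ID"] = res_id
    return env


def submit_to_reservation(res_id: str, job_command_file: str) -> None:
    """Run llsubmit with LL_RES_ID set; requires LoadLeveler on the PATH."""
    subprocess.run(["llsubmit", job_command_file], env=reservation_env(res_id), check=True)


print(reservation_env("c94n04.2.r", {"PATH": "/usr/bin"}))
# {'PATH': '/usr/bin', 'LL_RES_ID': 'c94n04.2.r'}
```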

llqres and llq Examples
- To Find Out the Reservation ID of Job Step c94n
    llq -l c94n |grep ID
- To List All Reservations
    llqres
- To List All Attributes of Reservation c94n04.2.r
    llqres -l -R c94n04.2.r

API
- Reservation APIs
  - Sample: /usr/lpp/LoadL/full/samples/llres/res.c
- RESERVATIONS query in Data Access APIs
  - /usr/lpp/LoadL/full/samples/lldata_access/qres.c
- See LoadLeveler U&A Guide for More Information

Debugging Aids
- Add D_RESERVATION to NEGOTIATOR_DEBUG and SCHEDD_DEBUG in the LoadL_config file to get reservation-related messages saved to the daemon logs
- Strings to grep for to get reservation-related messages in the CM and Schedd logs:
  - "RES:" (in most messages)
  - "RES_SYNC" (used only when Schedd or CM first start up)
  - "reservation" (catches extra messages)
  - "AR:" (scheduling related, shows up even if no reservations exist)
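
The four grep strings can be applied in one pass over a log. The sketch below filters lines that contain any of the markers; the function name and the sample log lines are invented for illustration and do not reproduce real LoadLeveler log output.

```python
from typing import Iterable, List

# Markers taken from the list above.
RESERVATION_MARKERS = ("RES:", "RES_SYNC", "reservation", "AR:")


def reservation_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain at least one reservation-related marker."""
    return [line for line in lines if any(marker in line for marker in RESERVATION_MARKERS)]


# Invented sample lines, not real daemon output.
sample_log = [
    "01/01 09:00:00 RES: processing request",
    "01/01 09:00:01 unrelated scheduler message",
    "01/01 09:00:02 AR: evaluating step",
    "01/01 09:00:03 RES_SYNC done",
]

for line in reservation_lines(sample_log):
    print(line)
```

To scan a real log, pass an open file object instead of the sample list.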

Backfill Preemption

Preemption with Backfill Scheduler
- Similar concept as in Gang Scheduler
- Options to terminate job in addition to suspend
- Support for AIX and Linux
  - Suspend not supported on Linux
- Configuration keyword PREEMPTION_SUPPORT
  - Applicable to Gang Scheduler only; default for Gang Scheduler is FULL
  - When using the Backfill Scheduler, either do not set it or set it to NONE
- Configuration keyword DEFAULT_PREEMPT_METHOD
  - Default is suspend
  - Must be set for Linux

Configuration Keywords
- Configuration Keyword - START_CLASS
    START_CLASS[incoming_class] = (start_class_expression) [ && (start_class_expression)...]
- Configuration Keyword - PREEMPT_CLASS
    PREEMPT_CLASS[incoming_class] = ALL[:preempt_method] { outgoing_class1 [outgoing_class2...] }
    PREEMPT_CLASS[incoming_class] = ENOUGH[:preempt_method] { outgoing_class1 [outgoing_class2...] }

Preemption Methods
- Configuration Keyword (for Backfill Scheduler Only)
    DEFAULT_PREEMPT_METHOD = rm | sh | su | vc | uh
- Preemption Methods
  - Remove (rm)
  - System Hold (sh)
  - Suspend (su)
  - Vacate (vc)
  - User Hold (uh)
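
To make the PREEMPT_CLASS syntax and the method abbreviations concrete, here is a small parser for the single-rule form shown in the keyword syntax. It is a sketch for checking configuration lines in a script, not the parser LoadLeveler uses; the class names `urgent`, `batch` and `scavenger` are hypothetical, and lines with more than one rule are outside its scope.

```python
import re
from typing import List, NamedTuple, Optional

# Method abbreviations: remove, system hold, suspend, vacate, user hold.
PREEMPT_METHODS = {"rm", "sh", "su", "vc", "uh"}

_RULE = re.compile(
    r"^\s*PREEMPT_CLASS\[(?P<incoming>\w+)\]\s*=\s*"
    r"(?P<scope>ALL|ENOUGH)(?::(?P<method>\w+))?\s*"
    r"\{\s*(?P<outgoing>[^}]*)\}\s*$"
)


class PreemptRule(NamedTuple):
    incoming: str
    scope: str
    method: Optional[str]
    outgoing: List[str]


def parse_preempt_class(line: str) -> PreemptRule:
    """Parse one PREEMPT_CLASS line of the single-rule form."""
    match = _RULE.match(line)
    if match is None:
        raise ValueError(f"not a single-rule PREEMPT_CLASS line: {line!r}")
    method = match.group("method")
    if method is not None and method not in PREEMPT_METHODS:
        raise ValueError(f"unknown preemption method: {method}")
    return PreemptRule(
        match.group("incoming"),
        match.group("scope"),
        method,
        match.group("outgoing").split(),
    )


rule = parse_preempt_class("PREEMPT_CLASS[urgent] = ALL:su { batch scavenger }")
print(rule.incoming, rule.scope, rule.method, rule.outgoing)
# urgent ALL su ['batch', 'scavenger']
```

When the method is omitted, `method` is `None`, which corresponds to falling back on DEFAULT_PREEMPT_METHOD.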

Preemption Commands and API
- llpreempt command
    llpreempt -? | -H | -v | [-q] [-r | -m method] { [-u userlist] [-h hostlist] | [joblist] }
  - -r option only for jobs preempted by suspend
  - llhold -r to resume jobs preempted by system hold and user hold
  - Jobs preempted by remove must be resubmitted
  - Jobs preempted by vacate will restart when resources are available
- ll_preempt_jobs API
    int ll_preempt_jobs (int version, void *errObj, LL_preempt_param **param);
  - Replaces ll_preempt API

llmodify and llq
- llmodify command
    llmodify [-?] | [-H] | [-v] | [-q] | {-x | -c | -m | -W | -C | -a | -s | -p {preempt|nopreempt} }
- llq command
  - llq -l from Central Manager
      Preemptable: yes
  - llq -s
      This job step is scheduled to run but is waiting for the following job steps to be preempted:

Misc. Enhancements

IBM Systems & Technology Group Accounting  Correlating AIX and LoadLeveler Accounting Records –Find all AIX accounting records for all processes in LoadLeveler job –Add unique identifier in both AIX and LoadLeveler accounting –LoadLeveler invokes setsubproj() to set accounting key in AIX –Supported in AIX 5.2I and AIX 5.3 –AIX library dynamically loaded: /usr/lib/libaacct.a –Use llsummary –l to find accounting key in LoadLeveler history file  Job Step Id: c188f2n07.ppd.pok.ibm.com.1.0  Step Name: 0  Queue Date: Wed Feb 9 15:12:15 EST 2005  Job Accounting Key:  …

Modifying System Priority
- Modifying job priorities
  - llmodify [-?] | [-H] | [-v] | [-q] | {-x | -c | -m | -W | -C | -a | -s | -p {preempt|nopreempt} }
  - LL_MODIFY_SYSPRIO enum with the ll_modify API
  - New priority is fixed
  - llq -l from Central Manager
      System Priority:
      q_sysprio: 97
      Previous q_sysprio: -1560

More Control on Dispatching
- New configuration keyword
    SYSPRIO_THRESHOLD_TO_IGNORE_STEP = integer
  - Jobs with priority below threshold remain idle
  - llq -s
      The job step can not be run because its q_sysprio value is below 25, the threshold specified by the SYSPRIO_THRESHOLD_TO_IGNORE_STEP keyword.
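
The threshold rule amounts to a filter on q_sysprio: steps below the threshold stay idle, all others remain candidates for dispatch. A minimal sketch, assuming a step whose priority equals the threshold is still considered; the step names and the `dispatch_candidates` function are hypothetical, while the priority values 97 and -1560 and the threshold 25 are borrowed from the example output.

```python
from typing import Dict, List


def dispatch_candidates(q_sysprio_by_step: Dict[str, int], threshold: int) -> List[str]:
    """Steps whose q_sysprio is not below the threshold, in input order."""
    return [step for step, prio in q_sysprio_by_step.items() if prio >= threshold]


# Hypothetical step names with illustrative priorities.
steps = {"step_a": 97, "step_b": -1560, "step_c": 25}

print(dispatch_candidates(steps, threshold=25))  # ['step_a', 'step_c']
```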

More Control on Scheduler
- Controlling the Central Manager scheduling cycle
  - By default, the scheduler runs at the interval specified in negotiator_interval
  - When negotiator_interval is set to 0 in the config file, the scheduler will stop dispatching jobs
  - Use the new llrunscheduler command or ll_run_scheduler API to initiate a scheduling cycle to dispatch jobs when the dispatching function is disabled

Integration in a Grid environment
- Running LoadLeveler jobs on the IBM Grid Toolbox (Globus)
  - LoadLeveler GAR to deploy (llgrid.gar)
  - GAR contains loadleveler.pm, globus-gram-loadleveler-provider, rips-loadleveler-provider.xml, mjs-ll-server-deploy.wsdd, server-deploy.wsdd, deploy/loadleveler-preDeploy.sh
- The IBM Grid Toolbox must be installed, and gars/mmjfs.gar and gars/gram-rips.gar must be deployed, before deploying the LoadLeveler GAR.
- Log in as ibmgrid, which is the owner ID of the Grid Toolbox.
    export GLOBUS_LOCATION=/opt/IBMGrid
- Copy llgrid.gar into the $GLOBUS_LOCATION/gars directory.
    cp /usr/lpp/LoadL/full/lib/llgrid.gar $GLOBUS_LOCATION/gars/llgrid.gar
- As ibmgrid, run the following commands to deploy llgrid.gar:
    cd $GLOBUS_LOCATION
    . igt-setenv.sh
    igt-deploy-gar gars/llgrid.gar
- After deploying, two new files exist:
    $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm
    $GLOBUS_LOCATION/etc/globus-gram-loadleveler-provider
- LoadlevelerManagedJobFactoryService service name added in $GLOBUS_LOCATION/local-server-config.wsdd
- MasterLoadlevelerManagedJobFactoryService service added in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/server-config.wsdd
- LoadLevelerInformation service data provider enabled in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml
- Remove handler="jobDataHandler" in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml if this XML file contains handler="jobDataHandler".

Questions