Scheduling Policy John (TJ) Knoeller Condor Week 2017.

Slides:



Advertisements
Similar presentations
Jaime Frey Computer Sciences Department University of Wisconsin-Madison OGF 19 Condor Software Forum Routing.
Advertisements

Dan Bradley Computer Sciences Department University of Wisconsin-Madison Schedd On The Side.
Building Campus HTC Sharing Infrastructures Derek Weitzel University of Nebraska – Lincoln (Open Science Grid Hat)
1 Concepts of Condor and Condor-G Guy Warner. 2 Harvesting CPU time Teaching labs. + Researchers Often-idle processors!! Analyses constrained by CPU time!
More HTCondor 2014 OSG User School, Monday, Lecture 2 Greg Thain University of Wisconsin-Madison.
Condor and GridShell How to Execute 1 Million Jobs on the Teragrid Jeffrey P. Gardner - PSC Edward Walker - TACC Miron Livney - U. Wisconsin Todd Tannenbaum.
Condor Overview Bill Hoagland. Condor Workload management system for compute-intensive jobs Harnesses collection of dedicated or non-dedicated hardware.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Chapter 13: Sharing Printers on Windows Server 2008 R2 Networks BAI617.
19-Aug-15 About the Chat program. 2 Constraints You can't have two programs (or two copies of the same program) listen to the same port on the same machine.
Utilizing Condor and HTC to address archiving online courses at Clemson on a weekly basis Sam Hoover 1 Project Blackbird Computing,
Zach Miller Computer Sciences Department University of Wisconsin-Madison What’s New in Condor.
CONDOR DAGMan and Pegasus Selim Kalayci Florida International University 07/28/2009 Note: Slides are compiled from various TeraGrid Documentations.
Alain Roy Computer Sciences Department University of Wisconsin-Madison An Introduction To Condor International.
High Throughput Computing with Condor at Purdue XSEDE ECSS Monthly Symposium Condor.
Condor Tugba Taskaya-Temizel 6 March What is Condor Technology? Condor is a high-throughput distributed batch computing system that provides facilities.
Campus Grids Report OSG Area Coordinator’s Meeting Dec 15, 2010 Dan Fraser (Derek Weitzel, Brian Bockelman)
Part 6: (Local) Condor A: What is Condor? B: Using (Local) Condor C: Laboratory: Condor.
Intermediate Condor Rob Quick Open Science Grid HTC - Indiana University.
Grid job submission using HTCondor Andrew Lahiff.
Condor Week 2005Optimizing Workflows on the Grid1 Optimizing workflow execution on the Grid Gaurang Mehta - Based on “Optimizing.
July 11-15, 2005Lecture3: Grid Job Management1 Grid Compute Resources and Job Management.
Dan Bradley University of Wisconsin-Madison Condor and DISUN Teams Condor Administrator’s How-to.
Migration to 7.4, Group Quotas, and More William Strecker-Kellogg Brookhaven National Lab.
Derek Wright Computer Sciences Department University of Wisconsin-Madison Condor and MPI Paradyn/Condor.
Derek Wright Computer Sciences Department University of Wisconsin-Madison New Ways to Fetch Work The new hook infrastructure in Condor.
Condor Week 2004 The use of Condor at the CDF Analysis Farm Presented by Sfiligoi Igor on behalf of the CAF group.
how Shibboleth can work with job schedulers to create grids to support everyone Exposing Computational Resources Across Administrative Domains H. David.
Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison Condor NT Condor ported.
HTCondor Security Basics HTCondor Week, Madison 2016 Zach Miller Center for High Throughput Computing Department of Computer Sciences.
Campus Grid Technology Derek Weitzel University of Nebraska – Lincoln Holland Computing Center (HCC) Home of the 2012 OSG AHM!
Taming Local Users and Remote Clouds with HTCondor at CERN
Intermediate Condor Monday morning, 10:45am Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
ITMT Windows 7 Configuration Chapter 6 – Sharing Resource ITMT 1371 – Windows 7 Configuration 1.
HTCondor stats John (TJ) Knoeller Condor Week 2016.
CHTC Policy and Configuration
Improvements to Configuration
Condor DAGMan: Managing Job Dependencies with Condor
User-Written Functions
HTCondor Security Basics
Quick Architecture Overview INFN HTCondor Workshop Oct 2016
Scheduling Policy John (TJ) Knoeller Condor Week 2017.
Examples Example: UW-Madison CHTC Example: Global CMS Pool
Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)
Things you may not know about HTCondor
Moving CHTC from RHEL 6 to RHEL 7
High Availability in HTCondor
Things you may not know about HTCondor
CREAM-CE/HTCondor site
Moving CHTC from RHEL 6 to RHEL 7
Building Grids with Condor
Accounting in HTCondor
The Scheduling Strategy and Experience of IHEP HTCondor Cluster
HTCondor Command Line Monitoring Tool
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Negotiator Policy and Configuration
Accounting, Group Quotas, and User Priorities
What’s New in DAGMan HTCondor Week 2013
HTCondor Security Basics HTCondor Week, Madison 2016
Job Matching, Handling, and Other HTCondor Features
Condor Glidein: Condor Daemons On-The-Fly
Basic Grid Projects – Condor (Part I)
Upgrading Condor Best Practices
HTCondor Training Florentia Protopsalti IT-CM-IS 1/16/2019.
The Condor JobRouter.
Condor: Firewall Mirroring
Late Materialization has (lately) materialized
Condor-G Making Condor Grid Enabled
JRA 1 Progress Report ETICS 2 All-Hands Meeting
Negotiator Policy and Configuration
Presentation transcript:

Scheduling Policy John (TJ) Knoeller Condor Week 2017

Overview Policy options in the SCHEDD Limits Job policy Mutating jobs Preventing changes

Limits Max jobs running Max jobs per submission Max jobs per Owner (8.6) Max running DAGs per Owner (8.6) Max materialized jobs per cluster (8.7.1) Max active input transfers Max active output transfers

User vs Owner vs Submitter Owner attribute of job is OS ‘user’ Shadow impersonates Owner for file i/o Set by SCHEDD based on submit identity Immutable Accounting ‘user’ a.k.a. Submitter Who’s quota/priority is checked/docked (Owner + Nice) + Domain + AccountingGroup User can change at will

Most limits are Submitter limits “Fair” share is by submitter Negotiator only knows about submitters Priority / Quota Transfer queue A few per-owner limits Max jobs per owner (8.6) Max running DAGs per owner (8.6)

Monitoring the limits Several talks on this on Thursday Schedd Stats condor_status –schedd –direct -long Per submitter stats condor_status –submit –long condor_sos condor_q –tot –long Show jobs doing file transfer condor_sos condor_q –io

Job policy You want to have a policy about what jobs are allowed, or require certain attributes? Submit requirements Submit attributes Job transforms

Example job policy All jobs must have "Experiment" attribute Reject jobs that don't conform to the policy SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) CheckExp SUBMIT_REQUIREMENT_CheckExp = \ JobUniverse == 7 || Experiment isnt undefined SUBMIT_REQUIREMENT_CheckExp_REASON = \ "submissions must have +Experiment" # JobUniverse 7 is Scheduler universe, i.e. DAGMAN. # JobUniverse 12 is Local universe, maybe except this also?

Defaulting job attributes Configure SUBMIT_ATTRS to add attributes to jobs. SUBMIT_ATTRS = $(SUBMIT_ATTRS) Experiment Experiment = "CHTC" Job ad starts with Experiment="CHTC" before the submit file is processed

SUBMIT_ATTRS Good for setting defaults Work happens outside of the SCHEDD User can override or un-configure Unconditional May not happen with remote submit (Depends on who owns the config)

Mutating jobs using job transforms (new in 8.6) Configure JOB_TRANSFORM_* JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetExp JOB_TRANSFORM_SetExp = [ set_Experiment = "CHTC"; ] Experiment="CHTC" written into each job ad as it is submitted. probably not a good thing in this case

Transforming only some jobs JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetExp JOB_TRANSFORM_SetExp @=end [ Requirements = JobUniverse != 7 && Experiment is undefined set_Experiment = "CHTC"; ] @end Adds Experiment="CHTC" to each job that doesn't already have that attribute

About job transforms Converted to native syntax on startup Job router syntax is loosely ordered copy > delete > set > eval_set Native syntax is Confusing (and might be changing) Top to bottom Has temporary variables Has Conditionals

Job transform native syntax # Use job transform to add pool constraint to vanilla jobs # based on whether the job needs GPUs or not # JOB_TRANSFORM_GPUS @=end REQUIREMENTS JobUniverse == 5 tmp.NeedsGpus = $(MY.RequestGPUs:0) > 0 if $INT(tmp.NeedsGpus) SET Requirements $(MY.Requirements) && (Pool == "ICECUBE") else SET Requirements $(MY.Requirements) && (Pool == "CHTC") endif @end

Preventing change IMMUTABLE_JOB_ATTRS PROTECTED_JOB_ATTRS Cannot be changed once set PROTECTED_JOB_ATTRS Cannot be changed by the user SECURE_JOB_ATTRS Like protected, but have security implications IMMUTABLE_JOB_ATTRS=$(IMMUTABLE_JOB_ATTRS) Experiment

The motivating case for all this How do I assign jobs to accounting groups automatically, while preventing users from cheating? Job transforms + Immutable attributes But doing this in classad language is painful eval_set_AcctGroup=\ IfThenElse(Owner=="Bob","CHTC", IfThenElse(Owner=="Alice","Math", IfThenElse(Owner=="Al","Physics","Unknown") ))

Introducing Map files Map file is text, with 3 fields per line * <key_or_regex> <result_list> * Bob CHTC, Security * Alice CHTC, Math, Physics * /.*Hat/i Problem * /.*/ CHTC Yes, the first field must be *

Defining a map SCHEDD_CLASSAD_USER_MAP_NAMES = MyMap CLASSAD_USER_MAPFILE_MyMap = /path/to/mapfile <or> SCHEDD_CLASSAD_USER_MAPDATA_MyMap @=end * Bob CHTC,Security * Alice CHTC,Math,Physics * /.*Hat/i Problem * /.*/ CHTC @end Can now use the userMap("MyMap") function in Classad expressions in the SCHEDD.

The Classad userMap function result = userMap(mname, input) map input to first result result = userMap(mname, input, preferred) map input to preferred result result = userMap(mname, input, pref, def) map input to preferred or default result

Putting it all together SCHEDD_CLASSAD_USER_MAP_NAMES = $(SCHEDD_CLASSAD_USER_MAP_NAMES) Groups CLASSAD_USER_MAPFILE_Groups = /path/to/mapfile # Assign groups automatically JOB_TRANSFORM_NAMES = AssignGroup JOB_TRANSFORM_AssignGroup @=end [ copy_Owner="AcctGroupUser"; copy_AcctGroup="RequestedAcctGroup"; eval_set_AcctGroup=usermap("AssignGroup",AcctGroupUser,AcctGroup); ] @end # Prevent Cheating IMMUTABLE_JOB_ATTRS = $(IMMUTABLE_JOB_ATTRS) AcctGroup AcctGroupUser SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) CheckGroup SUBMIT_REQUIREMENT_CheckGroup = AcctGroup isnt undefined SUBMIT_REQUIREMENT_CheckGroup_REASON = strcat("Could not map '", Owner, "' to a group")

Or, to put it another way to see what this metaknob expands to use FEATURE:AssignAccountingGroup(/path/map) You can run condor_config_val use feature:AssignAccountingGroup to see what this metaknob expands to

Any Questions?