Accounting, Group Quotas, and User Priorities

Why are you here? To learn about:
- How Condor chooses the next job to run
- How you can change which job will run
- How you can prioritize jobs by project instead of by user
- How you can group users into different projects
- How you can assign usage minimums to groups of users

What job runs next? A Condor "queue" is not FIFO! The order is determined by balancing the wants and needs of three entities:
- The user (schedd)
- The pool administrator (negotiator)
- The machine owner (startd)
It all comes together in the negotiation cycle.

[Diagram: a pool with a negotiator, a collector, several startds, and several schedds. 1. Startds send machine ads to the collector. 2. Schedds send submitter ads to the collector.]

Machine Ads
condor_status -l romano.cs.wisc.edu

MyType = "Machine"
TargetType = "Job"
Name = "vm6@romano.cs.wisc.edu"
Machine = "romano.cs.wisc.edu"
Requirements = LoadAvg < 0.5
Rank = 0.0
Disk = 2019048
LoadAvg = 0.000000
KeyboardIdle = 1018497
Memory = 512
Cpus = 1
Mips = 124122

Submitter Ads
condor_status -sub matthew@wisc.edu -l

MyType = "Submitter"
TargetType = ""
Machine = "rosalind.cs.wisc.edu"
ScheddIpAddr = "<128.105.166.39:42190>"
Name = "matthew@cs.wisc.edu"
RunningJobs = 1
IdleJobs = 1
HeldJobs = 0
MaxJobsRunning = 500
StartSchedulerUniverse = TRUE
MonitorSelfImageSize = 9164.000000
MonitorSelfResidentSetSize = 2432

Let's look inside the negotiator during a negotiation cycle…
[Diagram: the same pool as before, with startds sending machine ads and schedds sending submitter ads to the collector.]

Inside the Negotiator…

[Diagram: the negotiator daemon, with the accountant inside it.]

Negotiation Cycle
1. Get all startd and submitter ads.
2. Get user priorities for all submitters (the accounting principles, identified via the Name attribute in the submitter ad).
3. Sort the submitter ads.
4. Talk to the schedds in accounting-principle order.
5. Schedds send requests one at a time, sorted by job priority.
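
To inspect the priorities the accountant hands to the negotiator, condor_userprio is the usual tool; a brief, hedged example (output columns and exact behavior vary by Condor version):

condor_userprio -all                                 # effective priority, priority factor, and usage per submitter
condor_userprio -setfactor matthew@cs.wisc.edu 10.0  # (admin) set this submitter's priority factor to 10, making its effective priority 10x worse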

For Each Job:
1. Find all machine ads that match.
2. Sort the matching machine ads by:
   - NEGOTIATOR_PRE_JOB_RANK
   - the job ad's RANK
   - NEGOTIATOR_POST_JOB_RANK
3. Is the candidate machine ad already running a job? Priority preemption happens only if PREEMPTION_REQUIREMENTS evaluates to True.
4. Give the schedd the match, or tell it no match was found.
5. The schedd responds with its next request (possibly skipping the rest of the current AutoCluster).
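
The knobs mentioned above live in the negotiator's configuration. A minimal sketch with illustrative values, not the shipped defaults (the attribute names visible to PREEMPTION_REQUIREMENTS, such as SubmitterUserPrio, vary across Condor versions):

# Prefer machines with no user currently running on them
NEGOTIATOR_PRE_JOB_RANK = (RemoteUser =?= UNDEFINED)
# Break remaining ties by raw machine speed (Mips appears in the machine ad shown earlier)
NEGOTIATOR_POST_JOB_RANK = Mips
# Allow priority preemption only when the running user's priority value is
# at least 20% worse (larger) than the candidate submitter's
PREEMPTION_REQUIREMENTS = (RemoteUserPrio > SubmitterUserPrio * 1.2)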

Some observations:
- Job priority (condor_prio) will not allow one user to run ahead of another user.
- Job priority is specific per user, per schedd.
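
Job priority can be set at submit time or adjusted later; a small example (the job id 123.0 is made up):

# In the submit file, before "queue"; higher values run earlier among this user's jobs
priority = 10

# Or after submission, from the command line:
condor_prio -p 10 123.0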

Examples
Job says: Rank = Memory
Config file does not define: NEGOTIATOR_PRE_JOB_RANK
The user will then ALWAYS get the highest-memory machine, even if it is already being used by a lower-priority user.

Examples
Job says: Rank = Memory
Config file says: NEGOTIATOR_PRE_JOB_RANK = RemoteUser =?= UNDEFINED
The user will then get the highest-memory IDLE machine, and will only preempt another user if no idle machines match.

Accounting Groups
I don't care about WHO submitted the job. How do I change the accounting principle?

Accounting Groups, cont.
In the job submit file:

executable = foo
universe = vanilla
+AccountingGroup = "Project44"
queue
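
Once such jobs have run, the group itself accumulates the usage instead of the individual submitters. A hedged way to check (the exact form in which "Project44" is displayed depends on the Condor version):

condor_userprio -allusers    # look for a "Project44" entry with its own priority and usage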

A given group should have priority on 50 nodes of my 500-machine cluster. How?
Answer A: Startd Rank
Answer B: Group Quotas
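
Answer A as a sketch: put a RANK expression in the condor_config of the 50 preferred machines so their startds prefer (and, by default, will preempt for) jobs carrying that group's AccountingGroup attribute. The group name reuses the earlier Project44 example:

# On the 50 preferred machines only:
RANK = (AccountingGroup =?= "Project44")

Answer B, using group quotas, is covered on the next slides.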

Group Quotas – Config Params
GROUP_NAMES - lists the recognized group names. Example:
  GROUP_NAMES = group-cms, group-infn
GROUP_QUOTA_<groupname> - the number of machines "owned" by this group. Example:
  GROUP_QUOTA_group-cms = 10
  GROUP_QUOTA_group-infn = 5
GROUP_AUTOREGROUP - set this to either True or False. Defaults to False. If True, users who submitted to a specific group will also negotiate a second time with the "none" group, allowing group jobs to be matched with idle machines even if the group is over quota.
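
Answer B to the earlier question, as a sketch in the negotiator's configuration, reserving 50 of the 500 machines (the group name is hypothetical; everyone else keeps negotiating in the "none" group):

GROUP_NAMES = group-project44
GROUP_QUOTA_group-project44 = 50
# Optionally let the group's jobs also match idle machines beyond the 50-slot quota
GROUP_AUTOREGROUP = True

Jobs opt in via the +AccountingGroup attribute shown earlier; depending on the Condor version, the value may need the "group.user" form, e.g. "group-project44.matthew".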

Negotiation w/ Group Quotas
- The matchmaker first negotiates for groups, sorted by how far they are under quota.
- Negotiation within a group follows the exact same algorithm as before.
- THEN, it negotiates for all users that are not in a group, as before.

Lots o' "tools" to get what you want
USER: Job Requirements, Job Rank
ADMIN: user priorities, accounting groups, accounting group quotas, PREEMPTION_REQUIREMENTS, NEGOTIATOR_PRE|POST_JOB_RANK
OWNER: Machine Requirements, Machine Rank

Challenge Question
I want LOW, MED, and HIGH strict-priority job types, but at each priority level I want fair share. HOW??

Questions?