Presentation is loading. Please wait.

Presentation is loading. Please wait.

Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison Condor Administration.

Similar presentations


Presentation on theme: "Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison Condor Administration."— Presentation transcript:

1 Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu http://www.cs.wisc.edu/condor Condor Administration European Condor Week 2006 INFN Milan Milan, Italy - June 27,2006

2 www.cs.wisc.edu/condor 2 Outline › Condor Daemons  Job Startup › Configuration Files › Policy Expressions  Startd (Machine)  Negotiator › Priorities › Useful Tools › Log Files › Debugging Jobs › Security

3 www.cs.wisc.edu/condor 3 Installation

4 www.cs.wisc.edu/condor 4 Considerations for Installing a Condor Pool › What machine should be your central manager? › Does your pool have a shared file system? › Where to install Condor binaries and configuration files? › Where should you put each machine’s local directories? › Start the daemons as root or as some other user?

5 www.cs.wisc.edu/condor 5 What machine should be your central manager? › The central manager is very important for the proper functioning of your pool › If the central manager crashes, jobs that are currently matched will continue to run, but new jobs will not be matched

6 www.cs.wisc.edu/condor 6 Central Manager › Want assurances of high uptime or prompt reboots  Or designate “hot-spares” › A good network connection helps › Speed helps

7 www.cs.wisc.edu/condor 7 Does your pool have a shared file system? › It is easier to run vanilla universe jobs if so, but one is not required › Shared location for configuration files can ease administration of a pool › AFS can work, but Condor does not yet manage AFS tokens

8 www.cs.wisc.edu/condor 8 Where to install binaries and configuration files? › Shared location for configuration files can ease administration of a pool › Binaries on a shared file system makes upgrading easier, but can be less stable if there are network problems › condor_master on the local disk is a good compromise

9 www.cs.wisc.edu/condor 9 Where should you put each machine’s local directories? › You need a fair amount of disk space in the spool directory for  Log files  Submit machine: the job queue and any spooled data files for each job submitted  Execute machine: the binary for any Condor job running on a machine, plus any staged data files

10 www.cs.wisc.edu/condor 10 Hostnames › Any two machines that will be communicating must know each others names  Proper /etc/hosts or DNS including inverse domain

11 www.cs.wisc.edu/condor 11 Start the daemons as root or some other user? › If possible, we recommend starting the daemons as root  Condor better able to enforce policy  More secure (!?!?!)  Less confusion for users  Condor daemons will run as the user “condor” whenever possible, and only switch to root when needed

12 www.cs.wisc.edu/condor 12 Running Daemons as Non-Root › More secure › Condor will still work, users just have to take some extra steps to submit jobs › Can have “personal Condor” installed - only you can submit jobs

13 www.cs.wisc.edu/condor 13 Basic Installation Procedure › 1. Decide what version and parts of Condor to install and download them › 2. Install the “release directory” - all the Condor binaries and libraries › 3. Setup the Central Manager › 4. (optional) Setup Condor on any other machines you wish to add to the pool › 5. Spawn the Condor daemons

14 www.cs.wisc.edu/condor 14 Condor Version Series › We distribute two versions of Condor  Stable Series  Development Series

15 www.cs.wisc.edu/condor 15 Stable Series › Heavily tested › Recommended for general use › 2nd number of version string is even (6.6.3)

16 www.cs.wisc.edu/condor 16 Development Series › Latest features, not necessarily well- tested › Not recommended unless you’re willing to work with beta code or need new features › 2nd number of version string is odd (6.7.1)

17 www.cs.wisc.edu/condor 17 Condor Versions › What am I running? › All daemons advertise a CondorVersion attribute in the ClassAd they publish › You can also view the version string by  Run “condor_version”  “-v” option  running ident on any Condor binary

18 www.cs.wisc.edu/condor 18 Condor Versions › All parts of Condor on a single machine should run the same version! › Machines in a pool can usually run different versions and communicate with each other › Documentation will specify when a version is incompatible with older versions

19 www.cs.wisc.edu/condor 19 Downloading Condor › Go to http://www.cs.wisc.edu/condor/ › Fill out the form and download the different pieces you need  Normally, you want the full stable release › There are also “contrib” modules for non-standard parts of Condor  For example, the View Server

20 www.cs.wisc.edu/condor 20 Downloading Condor › Unix  Distributed as compressed “tar” files  Once you download, unpack them  What about distribution XXX of Linux? See http://www.cs.wisc.edu/condor/porting/ › Windows  MSI installer or ZIP file

21 www.cs.wisc.edu/condor 21 Install the Release Directory › In the directory where you unpacked the tar file, you’ll find a release.tar file with all the binaries and libraries › Use condor_install or condor_configure › condor_install will install this as the release directory for you

22 www.cs.wisc.edu/condor 22 condor_install › Our old installation script › Interactive › Overly complex

23 www.cs.wisc.edu/condor 23 condor_configure › New script › Handles installation and reconfiguration condor_configure --install --install-dir=/nfs/opt/condor --local-dir=/var/condor --owner=condor

24 www.cs.wisc.edu/condor 24 Setup the Central Manager › Central manager needs specific configuration to start the condor_collector and condor_negotiator  condor_configure --type=manager

25 www.cs.wisc.edu/condor 25 Setup Additional Machines › If you have a shared file system, just run condor_init on any other machine you wish to add to your pool › OR, run condor_configure --central-manager=

26 www.cs.wisc.edu/condor 26 Spawn the Condor daemons › Run condor_master to start Condor  Remember to start as root if desired › Start Condor on the central manager first › Add Condor to your boot scripts?  We provide a “SysV-style” init script ( /etc/examples/condor.boot )

27 www.cs.wisc.edu/condor 27 Condor-G Special Notes › Condor-G should work out of the box › Globus can push several limits, consider increasing:  /proc/sys/fs/file-max  /proc/sys/net/ipv4/ip_local_port_range  Per process file descriptor limits http://www.cs.wisc.edu/condor/condorg/linux_scalability.html

28 www.cs.wisc.edu/condor 28 Other Unix Installation Options › Use RPM package (from Condor web site) › VDT – Virtual Data Toolkit  PacMan installer  Includes other Grid software  http://www.vdt.org/

29 www.cs.wisc.edu/condor 29 Windows Installation › v6.6.x : InstallShield installation program  Interactive GUI or via command-line with “saved answers file” › v6.7.x : MSI installation program  Interactive GUI or MSI unattended › ZIP file

30 www.cs.wisc.edu/condor 30 Other Sources › Condor Manual › Condor Web Site › condor-users mailing list http://www.cs.wisc.edu/condor/mail-lists/ › condor-admin@cs.wisc.edu

31 www.cs.wisc.edu/condor 31 Condor Daemons Title unknown, by Hans Holbein the Younger, from Historiarum Veteris Testamenti icones, 1543

32 www.cs.wisc.edu/condor 32 What Condor Daemons are running on my machine, and what do they do?

33 www.cs.wisc.edu/condor 33 Condor Daemon Layout Personal Condor / Central Manager master collector negotiator scheddstartd = Process Spawned

34 www.cs.wisc.edu/condor 34 condor_master › Starts up all other Condor daemons › If there are any problems and a daemon exits, it restarts the daemon and sends email to the administrator › Checks the time stamps on the binaries of the other Condor daemons, and if new binaries appear, the master will gracefully shutdown the currently running version and start the new version master

35 www.cs.wisc.edu/condor 35 condor_master (cont’d) › Acts as the server for many Condor remote administration commands:  condor_reconfig, condor_restart, condor_off, condor_on, condor_config_val, etc.

36 www.cs.wisc.edu/condor 36 condor_startd › Represents a machine to the Condor system › Responsible for starting, suspending, and stopping jobs › Enforces the wishes of the machine owner (the owner’s “policy”… more on this soon) master startd

37 www.cs.wisc.edu/condor 37 condor_schedd › Represents users to the Condor system › Maintains the persistent queue of jobs › Responsible for contacting available machines and sending them jobs › Services user commands which manipulate the job queue:  condor_submit,condor_rm, condor_q, condor_hold, condor_release, condor_prio, … master scheddstartd

38 www.cs.wisc.edu/condor 38 condor_collector › Collects information from all other Condor daemons in the pool  “Directory Service” / Database for a Condor pool › Each daemon sends a periodic update called a “ClassAd” to the collector › Services queries for information:  Queries from other Condor daemons  Queries from users (condor_status) schedd collector master startd

39 www.cs.wisc.edu/condor 39 condor_negotiator › Performs “matchmaking” in Condor › Gets information from the collector about all available machines and all idle jobs › Tries to match jobs with machines that will serve them › Both the job and the machine must satisfy each other’s requirements master collector negotiator scheddstartd

40 www.cs.wisc.edu/condor 40 Central Manager › The Central Manager is the machine running the collector and negotiator DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR › Defines a Condor pool. CONDOR_HOST = centralmanager.example.com

41 www.cs.wisc.edu/condor 41 Typical Condor Pool Central Manager master collector negotiator schedd startd = ClassAd Communication Pathway = Process Spawned Submit-Only master schedd Execute-Only master startd Regular Node schedd startd master Regular Node schedd startd master Execute-Only master startd

42 www.cs.wisc.edu/condor 42 Job Startup “LUNAR Launch” by “jurvetson” © http://www.flickr.com/photos/jurvetson/114406979/

43 www.cs.wisc.edu/condor 43 Execute MachineSubmit Machine Job Startup Submit Schedd Starter Job Shadow Condor Syscall Lib Startd Central Manager CollectorNegotiator

44 www.cs.wisc.edu/condor 44 condor_starter › Spawned by the condor_startd to handle all the details of starting and managing the job  Transfer job’s binary to execute machine  Send back exit status  Etc.

45 www.cs.wisc.edu/condor 45 condor_starter › On multi-processor machines, you get one condor_starter per CPU  Actually one per running job  Can configure to run more (or less) jobs than CPUs › For PVM jobs, the starter also spawns a PVM daemon ( condor_pvmd )

46 www.cs.wisc.edu/condor 46 condor_shadow › Represents job on the submit machine › Services requests from standard universe jobs for remote system calls  including all file I/O › Makes decisions on behalf of the job  for example: where to store the checkpoint file

47 www.cs.wisc.edu/condor 47 condor_shadow Impact › One condor_shadow running on submit machine for each actively running Condor job › Minimal load on submit machine  Usually blocked waiting for requests from the job or doing I/O  Relatively small memory footprint

48 www.cs.wisc.edu/condor 48 Limiting condor_shadow › Still, you can limit the impact of the shadows on a given submit machine:  They can be started by Condor with a “nice-level” that you configure ( SHADOW_RENICE_INCREMENT )  Can limit total number of shadows running on a machine ( MAX_JOBS_RUNNING )

49 www.cs.wisc.edu/condor 49 Configuration Files “amp wiring” by “fabz” © http://www.flickr.com/photos/fbz/114422787/

50 www.cs.wisc.edu/condor 50 Configuration Files › Multiple files concatenated  Definitions in later files overwrite previous definitions › Order of files:  Global config file  Local config files  Root config files

51 www.cs.wisc.edu/condor 51 Global Config File › Found either in file pointed to with the CONDOR_CONFIG environment variable, /etc/condor/condor_config, or ~condor/condor_config › Most settings can be in this file › Typically kept on a shared file system

52 www.cs.wisc.edu/condor 52 Other Shared Files › LOCAL_CONFIG_FILE macro  Comma separated, processed in order › You can configure a number of other shared config files:  Organize common settings (for example, all policy expressions)  platform-specific config files

53 www.cs.wisc.edu/condor 53 Local Config File › Specify in LOCAL_CONFIG_FILE macro  Tip: can use $(HOSTNAME) › Machine-specific settings  local policy settings for a given owner  different daemons to run (for example, on the Central Manager!)

54 www.cs.wisc.edu/condor 54 Local Config File, cont. › Can be on local disk of each machine: /var/adm/condor/condor_config.local › Can be in a shared directory: /shared/condor/condor_config.$(HOSTNAME) /shared/condor/hosts/$(HOSTNAME)/condor_config.local

55 www.cs.wisc.edu/condor 55 Root Config File › Always processed last › Allows root to specify settings which cannot be changed by other users  For example, the path to Condor daemons › Useful if daemons are started as root but someone else has write access to config files

56 www.cs.wisc.edu/condor 56 Root Config File › /etc/condor/condor_config.root or ~condor/condor_config.root › Then loads any files specified in ROOT_CONFIG_FILE_LOCAL

57 www.cs.wisc.edu/condor 57 Configuration File Syntax › # at start of line is a comment  not allowed in names, confuses Condor. › \ at the end of line is a line- continuation  Both lines are treated as one big entry  Works in comments! # This comment eats the next line \ EXAMPLE_SETTING=TRUE

58 www.cs.wisc.edu/condor 58 Configuration File Macros › Macros have the form:  Attribute_Name = value Names are case insensitive Values are case sensitive › You reference other macros with:  A = $(B) › Can create additional macros for organizational purposes

59 www.cs.wisc.edu/condor 59 Configuration File Macros › Can append to macros: A=abc A=$(A),def › Don’t let macros recursively define each other! A=$(B) B=$(A)

60 www.cs.wisc.edu/condor 60 Configuration File Macros › Later macros in a file overwrite earlier ones  B will evaluate to 2: A=1 B=$(A) A=2

61 www.cs.wisc.edu/condor 61 ClassAds › Set of key-value pairs › Can be matched against each other  Requirements and Rank MY.name – Looks for “name” in local ClassAd TARGET.name – Looks for “name” in the other ClassAd Name – Looks for “name” in the local ClassAd, then the other ClassAd

62 www.cs.wisc.edu/condor 62 ClassAd Expressions › Some configuration file macros specify expressions for the Machine’s ClassAd  Notably START, RANK, SUSPEND, CONTINUE, PREEMPT, KILL › Can contain a mixture of macros and ClassAd references › Notable: UNDEFINED, ERROR

63 www.cs.wisc.edu/condor 63 ClassAd Expressions › +, -, *, /,, >=, ==, !=, &&, and || all work as expected › TRUE==1 and FALSE==0 (guaranteed)

64 www.cs.wisc.edu/condor 64 ClassAd Expressions: UNDEFINED and ERROR › Special values › Passed through most operators  Anything == UNDEFINED is UNDEFINED › && and || eliminate if possible.  UNDEFINED && FALSE is FALSE  UNDEFINED && TRUE is UNDEFINED

65 www.cs.wisc.edu/condor 65 ClassAd Expressions: =?= and =!=  =?= and =!= are similar to == and !=  =?= tests if operands have the same type and the same value. 10 == UNDEFINED -> UNDEFINED UNDEFINED == UNDEFINED -> UNDEFINED 10 =?= UNDEFINED -> FALSE UNDEFINED =?= UNDEFINED -> TRUE  =!= inverts =?=

66 www.cs.wisc.edu/condor 66 ClassAd Functions in Condor! › Conditionals  IfThenElse(condition,then,else) › String functions  Strcat(), strcmp(), toUpper(), etc. › StringList functions  Example of a “string list” (CSV style) Mylist = “Joe, Jon, Jeff, Jim, Jake”  StrListContains(), StrListAppend(), StrListRemove(), etc. › Others  Regular expressions, arithmetic, etc…

67 www.cs.wisc.edu/condor 67 ClassAd Expressions › Further information: Section 4.1, “Condor's ClassAd Mechanism,” in the Condor Manual.

68 www.cs.wisc.edu/condor 68 Policy › Allow machine owners to specify job priorities, restrict access, and implement other local policies “Don't even think about it” by “tyger_lyllie” © http://www.flickr.com/photos/tyger_lyllie/59207292/

69 www.cs.wisc.edu/condor 69 Policy Expressions › Specified in condor_config › Policy evaluates both a machine ClassAd and a job ClassAd together  Policy can reference items in either ClassAd (See manual for list), ~140 attributes. › Can reference condor_config macros: $(MACRONAME)

70 www.cs.wisc.edu/condor 70 Machine (Startd) Policy Expression Summary › START – When is this machine willing to start a job  Typically used to restrict access when the machine is being used directly › RANK - Job preferences

71 www.cs.wisc.edu/condor 71 Machine (Startd) Policy Expression Summary › SUSPEND - When to suspend a job › CONTINUE - When to continue a suspended job › PREEMPT – When to nicely stop running a job › KILL - When to immediately kill a preempting job

72 www.cs.wisc.edu/condor 72 START › START is the primary policy › When FALSE the machine enters the Owner state and will not run jobs › Acts as the Requirements expression for the machine, the job must satisfy START  Can reference job ClassAd values including Owner and ImageSize

73 www.cs.wisc.edu/condor 73 RANK › Indicates which jobs a machine prefers  Jobs can also specify a rank › Floating point number  Larger numbers are higher ranked  Typically evaluate attributes in the Job ClassAd  Typically use + instead of &&

74 www.cs.wisc.edu/condor 74 RANK › Often used to give priority to owner of a particular group of machines › Claimed machines still advertise looking for higher ranked job to preempt the current job

75 www.cs.wisc.edu/condor 75 SUSPEND and CONTINUE › When SUSPEND becomes true, the job is suspended › When CONTINUE becomes true a suspended job is released

76 www.cs.wisc.edu/condor 76 PREEMPT and KILL › When PREEMPT becomes true, the job will be politely shut down  Vanilla universe jobs get SIGTERM Or user requested signal  Standard universe jobs checkpoint › When KILL becomes true, the job is SIGKILL  Checkpointing is aborted if started

77 www.cs.wisc.edu/condor 77 WANT_SUSPEND and WANT_VACATE › Typically leave both to TRUE › WANT_SUSPEND - If false, skip SUSPEND test, jump to PREEMPT › WANT_VACATE  If true, gives job time to vacate cleanly (until KILL becomes true)  If false, job is immediately killed ( KILL is ignored)

78 www.cs.wisc.edu/condor 78 True Road Map of the Policy Expressions START WANT SUSPEND SUSPEND Vacating PREEMPT KILL True False WANT VACATE Killing False Expression Activity

79 www.cs.wisc.edu/condor 79 Minimal Settings › Always runs jobs START = True RANK = SUSPEND = False CONTINUE = True PREEMPT = False KILL = False “Lonely at the top” by “gumuz” © http://www.flickr.com/photos/gumuz/7340411/

80 www.cs.wisc.edu/condor 80 (Boss Fat Cat) › I am adding nodes to the Cluster… but the Chemistry Department has priority on these nodes Policy Configuration

81 www.cs.wisc.edu/condor 81 New Settings for the Chemistry nodes › Prefer Chemistry jobs START = True RANK = Department == "Chemistry" SUSPEND = False CONTINUE = True PREEMPT = False KILL = False

82 www.cs.wisc.edu/condor 82 Submit file with Custom Attribute › Prefix an entry with “+” to add to job ClassAd Executable = charm-run Universe = standard +Department = Chemistry queue

83 www.cs.wisc.edu/condor 83 What if “Department” not specified? START = True RANK = Department =!= UNDEFINED && Department == "Chemistry" SUSPEND = False CONTINUE = True PREEMPT = False KILL = False

84 www.cs.wisc.edu/condor 84 More Complex RANK › Give the machine’s owners (adesmet and roy) highest priority, followed by the Chemistry department, followed by the Physics department, followed by everyone else.

85 www.cs.wisc.edu/condor 85 More Complex RANK IsOwner = (Owner == "adesmet" || Owner == "roy") IsChem =(Department =!= UNDEFINED && Department == "Chemistry") IsPhys =(Department =!= UNDEFINED && Department == "Physics") RANK = $(IsOwner)*20 + $(IsChem)*10 + $(IsPhys)

86 www.cs.wisc.edu/condor 86 › Cluster is okay, but... Condor can only use the desktops when they would otherwise be idle Policy Configuration (Boss Fat Cat)

87 www.cs.wisc.edu/condor 87 Defining Idle › One possible definition:  No keyboard or mouse activity for 5 minutes  Load average below 0.3

88 www.cs.wisc.edu/condor 88 Desktops should › START jobs when the machine becomes idle › SUSPEND jobs as soon as activity is detected › PREEMPT jobs if the activity continues for 5 minutes or more › KILL jobs if they take more than 5 minutes to preempt

89 www.cs.wisc.edu/condor 89 Macros in the Config File NonCondorLoadAvg = (LoadAvg - CondorLoadAvg) BgndLoad = 0.3 CPU_Busy = ($(NonCondorLoadAvg) >= $(BgndLoad)) CPU_Idle = ($(NonCondorLoadAvg) < $(BgndLoad)) KeyboardBusy = (KeyboardIdle < 10) MachineBusy = ($(CPU_Busy) || $(KeyboardBusy)) ActivityTimer = \ (CurrentTime - EnteredCurrentActivity)

90 www.cs.wisc.edu/condor 90 Desktop Machine Policy START = $(CPU_Idle) && KeyboardIdle > 300 SUSPEND= $(MachineBusy) CONTINUE = $(CPU_Idle) && KeyboardIdle > 120 PREEMPT= (Activity == "Suspended") && \ $(ActivityTimer) > 300 KILL = $(ActivityTimer) > 300

91 www.cs.wisc.edu/condor 91 Mission Accomplished › Our policy is in place and everyone is happy “Autumn and Blue Eyes” by “PJLewis” © http://www.flickr.com/photos/pjlewis/46134047/

92 www.cs.wisc.edu/condor 92 Machine States PREEMPTING CLAIMED UNCLAIMED OWNER MATCHED begin

93 www.cs.wisc.edu/condor 93 Machine Activities PREEMPTING CLAIMED UNCLAIMED OWNER MATCHED Benchmarking begin Idle Suspended Busy Idle Killing Vacating

94 www.cs.wisc.edu/condor 94 Machine Activities PREEMPTING CLAIMED UNCLAIMED OWNER MATCHED Benchmarking begin Idle Suspended Busy Idle Killing Vacating Idle Ignores new Backfill state! See the manual for the gory details (Section 3.6: Startd Policy Configuration)

95 www.cs.wisc.edu/condor 95 Custom Machine Attributes › Custom attributes added to job ClassAds via ‘+’ in job submit file › Can add attributes to a machine’s ClassAd, typically done in the local config file INSTRUCTIONAL=TRUE NETWORK_SPEED=100 STARTD_EXPRS=INSTRUCTIONAL,NETWORK_SPEED

96 www.cs.wisc.edu/condor 96 Custom Machine Attributes › Jobs can now specify Rank and Requirements using new attributes: Requirements = (INSTRUCTIONAL=?=UNDEFINED || INSTRUCTIONAL==FALSE) Rank = NETWORK_SPEED

97 www.cs.wisc.edu/condor 97 Policy Review › Users submitting jobs can specify Requirements and Rank expressions › Administrators can specify Startd policy expressions individually for each machine › Custom attributes easily added › You can enforce almost any policy!

98 www.cs.wisc.edu/condor 98 Further Machine Policy Information › For further information, see section 3.6 “Startd Policy Configuration” in the Condor manual › condor-users mailing list http://www.cs.wisc.edu/condor/mail-lists/ › condor-admin@cs.wisc.edu

99 www.cs.wisc.edu/condor 99 Priorities “IMG_2476” by “Joanne and Matt” © http://www.flickr.com/photos/joanne_matt/97737986/

100 www.cs.wisc.edu/condor 100 Job Priority › Set with condor_prio › Integers, larger numbers are higher priority › Only impacts order between jobs for a single user on a single schedd

101 www.cs.wisc.edu/condor 101 User Priority › Determines allocation of machines to waiting users › View with condor_userprio › Inversely related to machines allocated  A user with priority of 10 will be able to claim twice as many machines as a user with priority 20

102 www.cs.wisc.edu/condor 102 User Priority › Effective User Priority is determined by multiplying two factors  Real Priority  Priority Factor

103 www.cs.wisc.edu/condor 103 Real Priority › Based on actual usage › Defaults to 0.5 › Approaches actual number of machines used over time  Configuration setting PRIORITY_HALFLIFE

104 www.cs.wisc.edu/condor 104 Priority Factor › Assigned by administrator  Set with condor_userprio › Defaults to 1 ( DEFAULT_PRIO_FACTOR ) › Nice users default to 1,000,000 ( NICE_USER_PRIO_FACTOR )  Used for true bottom feeding jobs  Add “ nice_user=true ” to your submit file

105 www.cs.wisc.edu/condor 105 Negotiator Policy Expressions › PREEMPTION_REQUIREMENTS and PREEMPTION_RANK › Evaluated when condor_negotiator considers replacing a lower priority job with a higher priority job › Completely unrelated to the PREEMPT expression

106 www.cs.wisc.edu/condor 106 PREEMPTION_REQUIREMENTS › If false will not preempt machine  Typically used to avoid pool thrashing PREEMPTION_REQUIREMENTS = \ $(StateTimer) > (1 * $(HOUR)) \ && RemoteUserPrio > SubmittorPrio * 1.2  Only replace jobs running for at least one hour and 20% lower priority

107 www.cs.wisc.edu/condor 107 PREEMPTION_RANK › Picks which already claimed machine to reclaim PREEMPTION_RANK = \ (RemoteUserPrio * 1000000)\ - ImageSize  Strongly prefers preempting jobs with a large (bad) priority and a small image size

108 www.cs.wisc.edu/condor 108 Tools “Tools” by “book_slut73” © http://www.flickr.com/photos/book_slut/87022458/

109 www.cs.wisc.edu/condor 109 condor_config_val › Find current configuration values % condor_config_val MASTER_LOG /var/condor/logs/MasterLog

110 www.cs.wisc.edu/condor 110 condor_config_val -v › Can identify source % condor_config_val –v CONDOR_HOST CONDOR_HOST: condor.cs.wisc.edu Defined in ‘/etc/condor_config.hosts’, line 6

111 www.cs.wisc.edu/condor 111 condor_fetchlog › Retrieve logs remotely condor_fetchlog beak.cs.wisc.edu Master

112 www.cs.wisc.edu/condor 112 Querying daemons condor_status › Queries the collector for information about daemons in your pool › Defaults to finding condor_startd s › condor_status –schedd summarizes all job queues › condor_status –master returns list of all condor_master s

113 www.cs.wisc.edu/condor 113 condor_status › -long displays the full ClassAd › Specify a machine name to limit results to a single host condor_status –l node4.cs.wisc.edu

114 www.cs.wisc.edu/condor 114 condor_status -constraint › Only return ClassAds that match an expression you specify › Show me idle machines with 1GB or more memory  condor_status -constraint 'Memory >= 1024 && Activity == "Idle"‘

115 www.cs.wisc.edu/condor 115 condor_status -format › Controls format of output › Useful for writing scripts › Uses C printf style formats  One field per argument

116 www.cs.wisc.edu/condor 116 condor_status -format › Census of systems in your pool: % condor_status -format '%s ' Arch -format '%s\n' OpSys | sort | uniq –c 797 INTEL LINUX 118 INTEL WINNT50 108 SUN4u SOLARIS28 6 SUN4x SOLARIS28

117 www.cs.wisc.edu/condor 117 Examining Queues condor_q › View the job queue › The “ -long ” option is useful to see the entire ClassAd for a given job › supports –constraint and -format › Can view job queues on remote machines with the “ -name ” option

118 www.cs.wisc.edu/condor 118 condor_q -format › Census of jobs per user % condor_q -format '%8s ' Owner -format '%s\n' Cmd | sort | uniq –c 64 adesmet /scratch/submit/a.out 2 adesmet /home/bin/run_events 4 smith /nfs/sim1/em2d3d 4 smith /nfs/sim2/em2d3d

119 www.cs.wisc.edu/condor 119 condor_q -analyze › condor_q will try to figure out why the job isn’t running › Good at determining that no machine matches the job Requirements expressions

120 www.cs.wisc.edu/condor 120 condor_q -analyze › Typical results: 471216.000: Run analysis summary. Of 820 machines, 458 are rejected by your job's requirements 25 reject your job because of their own requirements 0 match, but are serving users with a better priority in the pool 4 match, but reject the job for unknown reasons 6 match, but will not currently preempt their existing job 327 are available to run your job

121 www.cs.wisc.edu/condor 121 condor_q –better-analyze › In current 6.7 releases › Breaks down the job’s requirements and suggests modifications › Very slow.

122 www.cs.wisc.edu/condor 122 condor_q –better-analyze › (Heavily truncated output) The Requirements expression for your job is: ( ( target.Arch == "SUN4u" ) && ( target.OpSys == "WINNT50" ) && [snip] Condition Machines Suggestion 1 (target.Disk > 100000000) 0 MODIFY TO 14223201 2 (target.Memory > 10000) 0 MODIFY TO 2047 3 (target.Arch == "SUN4u") 106 4 (target.OpSys == "WINNT50") 110 MOD TO "SOLARIS28" Conflicts: conditions: 3, 4

123 www.cs.wisc.edu/condor 123 Condor’s Log Files › Condor maintains one log file per daemon “Ready for the Winter” by “bcmom” © http://www.flickr.com/photos/bcmom/59207805/

124 www.cs.wisc.edu/condor 124 Condor’s Log Files › Can increase verbosity of logs on a per daemon basis  SHADOW_DEBUG, SCHEDD_DEBUG, and others  Space separated list

125 www.cs.wisc.edu/condor 125 Useful Debug Levels › D_FULLDEBUG dramatically increases information logged › D_COMMAND adds information about about commands received SHADOW_DEBUG = \ D_FULLDEBUG D_COMMAND

126 www.cs.wisc.edu/condor 126 Condor’s Log Files › Log files are automatically rolled over when a size limit is reached  Defaults to 1,000,000 bytes Old versions were 64,000 bytes, may want to increase  Rolls over quickly with D_FULLDEBUG  MAX_*_LOG, one setting per daemon MAX_SHADOW_LOG, MAX_SCHEDD_LOG, and others

127 www.cs.wisc.edu/condor 127 Condor’s Log Files › Many log files entries primarily useful to Condor developers  Especially if D_FULLDEBUG is on  Minor errors are often logged but corrected  Take them with a grain of salt  condor-admin@cs.wisc.edu

128 www.cs.wisc.edu/condor 128 Debugging Jobs “Wanna buy a Beetle?” by “Kevin” © http://www.flickr.com/photos/kevincollins/89538633/

129 www.cs.wisc.edu/condor 129 Debugging Jobs: condor_q › Examine the job with condor_q  especially -long and –analyze  Compare with condor_status –long for a machine you expected to match

130 www.cs.wisc.edu/condor 130 Debugging Jobs: User Log › Examine the job’s user log  Can find with: condor_q -format '%s\n' UserLog 17.0  Set with “log” in the submit file › Contains the life history of the job › Often contains details on problems

131 www.cs.wisc.edu/condor 131 Debugging Jobs: ShadowLog › Examine ShadowLog on the submit machine  Note any machines the job tried to execute on  There is often an “ERROR” entry that can give a good indication of what failed

132 www.cs.wisc.edu/condor 132 Debugging Jobs: Matching Problems › No ShadowLog entries? Possible problem matching the job.  Examine ScheddLog on the submit machine  Examine NegotiatorLog on the central manager

133 www.cs.wisc.edu/condor 133 Debugging Jobs: Local Problems › ShadowLog entries suggest an error but aren’t specific?  Examine StartLog and StarterLog on the execute machine

134 www.cs.wisc.edu/condor 134 Debugging Jobs: Reading Log Files › Condor logs will note the job ID each entry is for  Useful if multiple jobs are being processed simultaneously  grepping for the job ID will make it easy to find relevant entries

135 www.cs.wisc.edu/condor 135 Debugging Jobs: What Next? › If necessary add “ D_FULLDEBUG D_COMMAND ” to DEBUG_DAEMONNAME setting for additional log information › Increase MAX_DAEMONNAME_LOG if logs are rolling over too quickly › If all else fails, email us  condor-admin@cs.wisc.edu

136 www.cs.wisc.edu/condor 136 Security “Padlock” by “Peter Ford” © http://www.flickr.com/photos/peterf/72583027/

137 www.cs.wisc.edu/condor 137 Default Security › Out of the box  Relatively open, hostname and IP address based  Limit to your local IP addresses or get behind a firewall! “Mining museum door” by “byrdiegyrl” © http://www.flickr.com/photos/byrdiegyrl/39260431/

138 www.cs.wisc.edu/condor 138 Strong Security Capabilities › Strong authentication of users and other daemons › Encryption of transmitted data › Integrity validation of transmitted data Untitled, by Brian De Smet, © 2005 http://www.fief.org/sysadmin/blosxom.cgi/2005/07/22

139 www.cs.wisc.edu/condor 139 Security Options › Heavily configurable  Many authentication systems supported –GSI (Globus), Kerberos, Windows NT authentication, public key, shared password, filesystem, and more  High granularity Per access level: read, write, daemon-to-daemon, administrator, machine owner, etc. Per daemon

140 www.cs.wisc.edu/condor 140 Problems With Default Installation › Host-based granularity is too big  Any user who can login to central manager has “Administrator” privileges  Any user on a machine can evict the job on that machine via condor_vacate

141 www.cs.wisc.edu/condor 141 Problems With Default Installation › Most connections are NOT authenticated  Queue management commands are.  But other daemon-to-daemon commands are not.  It is possible to send false information to the collector and other denials of service

142 www.cs.wisc.edu/condor 142 Problems With Default Installation › Traffic is not private (encrypted) or checked for integrity  Possibility of someone eavesdropping on your traffic, including files transferred to or from execute machine  Possibility of someone modifying your traffic without detection

143 www.cs.wisc.edu/condor 143 Security Policy Configuration › Condor provides many mechanisms to address the previous shortcomings:  Many authentication methods  Strong encryption  Signed checksums for integrity

144 www.cs.wisc.edu/condor 144 Security Policy Configuration › Condor will negotiate security requirements and supported methods client server I want to submit a job You must authenticate w/ kerberos KERBEROS normal submit protocol

145 www.cs.wisc.edu/condor 145 Security Policy Configuration Default Policy SEC_DEFAULT_ENCRYPTION = OPTIONAL SEC_DEFAULT_INTEGRITY = OPTIONAL SEC_DEFAULT_AUTHENTICATION = OPTIONAL SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, SSL, PASSWORD#UNIX SEC_DEFAULT_AUTHENTICATION_METHODS = NTSSPI, KERBEROS, SSL, PASSWORD#WIN32

146 www.cs.wisc.edu/condor 146 Security Policy Configuration Default Policy Possible Policy Values NEVERdo not allow this to happen OPTIONALdo not request it, but allow it PREFFEREDrequest it, but do not require it REQUIREDthis is mandatory SEC_DEFAULT_ENCRYPTION = OPTIONAL SEC_DEFAULT_INTEGRITY = OPTIONAL SEC_DEFAULT_AUTHENTICATION = OPTIONAL SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, SSL, PASSWORD#UNIX SEC_DEFAULT_AUTHENTICATION_METHODS = NTSSPI, KERBEROS, SSL, PASSWORD#WIN32

147 www.cs.wisc.edu/condor 147 Security Policy Configuration YYYX YYYN YYNN XNNN R P O N Required Preferred Optional Never Client Policy Server Policy Policy Reconciliation

148 www.cs.wisc.edu/condor 148 Security Policy Configuration Very Secure Policy SEC_DEFAULT_ENCRYPTION = REQUIRED SEC_DEFAULT_INTEGRITY = REQUIRED SEC_DEFAULT_AUTHENTICATION = REQUIRED SEC_DEFAULT_AUTHENTICATION_METHODS = SSL

149 www.cs.wisc.edu/condor 149 Security Policy Configuration Policy Reconciliation Example 1 CLIENT POLICY SEC_DEFAULT_ENCRYPTION = OPTIONAL SEC_DEFAULT_INTEGRITY = OPTIONAL SEC_DEFAULT_AUTHENTICATION = OPTIONAL SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, SSL, PASSWORD SERVER POLICY SEC_DEFAULT_ENCRYPTION = REQUIRED SEC_DEFAULT_INTEGRITY = REQUIRED SEC_DEFAULT_AUTHENTICATION = REQUIRED SEC_DEFAULT_AUTHENTICATION_METHODS = SSL RECONCILED POLICY ENCRYPTION = YES INTEGRITY = YES AUTHENTICATION = YES METHODS = SSL

150 www.cs.wisc.edu/condor 150 Security Policy Configuration Policy Reconciliation Example 2 CLIENT POLICY SEC_DEFAULT_ENCRYPTION = PREFERRED SEC_DEFAULT_INTEGRITY = OPTIONAL SEC_DEFAULT_AUTHENTICATION = REQUIRED SEC_DEFAULT_AUTHENTICATION_METHODS = SSL, GSI, KERBEROS, PASSWORD SERVER POLICY SEC_DEFAULT_ENCRYPTION = NEVER SEC_DEFAULT_INTEGRITY = PREFERRED SEC_DEFAULT_AUTHENTICATION = OPTIONAL SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI,SSL RECONCILED POLICY ENCRYPTION = NO INTEGRITY = YES AUTHENTICATION = YES METHODS = GSI,SSL

151 www.cs.wisc.edu/condor 151 Security Policy Configuration › As you can see, you may specify a different policy for each authorization level Another Example Policy SEC_DEFAULT_ENCRYPTION = OPTIONAL SEC_DEFAULT_INTEGRITY = PREFERRED SEC_DEFAULT_AUTHENTICATION = REQUIRED SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, PASSWORD, SSL SEC_READ_INTEGRITY = OPTIONAL SEC_READ_AUTHENTICATION = OPTIONAL SEC_CLIENT_INTEGRITY = OPTIONAL SEC_CLIENT_AUTHENTICATION = OPTIONAL

152 www.cs.wisc.edu/condor 152 Security Policy Configuration › Possible values for authorization levels:  READ  WRITE  CONFIG  ADMINISTRATOR  OWNER  DAEMON  NEGOTIATOR  SOAP › Special values:  CLIENT, DEFAULT

153 www.cs.wisc.edu/condor 153 Security Policy Configuration There is now a hierarchy of authorization: READ WRITE DAEMONADMINISTRATOR CONFIG OWNER SOAP NEGOTIATOR

154 www.cs.wisc.edu/condor 154 Security Policy Configuration Once you have authenticated users, you may use a more fine-grained authorization list: ALLOW_WRITE = zmiller@cs.wisc.edu ALLOW_WRITE = zmiller@cs.wisc.edu/goose.cs.wisc.edu ALLOW_WRITE = zmiller@cs.wisc.edu/*.cs.wisc.edu

155 www.cs.wisc.edu/condor 155 Security Policy Configuration › Format of canonical username: user@domain/host › One wildcard allowed in the user@domain portion, and one allowed in the host portion

156 www.cs.wisc.edu/condor 156 Security Policy Configuration › Usernames can now be mapped with regular expressions. Old GSI mapfile: “/C=US/ST=Wisconsin/L=Madison/O=University of Wisconsin -- Madison/O=Computer Sciences Department/OU=Condor Project/CN=Zach Miller/Email=zmiller@cs.wisc.edu” zmiller@cs.wisc.edu “/C=US/ST=Wisconsin/L=Madison/O=University of Wisconsin -- Madison/O=Computer Sciences Department/OU=Condor Project/CN=Todd Tannenbaum/Email=tannenba@cs.wisc.edu” tannenba@cs.wisc.edu Etc.

157 www.cs.wisc.edu/condor 157 Security Policy Configuration › New mapfile applies to all types of authentication › In your condor_config: SEC_CANONICAL_MAPFILE = /path/to/mapfile › Each line is a mapping rule

158 www.cs.wisc.edu/condor 158 Security Policy Configuration › Sample map file: CLAIMTOBE.*nobody FS(.*)\1 GSIEmail=(.*)\1

159 www.cs.wisc.edu/condor 159 Security Policy Configuration › Review:  The security policy determines which types of security mechanism will be used for a given type of command: SEC_ADMINISTRATOR_AUTHENTICATION = REQUIRED SEC_ADMINISTRATOR_AUTHENTICATION_METHODS = GSI

160 www.cs.wisc.edu/condor 160 Security Policy Configuration › Review:  The map file gives you a canonical name from the authenticated user: GSIEmail=(.*)\1 “/C=US/ST=Wisconsin/L=Madison/O=University of Wisconsin – Madison/O=Computer Sciences Department/OU=Condor Project /CN=Zach Miller/Email=zmiller@cs.wisc.edu” zmiller@cs.wisc.edu

161 www.cs.wisc.edu/condor 161 Security Policy Configuration › Review:  The ALLOW_ and DENY_ lists control the authorization of the canonical users: ALLOW_WRITE = *@cs.wisc.edu, zach@otherdomain.com DENY_WRITE = *.evildomain.net

162 www.cs.wisc.edu/condor 162 Configuring Authentication Methods CLAIMTOBE is not configurable, and is meant for testing purposes only. ANONYMOUS is also not configurable, and always sends the username “CONDOR_ANONYMOUS”

163 www.cs.wisc.edu/condor 163 Configuring Authentication Methods FS works by creating a file in /tmp, and checking to see who the owner of that file is. FS_REMOTE is similar, but allows you to specify where to create the file, allowing you to use a directory on NFS or some other shared storage. This is specified like so: FS_REMOTE_DIR = /shared/temp/space

164 www.cs.wisc.edu/condor 164 Configuring Authentication Methods NTSSPI uses Windows’ SSPI authentication. This requires that the username and password be the same on two machines that are authenticating to each other. There are no configuration options for this method.

165 www.cs.wisc.edu/condor 165 Configuring Authentication Methods GSI is the Globus Toolkit’s implementation of X.509. There are quite a number of configuration options for this which essentially specify which credentials to use, which to expect, and who is allowed to sign the credentials.

166 www.cs.wisc.edu/condor 166 Configuring Authentication Methods When using GSI, a user typically has a public key (certificate) and a private key that is encrypted with a password. A temporary certificate, called a proxy, is created for use with condor: % grid-proxy-init -cert zmiller.crt -key zmiller.key Your identity: /C=US/ST=Wisconsin/L=Madison/O=University of Wisconsin -- Madison/O=Computer Sciences Department/OU=Condor Project/CN=Zach Miller/Email=zmiller@cs.wisc.edu Enter GRID pass phrase for this identity: Creating proxy............................................... Done

167 www.cs.wisc.edu/condor 167 Configuring Authentication Methods For the Condor daemons to use GSI authentication, however, they will need either a proxy or a private key with no password. Proxies would need to be valid for a considerable length of time to be useful in this context, so usually a cert/keypair is used.

168 www.cs.wisc.edu/condor 168 Configuring Authentication Methods GSI_DAEMON_CERT = condor_cred.crt GSI_DAEMON_KEY = condor_cred.key Also, Condor needs to know the public key of any CA that signs certificates: GSI_DAEMON_TRUSTED_CA_DIR=/path/to/keys

169 www.cs.wisc.edu/condor 169 Configuring Authentication Methods Typically, there is a “host credential” which is owned by root in: /etc/grid-security/host.crt /etc/grid-security/host.key And also a directory of signing keys in /etc/grid-security/certificates

170 www.cs.wisc.edu/condor 170 Configuring Authentication Methods Condor will use those defaults without any additional configuration, if it has root privilege to read the host key. Alternatively, you can specify your own directory to hold the key files and the signing certificates by using: GSI_DAEMON_DIRECTORY=~/globus/

171 www.cs.wisc.edu/condor 171 Configuring Authentication Methods Since GSI is currently available only on UNIX systems, we have also added support for OpenSSL. It needs essentially the same information as GSI, which is specified in this case with a different set of configuration parameters: AUTH_SSL_SERVER_CAFILE AUTH_SSL_SERVER_CADIR AUTH_SSL_SERVER_CERTFILE AUTH_SSL_SERVER_KEYFILE AUTH_SSL_CLIENT_CAFILE AUTH_SSL_CLIENT_CADIR AUTH_SSL_CLIENT_CERTFILE AUTH_SSL_CLIENT_KEYFILE

172 www.cs.wisc.edu/condor 172 Configuring Authentication Methods For both GSI and OpenSSL, you can use the openssl command line tool to create all the necessary keys and files. For more details: › On Condor web site: http://tinyurl.com/h452l › Forthcoming in the Condor Manual… › Google.

173 www.cs.wisc.edu/condor 173 Configuring Authentication Methods KERBEROS is now available on both UNIX and Windows platforms On UNIX, it authenticates against a KDC (Kerberos Domain Controller). On Windows, the Active Directory server fills this role.

174 www.cs.wisc.edu/condor 174 Configuring Authentication Methods For Kerberos, a user may need to run kinit if that is not part of the normal login process: % kinit zmiller@CS.WISC.EDU Password for zmiller@CS.WISC.EDU: % klist Ticket cache: FILE:/var/adm/krb5/tmp/tkt/tkt_24842_pid14957 Default principal: zmiller@CS.WISC.EDU Valid starting Expires Service principal 04/24/06 12:01:07 05/24/06 12:01:07krbtgt/CS.WISC.EDU@CS.WISC.EDU

175 www.cs.wisc.edu/condor 175 Configuring Authentication Methods The Condor daemons can use the “host credential” if they are running as root. The default keytab file is typically in /etc/v5srvtab Alternatively, you can specify your own keytab file and server principal to use: KERBEROS_SERVER_KEYTAB = /scratch/zmiller/keytab.zmiller KERBEROS_SERVER_PRINCIPAL = zmiller/condor@CS.WISC.EDU

176 www.cs.wisc.edu/condor 176 Configuring Authentication Methods PASSWORD authentication also works on both UNIX and Windows, although the interface is somewhat different. On Windows, the password is stored in a secure part of the registry. On UNIX, the password is stored in a file specified by: SEC_PASSWORD_FILE = /path/to/file

177 www.cs.wisc.edu/condor 177 Configuring Authentication Methods To set the pool password, use the condor_store_cred command: condor_store_cred –c add

178 www.cs.wisc.edu/condor 178 PASSWORD protect pools › PASSWORD authentication currently only for daemons to authenticate with each other as id “condor_pool@UID_DOMAIN” SEC_PASSWORD_FILE = /var/condor/.pool_password SEC_DEFAULT_AUTHENTICATION = REQUIRED SEC_READ_AUTHENTICATION = OPTIONAL SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD ALLOW_CONFIG = root@mydomain/$(IP_ADDRESS) ALLOW_DAEMON = condor_pool@mydomain/*, condor@mydomain/$(IP_ADDRESS) ALLOW_NEGOTIATOR = condor_pool@mydomain/$(CONDOR_HOST)

179 www.cs.wisc.edu/condor 179 Security Conclusions Now you have seen a general overview of the different methods that Condor supports. It is very flexible, but the downside is complexity. Our plan: “recipes” in the manual and automation in the installation programs.

180 Condor on Windows

181 www.cs.wisc.edu/condor 181 Overview › Latest features › Running jobs as submitting user › Cross-platform authentication methods (Kerberos, SSL, Password) › Running condor in an unprivileged account

182 www.cs.wisc.edu/condor 182 Running Jobs as the Submitting User

183 www.cs.wisc.edu/condor 183 condor_store_cred add C:\>condor_store_cred add Account: gquinn@CROW Enter password: Operation succeeded. › Contacts local schedd and asks it to securely store a user’s password › Password is placed encrypted in a registry location y0urs myp4sswd

184 www.cs.wisc.edu/condor 184 condor_store_cred query C:\>condor_store_cred query Account: gquinn@CROW A credential is stored and is valid. › Checks if password is stored for your user name › Also makes sure password is up to date (by making sure it can be used to log in)

185 www.cs.wisc.edu/condor 185 condor_store_cred delete C:\>condor_store_cred delete Account: gquinn@CROW Enter password: Operation succeeded. › Removes password from secure password store

186 www.cs.wisc.edu/condor 186 Job Execution: Submit Side schedd submit shadow y0urs myp4sswd Secure Password Store

187 www.cs.wisc.edu/condor 187 Job Execution: Execute Side starter condor_exec.exe Jobs run using a Condor-specific account with minimal privileges. condor-reuse-vm1

188 www.cs.wisc.edu/condor 188 Job Execution: Execute Side starter condor_exec.exe y0urs myp4sswd schedd VM1_USER = CROW\gquinn VM2_USER = CROW\gquinn

189 www.cs.wisc.edu/condor 189 It’d be nice if… › My jobs could access my files just like the condor_shadow can › I didn’t have to tie my execute machines to a single account › I didn’t have to run condor_store_cred from every machine where my credential is needed

190 www.cs.wisc.edu/condor 190 The Windows CredD y0urs myp4sswd C:\>condor_store_cred add Account: gquinn@CROW Enter password: Operation succeeded. credd › A centralized repository for user passwords “store password”

191 www.cs.wisc.edu/condor 191 The Windows CredD y0urs myp4sswd schedd shadow Submit machines can use the CredD to impersonate the user in the shadow “fetch password”

192 www.cs.wisc.edu/condor 192 The Windows CredD y0urs myp4sswd starter condor_exec.exe Execute machines can use the CredD to run jobs as the submitting user! “fetch password”

193 www.cs.wisc.edu/condor 193 Running Jobs as Submitting User universe = vanilla executable = whoami.exe log = whoami.log output = whoami.out run_as_owner = true queue › Example submit file:

194 www.cs.wisc.edu/condor 194 Running Jobs as Submitting User CREDD_HOST = vault.cs.wisc.edu STARTER_ALLOW_RUNAS_OWNER = True CREDD_CACHE_LOCALLY = True SEC_CLIENT_AUTHENTICATION_METHODS = \ NTSSPI, PASSWORD › In config file on submit and execute nodes:

195 www.cs.wisc.edu/condor 195 Running Jobs as Submitting User › See example config file included with Condor: condor_config.local.credd # Set security settings so that full security to the credd is required CREDD.SEC_DEFAULT_AUTHENTICATION =REQUIRED CREDD.SEC_DEFAULT_ENCRYPTION = REQUIRED CREDD.SEC_DEFAULT_INTEGRITY = REQUIRED CREDD.SEC_DEFAULT_NEGOTIATION = REQUIRED # Require PASSWORD auth for password fetching CREDD.SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD # Only honor password fetch requests to the trusted "condor_pool" user CREDD.ALLOW_DAEMON = condor_pool@($UID_DOMAIN)

196 www.cs.wisc.edu/condor 196 Securing the CredD y0urs myp4sswd C:\>condor_store_cred add Account: gquinn@CROW Enter password: Operation succeeded. credd › NTSSPI can be used to authenticate to CredD and send the password encrypted over the network “store password”

197 www.cs.wisc.edu/condor 197 Securing the CredD y0urs myp4sswd starter condor_exec.exe “fetch password” Condor normally runs as SYSTEM, and therefore can’t use NTSSPI

198 www.cs.wisc.edu/condor 198 Securing the CredD › Options for securing password fetch operations Kerberos / SSL authentication Password authentication Run the Condor service as a normal account and use NTSSPI

199 www.cs.wisc.edu/condor 199 Password Authentication

200 www.cs.wisc.edu/condor 200 Password Authentication › Mutual authentication of Condor daemons possessing a shared “pool password” › Good for small pools where more heavyweight methods aren’t desirable

201 www.cs.wisc.edu/condor 201 Password Authentication › Pool password can be stored with new “-c” argument to condor_store_cred › Can also be done remotely with “-n” argument C:\> condor_store_cred –c add C:\> condor_store_cred –n crow.cs.wisc.edu –c add

202 www.cs.wisc.edu/condor 202 Using an Unprivileged Account for Condor

203 www.cs.wisc.edu/condor 203 Personal Condor › Allows creating a 1-machine Condor pool as any user C:\> SET CONDOR_CONFIG=c:\condor\condor_config C:\> condor_master -f

204 www.cs.wisc.edu/condor 204 Unprivileged Service › Condor still runs using the Service Control Manager (SCM)

205 www.cs.wisc.edu/condor 205 More Information › Condor experts here at Condor Week › Condor Manual › Condor Web Site › Condor mailing lists, esp condor-users http://www.cs.wisc.edu/condor/mail-lists/ http://dir.gmane.org/gmane.comp.distributed.condor.user/ › Email Univ of Wisconsin Condor Team at condor-admin@cs.wisc.edu

206 www.cs.wisc.edu/condor 206 Thank You!


Download ppt "Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison Condor Administration."

Similar presentations


Ads by Google