Progress Report Barnett Chiu

Glidein Code Updates and Tests (1)
Major modifications to the condor_glidein code are as follows:
1. Command options:
   1a. An option "type" is added to select between schedd and startd glidein; the default is startd.
   1b. An option "tcp" is added to force a TCP connection.
   1c. Other options will be included for selecting GRAM services and supporting batch systems such as PBS and LSF.
2. DAEMON_LIST:
   2a. For a startd-based glidein, the master spawns the startd. This is done by including both master and startd in DAEMON_LIST.
   2b. For a schedd-based glidein, the master spawns the schedd. Similarly, both master and schedd are included in DAEMON_LIST.
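The option handling and the DAEMON_LIST choice described on this slide can be sketched roughly as follows. This is an illustrative Python sketch, not the condor_glidein code itself: the function names parse_options and daemon_list_line and the program name glidein_sketch are hypothetical, and only the option names ("type", "tcp"), the default (startd), and the daemon pairs come from the slide.

```python
import argparse

# Daemons the master should spawn for each glidein type (from the slide).
DAEMONS_BY_TYPE = {
    "startd": ["MASTER", "STARTD"],
    "schedd": ["MASTER", "SCHEDD"],
}


def parse_options(argv):
    """Parse the two new options; 'startd' is the default glidein type."""
    parser = argparse.ArgumentParser(prog="glidein_sketch")  # hypothetical name
    parser.add_argument("-type", dest="glidein_type",
                        choices=sorted(DAEMONS_BY_TYPE), default="startd")
    parser.add_argument("-tcp", dest="force_tcp", action="store_true")
    return parser.parse_args(argv)


def daemon_list_line(glidein_type):
    """Build the DAEMON_LIST configuration line for the chosen glidein type."""
    return "DAEMON_LIST = " + ", ".join(DAEMONS_BY_TYPE[glidein_type])


if __name__ == "__main__":
    opts = parse_options(["-type", "schedd", "-tcp"])
    print(daemon_list_line(opts.glidein_type), "| force TCP:", opts.force_tcp)
```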

Glidein Code Updates and Tests (2)
3. Added code to adjust $SERVER_URL based on the type of glidein.
   e.g. GLIDEIN_SERVER_URL can be set to:
   Roughly speaking, the way I distinguish between startd and schedd glidein is that the URL for a schedd-based glidein should contain a schedd_based directory …
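One way to picture the URL adjustment is the small sketch below. It assumes the schedd_based directory is simply appended to a base URL, which the slide does not confirm; the function name adjust_server_url and the example.org address are hypothetical placeholders, since the real URLs are not preserved in the transcript.

```python
def adjust_server_url(base_url, glidein_type):
    """Return the server URL for the given glidein type.

    Assumption: a schedd-based glidein uses a 'schedd_based' subdirectory
    of the base URL; a startd-based glidein uses the base URL unchanged.
    """
    base = base_url.rstrip("/")
    if glidein_type == "schedd":
        return base + "/schedd_based"
    return base


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    print(adjust_server_url("http://example.org/glidein/", "schedd"))
```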

Glidein Code Updates and Tests (3)
4. Added a function named gen_main_schedd_config() that sets up schedd-related configurations.
5. In do_remote_setup(), a function pointer is used to choose either gen_main_schedd_config() or gen_main_config(), i.e. the functions that generate the necessary configurations for schedd glidein and startd glidein respectively. The function pointer offers flexibility in choosing among different types of glideins.
   E.g. schedd glideins can be further categorized by the batch system they support, such as LSF, PBS or other types of batch systems as grid technology evolves …
   E.g. other types of glideins as Condor evolves …
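The function-pointer idea can be illustrated with a dispatch table. The sketch below is written in Python rather than the language of condor_glidein, and the function bodies are placeholders: only the names gen_main_config, gen_main_schedd_config and do_remote_setup come from the slide, while the returned text and the CONFIG_GENERATORS table are assumptions made for illustration.

```python
def gen_main_config():
    """Placeholder for the startd-glidein configuration generator."""
    return "# startd glidein configuration (placeholder)"


def gen_main_schedd_config():
    """Placeholder for the schedd-glidein configuration generator."""
    return "# schedd glidein configuration (placeholder)"


# Dispatch table playing the role of the function pointer. Supporting a new
# glidein type (e.g. one per batch system) would mean adding one entry here.
CONFIG_GENERATORS = {
    "startd": gen_main_config,
    "schedd": gen_main_schedd_config,
}


def do_remote_setup(glidein_type="startd"):
    """Pick the generator for the glidein type and run it."""
    generate = CONFIG_GENERATORS[glidein_type]
    return generate()


if __name__ == "__main__":
    print(do_remote_setup("schedd"))
```

The caller never needs to know which generator ran, which is the flexibility the slide refers to.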

Glidein Code Updates and Tests (3)
Authentication
  When condor_submit talks to the schedd, it needs to authenticate itself.
  Several authentication schemes can be chosen: FS, KERBEROS, GSI, CLAIMTOBE
Configuration
  SEC_DEFAULT_AUTHENTICATION = OPTIONAL (or REQUIRED)
  SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, CLAIMTOBE
  Both the submit machine and the glidein configuration file have to use the same settings.
  For the testing phase, use CLAIMTOBE so that the schedd trusts whoever executes condor_submit.
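Because both sides have to agree on these two settings, a quick consistency check can help during testing. The following is a minimal sketch that assumes plain "KEY = value" configuration text; the helper names parse_config and auth_mismatches are hypothetical, and the parser ignores macros, includes, and everything else a real Condor configuration may contain.

```python
SEC_KEYS = (
    "SEC_DEFAULT_AUTHENTICATION",
    "SEC_DEFAULT_AUTHENTICATION_METHODS",
)


def parse_config(text):
    """Parse simple 'KEY = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().upper()] = value.strip()
    return settings


def auth_mismatches(submit_cfg, glidein_cfg):
    """Return the security settings that differ between the two configs."""
    submit, glidein = parse_config(submit_cfg), parse_config(glidein_cfg)
    return [key for key in SEC_KEYS if submit.get(key) != glidein.get(key)]


if __name__ == "__main__":
    submit_side = ("SEC_DEFAULT_AUTHENTICATION = OPTIONAL\n"
                   "SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE\n")
    glidein_side = ("SEC_DEFAULT_AUTHENTICATION = OPTIONAL\n"
                    "SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI\n")
    print(auth_mismatches(submit_side, glidein_side))
```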

Schedd-Glidein Demo (1)
Command: // schedd glidein #1
  condor_glidein -count 1 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #2
  condor_glidein -count 1 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #3, #4, #5
  condor_glidein -count 3 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-condor -type schedd -forcesetup

Schedd-Glidein Demo (2)
Command: condor_status -schedd

  Name         Machine   TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
  gridgk01.r
  gridgk02.r
  gridui01.u
  ribera.cs
  ron.cs.wis
  vail.cs.wi

               TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
  Total                       0              0              0

Demo (3)
Command
  condor_status -schedd -l | grep -i Name | sed -e 's/Name[ ]*=[
Output
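The pipeline on this slide is cut off in the transcript, so its exact sed expression is unknown. As a rough equivalent of what it appears to do (list the Name attribute of each schedd ad), here is a Python sketch; the function name schedd_names and the sample schedd names are hypothetical, and the input is assumed to be the text printed by condor_status -schedd -l.

```python
import re

# Matches lines such as:  Name = "some_schedd@host"
NAME_RE = re.compile(r'^Name[ \t]*=[ \t]*"?([^"\n]*)"?[ \t]*$', re.MULTILINE)


def schedd_names(long_output):
    """Extract the value of every 'Name = ...' line from long-format output."""
    return NAME_RE.findall(long_output)


if __name__ == "__main__":
    # Hypothetical sample text standing in for 'condor_status -schedd -l'.
    sample = (
        'MyType = "Scheduler"\n'
        'Name = "schedd_a@host1.example.org"\n'
        '\n'
        'MyType = "Scheduler"\n'
        'Name = "schedd_b@host2.example.org"\n'
    )
    for name in schedd_names(sample):
        print(name)
```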

Demo (4)
Command:
  condor_status -schedd -long -constraint "is_glidein=?=true"
or the customized command:
  condor_schedd_ad [schedd_name]

  MyType = "Scheduler"
  TargetType = ""
  IS_GLIDEIN = TRUE
  CondorVersion = "$CondorVersion: Sep $"
  CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL3 $"
  Machine = "ron.cs.wisc.edu"
  QuillEnabled = FALSE
  ScheddIpAddr = " "
  MyAddress = " "
  NumUsers = 0
  Name =
  VirtualMemory = 0
  TotalIdleJobs = 0
  TotalRunningJobs = 0
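The same selection can be approximated on the client side by parsing the long-format output. This is a sketch under assumptions: ads are taken to be separated by blank lines, values are treated as plain strings rather than evaluated as ClassAd expressions, and the helper names parse_ads and glidein_schedds are hypothetical.

```python
import re


def parse_ads(long_output):
    """Split long-format output into ads and parse 'Attr = value' lines."""
    ads = []
    for block in re.split(r"\n\s*\n", long_output.strip()):
        ad = {}
        for line in block.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                # Attribute names are lower-cased so lookups ignore case.
                ad[key.strip().lower()] = value.strip().strip('"')
        if ad:
            ads.append(ad)
    return ads


def glidein_schedds(ads):
    """Keep only the ads whose IS_GLIDEIN attribute is TRUE."""
    return [ad for ad in ads if ad.get("is_glidein", "").lower() == "true"]


if __name__ == "__main__":
    sample = (
        'MyType = "Scheduler"\nIS_GLIDEIN = TRUE\nMachine = "ron.cs.wisc.edu"\n'
        '\n'
        'MyType = "Scheduler"\nMachine = "host.example.org"\n'
    )
    for ad in glidein_schedds(parse_ads(sample)):
        print(ad["machine"])
```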

Demo (5)
How to submit jobs?
Command:
  condor_submit cgtest1 -remote
Output:
  condor_submit cgtest1 -remote
  Submitting job(s)
  WARNING: Log file /direct/usatlas+u/pleiades/test/log/nostos_echo.1.0 is on NFS.
  This could cause log file corruption and is _not_ recommended.
  Logging submit event(s).
  1 job(s) submitted to cluster 1.
  Spooling data files for 1 jobs...
In the PilotFactory project, cgtest1 would be replaced by a wrapper of pilotScheduler.py, with its dependent programs listed in transfer_input_files, so that the job containing the pilotScheduler program (i.e. the Generator) can be submitted to the glidein schedd as a Condor-C job and then run within the schedd as a scheduler universe job.
For more information, please check GPF in the Pilot Factory Proposal.

Demo (6)
Command:
  condor_q -name
Output:
  -- Schedd: :
  ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
  1.0 pleiades 2/26 15: :00:00 C ps auwfx
  0 jobs; 0 idle, 0 running, 0 held

Documentation
  Updating Twiki page on Schedd-based Glidein
  Condor-G and Glidein Performance and Functionality Assessment

Condor Utilities (1)
For general Condor-G tests, it is inconvenient to recreate job submission files …
condor_gen_gridjob: a program that automatically generates the submit file with a single command:
  [comm] condor_gen_gridjob --exec $HOME/myprog --out $HOME/condor_test/ouput --in $HOME/condor_test/input …
  [other commands] condor_gen_ccjob, condor_gen_vanilla, … etc.
Checking the individual ClassAd published by a particular *schedd*
  e.g. Use condor_status -schedd -long to check all *schedd* ClassAds; however, it is not straightforward to check the published ClassAd associated with a particular instance of *schedd* -> condor_schedd_ad (done)
  [comm] condor_schedd_ad
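To give a feel for what a generator like condor_gen_gridjob might produce, the sketch below builds a grid-universe submit description from the same three inputs (executable, output, input). It is not the actual tool: the function name gen_gridjob, its parameters, and the default grid_resource value (a gt2 resource on a gatekeeper taken from the demo slides) are assumptions, and the real utility may emit different or additional submit commands.

```python
def gen_gridjob(executable, output, input_file=None,
                grid_resource="gt2 gridgk01.racf.bnl.gov/jobmanager-fork"):
    """Return the text of a minimal grid-universe submit description."""
    lines = [
        "universe = grid",
        f"grid_resource = {grid_resource}",
        f"executable = {executable}",
    ]
    if input_file is not None:
        lines.append(f"input = {input_file}")
    lines.append(f"output = {output}")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gen_gridjob("$HOME/myprog",
                      "$HOME/condor_test/output",
                      "$HOME/condor_test/input"), end="")
```

Writing the returned text to a file and passing that file to condor_submit would be the natural next step.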

Condor Utilities (2)
List the currently available *schedd* daemons and check some of their important properties
  [usage] condor_schedd_list [-g|-h | … ]
  [comm] condor_schedd_list -g
  [output]
  Listing glidein *schedd*...
Some options for checking individual properties of a *schedd* are under way …
  e.g. Machine = "tier2-02.uchicago.edu"
       ScheddIpAddr = " "
       Name = (often needs to be used in combination with other commands, e.g. to submit jobs)
       DaemonStartTime =
       …

Condor Utilities (3)
Other utilities for debugging
condor_pid_lookup
  [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov
  [output]
  USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
  agrd ? S Feb27 3:06 /usatlas/grid/agrd0926/Condor_glidein/6.8.1-i686-pc-Linux-2.4/condor_master -dyn -f
  Or, vice versa …
  [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov condor_master
condor_schedd_time
  [comm] condor_schedd_time
  [output] Fri 23 Feb :18:31 AM EST
  [usage] debugging; can be used in combination with the gridmanager log file to extract the desired section of information (condor_pid_lookup + condor_schedd_time)
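Combining a schedd timestamp with the gridmanager log could look roughly like the sketch below, which keeps every log line from a given time onward. It rests on assumptions: log lines are taken to start with a "month/day hour:minute:second" stamp, the year is supplied by the caller because such stamps do not carry one, and the function name lines_since, the sample log text, and the year used are all hypothetical.

```python
from datetime import datetime


def lines_since(log_text, start, year):
    """Return log lines stamped at or after 'start'.

    Lines without a leading timestamp (continuation lines) follow the
    decision made for the most recent stamped line.
    """
    selected = []
    keep = False
    for line in log_text.splitlines():
        parts = line.split(" ", 2)
        try:
            stamp = datetime.strptime(f"{year}/{parts[0]} {parts[1]}",
                                      "%Y/%m/%d %H:%M:%S")
            keep = stamp >= start
        except (ValueError, IndexError):
            pass  # no parsable timestamp: keep the previous decision
        if keep:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Hypothetical log excerpt and year, for illustration only.
    log = ("2/23 11:18:30 earlier event\n"
           "2/23 11:18:31 event of interest\n"
           "\tcontinuation of that event\n"
           "2/23 11:19:00 later event\n")
    for entry in lines_since(log, datetime(2000, 2, 23, 11, 18, 31), 2000):
        print(entry)
```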