Pilot Factory using Schedd Glidein Barnett Chiu BNL 10.04.07.

1 Pilot Factory using Schedd Glidein Barnett Chiu BNL 10.04.07

2 Problem to Solve (1)
Pilot
- Probe the resource (http, environment, interpreter, other executables, etc.)
- Pull jobs from a remote server (e.g. Panda server)
- Matchmaking
  - Group jobs into different categories, e.g. production jobs, analysis jobs (CHARMM …), test jobs …
  - Other criteria: number of CPUs, RAM, etc.
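
The probe and matchmaking steps listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the job dictionary fields (`category`, `cpus`, `ram_mb`), and the executables being probed are all hypothetical and are not taken from the actual pilot code.

```python
import platform
import shutil
import sys


def probe_resource(executables=("python3", "curl")):
    """Collect basic facts about the local resource (hypothetical probe)."""
    return {
        "interpreter": sys.version.split()[0],
        "system": platform.system(),
        # True if the executable can be found on PATH
        "executables": {name: shutil.which(name) is not None
                        for name in executables},
    }


def match_jobs(jobs, category, cpus, ram_mb):
    """Return the jobs of one category whose needs fit the resource."""
    return [
        job for job in jobs
        if job["category"] == category
        and job["cpus"] <= cpus
        and job["ram_mb"] <= ram_mb
    ]


# Hypothetical job records, as they might arrive from a job server.
jobs = [
    {"id": 1, "category": "production", "cpus": 1, "ram_mb": 1024},
    {"id": 2, "category": "analysis", "cpus": 1, "ram_mb": 512},
    {"id": 3, "category": "production", "cpus": 4, "ram_mb": 8192},
]
matched = match_jobs(jobs, "production", cpus=2, ram_mb=2048)
print([job["id"] for job in matched])
```

A real pilot would pull the job list from the remote server instead of a local list; the filter only shows how category and resource criteria combine.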

3 Problem to Solve (2)
Current approach to pilot submission
- Local pool: Vanilla
- Remote pool: Condor-G
A large number of user jobs (production + analysis) means a large number of Condor-G pilot jobs, which means computational overhead on the gatekeepers (e.g. large memory consumption)

4 Solution
Is there any way to bypass GRAM when submitting jobs to remote machines? Local submissions, but how?
- We need something that continuously submits local pilot jobs on the gatekeeper
- Solution: Pilot Factory

5 Pilot Factory Overview
Pilot Factory is an application that combines the following ideas:
- schedd glidein
- pilot submission program (or pilot generator)
What is a glidein?
- A mini Condor pool on a remote machine
  - A complete Condor pool has at least 5 components: master, startd, schedd, collector, negotiator
  - Glidein: {master, startd}, {master, schedd}, … etc.
- Properly configured Condor daemons submitted as a batch job
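
As a rough illustration of the daemon subsets named above, the sketch below renders the `DAEMON_LIST` setting that a glidein configuration would carry. `DAEMON_LIST` is the standard Condor configuration knob that tells condor_master which daemons to start; the helper function and its validation rules are assumptions made for this example (in particular, requiring `MASTER` simply mirrors the two subsets shown on the slide).

```python
# The five components of a complete Condor pool, as listed on the slide.
POOL_DAEMONS = ("MASTER", "STARTD", "SCHEDD", "COLLECTOR", "NEGOTIATOR")


def daemon_list(*daemons):
    """Render a DAEMON_LIST config line for a glidein daemon subset."""
    names = [d.upper() for d in daemons]
    unknown = [n for n in names if n not in POOL_DAEMONS]
    if unknown:
        raise ValueError(f"unknown daemons: {unknown}")
    if "MASTER" not in names:
        # Assumption: every glidein subset on the slide includes master.
        raise ValueError("a glidein subset is expected to include MASTER")
    return "DAEMON_LIST = " + ", ".join(names)


print(daemon_list("master", "startd"))  # startd glidein
print(daemon_list("master", "schedd"))  # schedd glidein
```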

6 Glidein (1)
Two major steps
- Condor-G #1: installation
  - glidein setup script
  - Condor configuration file
  - glidein startup script
  - download Condor binaries (http, gsiftp, etc.)
- Condor-G #2: execution
  - exec glidein startup script
  - condor_master

7 Glidein (2)
[Diagram. Labels: Central Manager (Collector); Submit Host (master, schedd); Tarball server; Execute hosts (master + startd, master + schedd); Glidein types: ~/Condor_glidein, startup script, glidein config {master, schedd …}?]

8 Schedd Glidein
Logic is based on the startd glidein (two Condor-G steps to set up)
Usage: by running a glidein schedd on the gatekeeper, the schedd serves as a gateway between the submit host and the grid sites
Mini Condor pool with schedd functionality:
- Submit host
- Maintain a persistent queue of jobs
- Communicate with the native batch system and forward user jobs (Condor, PBS, LSF, etc.)
- Manipulate job queues through the following commands: condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio
- Security features (GSI): who is authorized to set up a Pilot Factory?

9 Schedd Glidein Example (1)
Command: // schedd glidein #1
condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #2
condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #3, #4, #5
condor_glidein -count 3 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-fork -type schedd -forcesetup
Use fork since we want the schedd to be on the gatekeeper!
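
The three commands differ only in gatekeeper host and count, so they can be generated from one template. The sketch below assembles the same argument lists in Python; the helper name `glidein_command` is made up for this example, and it uses only the flags that appear in the commands above. It builds and prints the commands and does not run them.

```python
import shlex


def glidein_command(gatekeeper, count=1,
                    arch="6.8.1-i686-pc-Linux-2.4",
                    jobmanager="jobmanager-fork"):
    """Build the condor_glidein argument list for one schedd glidein."""
    return [
        "condor_glidein",
        "-count", str(count),
        "-arch", arch,
        f"-setup_jobmanager={jobmanager}",
        f"{gatekeeper}/{jobmanager}",  # fork keeps the schedd on the gatekeeper
        "-type", "schedd",
        "-forcesetup",
    ]


targets = [
    ("gridgk01.racf.bnl.gov", 1),
    ("gridgk02.racf.bnl.gov", 1),
    ("nostos.cs.wisc.edu", 3),
]
for host, count in targets:
    print(shlex.join(glidein_command(host, count)))
```

Passing the list form to `subprocess.run` would avoid shell quoting issues if the commands were executed from a script.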

10 Schedd Glidein Example (2)
Command: condor_status -schedd
Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs
agrd0926@gridgk01.ra gridgk01.r                0             0             0
agrd0926@gridgk02.ra gridgk02.r                0             0             0
pleiades@gridui01.us gridui01.u                0             0             0
pleiades@ribera.cs.w ribera.cs.                0             0             0
pleiades@ron.cs.wisc ron.cs.wis                0             0             0
pleiades@vail.cs.wis vail.cs.wi                0             0             0
                     TotalRunningJobs TotalIdleJobs TotalHeldJobs
               Total                0             0             0
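
A script that watches the glidein schedds might want this listing in structured form. The following is a minimal sketch that parses text shaped like the output above; the parsing rule (five whitespace-separated fields, the last three numeric) is an assumption based only on this sample, and the names remain truncated exactly as condor_status printed them.

```python
SAMPLE = """\
Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs
agrd0926@gridgk01.ra gridgk01.r                0             0             0
agrd0926@gridgk02.ra gridgk02.r                0             0             0
                     TotalRunningJobs TotalIdleJobs TotalHeldJobs
               Total                0             0             0
"""


def parse_schedd_status(text):
    """Map each schedd name to its machine and job counters."""
    schedds = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields ending in three integers; the header
        # and the totals block do not match this shape and are skipped.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            name, machine, running, idle, held = fields
            schedds[name] = {
                "machine": machine,
                "running": int(running),
                "idle": int(idle),
                "held": int(held),
            }
    return schedds


status = parse_schedd_status(SAMPLE)
print(sorted(status))
```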

11 Pilot Submission Program (Generator)
- Communicates with a DB server that maintains information about pilot jobs
  - E.g. pilot_type, pilot_queue
- Pulls the desired pilot script from an external server
- Periodically submits pilot jobs (with the pilot script as executable)
  - condor_submit
  - qsub? No, not necessary, since …
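
The generator loop described on this slide could look roughly like the sketch below. It is not the actual generator: the function names, the shape of the pilot spec, and the file names are hypothetical, and the DB lookup is reduced to a callable passed in by the caller. The submit description uses standard condor_submit syntax, and the submit step is injectable so the loop can be exercised without a Condor installation.

```python
import subprocess
import tempfile
import time


def render_submit(executable, count, universe="vanilla"):
    """Return a condor_submit description that queues `count` pilots."""
    return (
        f"universe = {universe}\n"
        f"executable = {executable}\n"
        "output = pilot.$(Cluster).$(Process).out\n"
        "error = pilot.$(Cluster).$(Process).err\n"
        "log = pilot.log\n"
        f"queue {count}\n"
    )


def condor_submit(description):
    """Write the description to a file and hand it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(description)
    subprocess.run(["condor_submit", f.name], check=True)


def run_generator(get_pilot_spec, cycles, interval_s,
                  submit=condor_submit, sleep=time.sleep):
    """Submit pilots `cycles` times, waiting `interval_s` between rounds."""
    for i in range(cycles):
        # Stand-in for the DB query (pilot_type, pilot_queue, ...).
        spec = get_pilot_spec()
        submit(render_submit(spec["script"], spec["count"]))
        if i < cycles - 1:
            sleep(interval_s)


# Dry run: collect the descriptions instead of calling condor_submit.
submitted = []
run_generator(lambda: {"script": "pilot.sh", "count": 3},
              cycles=2, interval_s=0,
              submit=submitted.append, sleep=lambda s: None)
print(submitted[0])
```

Leaving out the `submit` and `sleep` overrides makes the loop call the real `condor_submit` binary, which is assumed to be on `PATH`.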

12 Build Pilot Factory with Glidein
- Schedd glidein is installed and executed on the gatekeeper
- The user submits a Condor-C job with the pilot generator as the executable
  - The generator runs on the gatekeeper as a local universe job supervised by the glidein schedd
- The generator submits pilots
  - Types and frequency are adjustable by users
  - Depending on the native batch system, pilots can be submitted as grid universe jobs
  - Along with GAHP and related binaries, the schedd can communicate with different batch systems
[Diagram. Labels: master, schedd, JobManager, LSF, PBS, schedd, Grid Resource, Pilot generator]
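
The two submissions on this slide can be illustrated with the submit descriptions they might use. The sketch below only renders text: the first function describes a Condor-C job that sends the generator to the glidein schedd to run in the local universe, the second a grid universe pilot aimed at the native batch system. The schedd name, collector host, and file names are placeholders, and the attribute choices (`+remote_jobuniverse = 12` for the local universe, `grid_resource = pbs` or `lsf`) are my reading of standard Condor submit syntax, not something taken from the slides.

```python
def condor_c_generator_job(remote_schedd, remote_collector,
                           generator="pilot_generator.py"):
    """Condor-C job: run the generator under the glidein schedd."""
    return "\n".join([
        "universe = grid",
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        f"executable = {generator}",
        # Assumption: universe number 12 selects the local universe remotely.
        "+remote_jobuniverse = 12",
        "log = generator.log",
        "queue",
    ]) + "\n"


def grid_pilot_job(batch_system, pilot_script="pilot.sh", count=1):
    """Grid universe pilot submitted to the native batch system."""
    if batch_system not in ("pbs", "lsf"):
        raise ValueError(f"unsupported batch system: {batch_system}")
    return "\n".join([
        "universe = grid",
        f"grid_resource = {batch_system}",
        f"executable = {pilot_script}",
        "log = pilot.log",
        f"queue {count}",
    ]) + "\n"


# Placeholder names; substitute the glidein schedd and its collector.
print(condor_c_generator_job("glidein-schedd@gatekeeper.example.org",
                             "collector.example.org"))
print(grid_pilot_job("pbs", count=5))
```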

13 Pilot Factory
[Diagram. Labels: Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; Submit Pilots; Pilot Factory (master, schedd); Gatekeeper with {Globus, Condor|PBS|…}; Connected to Collector; Cluster Worker Nodes]

14 Future Work
Integrating the pilot with the Condor startd to implement a startd-based pilot
- The startd-based pilot retrieves the payload of a user job in the same way as the generic pilot does, and it also inherits the functionality of the Condor startd.
- The original intention was to run PFs with the startd-pilots on worker nodes (too greedy, unacceptable?)
- Using the Condor startd makes it easier to integrate with gLexec
Transform Generic PF (GPF) into Startd PF (SPF)

15 References
[1] Schedd Glidein
[2] Pilot Factory
[3] glideinWMS: An advanced application on glideins

