Dynamic Deployment of VO Specific Condor Scheduler using GT4


1 Dynamic Deployment of VO Specific Condor Scheduler using GT4
Gaurang Mehta - Center For Grid Technologies Information Sciences Institute/USC Condor Week 2006 Dynamic Deployment of VO Condor Scheduler

2 Dynamic Deployment of VO Condor Scheduler
Outline: Introduction; VO Problem; Solution 1: Glideins; Solution 2: Glideins via GCB; Solution 3: Condor Brick; Conclusion.

3 Grid Job Submission
[Diagram labels: Submit Node (Collector, Negotiator, Master, Schedd); GRAM jobs; GT4 PBS GRAM; PBS jobs; Cluster Worker Nodes.]

4 Dynamic Deployment of VO Condor Scheduler
Introduction: GT4 GRAM allows remote job submission to a cluster but has no scheduling capabilities of its own. Condor-G can act as a meta-scheduler that submits to different clusters via GRAM. The job description is translated multiple times before the job is executed.

5 Dynamic Deployment of VO Condor Scheduler
VO Requirements: The Southern California Earthquake Center (SCEC) gathers new information about earthquakes in Southern California and integrates it into a comprehensive and predictive understanding of earthquake phenomena. SCEC wants to run millions of earthquake analysis jobs simultaneously on tens of resources (thousands of CPUs). It wants reliable performance across all its jobs. It wants to hold the allocated resources for a period of time and pipe as many jobs to them as it can. It wants to run on a wide range of resource configurations.

6 Solution 1 : Glideins
[Diagram labels: Cluster on a public network; Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; GT4 PBS GRAM; PBS runs Glidein request; Cluster Worker Nodes; Connect to Collector; Execute Jobs.]

7 Dynamic Deployment of VO Condor Scheduler
Solution 1 : Glideins. Glideins add resources to an existing Condor setup by running Condor startd daemons via GT4 GRAM on a remote scheduler and cluster. Glideins can be set up and started either by running the condor_glidein command or by writing your own Condor job description or GRAM RSL.
Pros: Allows your own personal Condor cluster (VO grid) to be created. Allows VO-specific policies and priorities to be applied to all the nodes in the grid.
Cons: Glideins are not usable, without additional help, when the remote resources have private IP addresses. The submit node becomes a bottleneck for the number of jobs that can be submitted.

8 Solution 2 : Glideins via GCB
[Diagram labels: Public Network; Private Network; Cluster on a private network with outgoing connection allowed; Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; GT4 PBS GRAM; PBS runs Glidein request; Cluster Worker Nodes; Connect to Collector; Execute Jobs; X.]

9 Solution 2: Glideins with GCB
Sometimes clusters are behind a firewall and have only private IP addresses. Such clusters may still allow outgoing connections from each node. In such a case glideins cannot work directly, because the shadow process is unable to communicate directly with the starter process on the remote node. A Generic Connection Broker (GCB) sits somewhere on the public network and acts as a proxy relay.

10 Solution 2 : Glideins via GCB
[Diagram labels: Public Network; Private Network; GCB; Cluster on a private network with outgoing connection allowed; Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; GT4 PBS GRAM; PBS runs Glidein request; Cluster Worker Nodes; Open Connection to GCB; Connect to Collector; Execute Jobs.]

11 Solution 2 : Glideins with GCB
Pros: GCB allows the shadow and starter processes to communicate across network boundaries. Allows a VO-specific grid to be created. Allows VO-specific policies and priorities to be applied to the entire VO grid.
Cons: Additional overhead of maintaining and running a GCB proxy. The GCB is a point of failure for all the jobs. Only works if the remote cluster setup allows outgoing connections to the public network. The submit node is a bottleneck for the number of jobs that can be submitted.

12 Solution 3: Condor Brick
Some clusters with private IP addresses allow neither incoming nor outgoing connections, except to the boundary servers. In such cases it is not possible to run glideins either directly or via GCB. Also, neither of the earlier solutions allows per-cluster/site VO policies or priorities to be applied. (It may be possible, but at least I don't know how.) Welcome Condor Brick.
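
The three solutions differ mainly in what the cluster's network allows. As a minimal sketch, the choice could be written in Python as below. The enum and function names are hypothetical, and the selection considers only network reachability, not the per-cluster/site policy argument made on this slide.

```python
from enum import Enum


class Solution(Enum):
    GLIDEINS = "Solution 1: Glideins"
    GLIDEINS_VIA_GCB = "Solution 2: Glideins via GCB"
    CONDOR_BRICK = "Solution 3: Condor Brick"


def choose_solution(public_ips: bool, outgoing_allowed: bool) -> Solution:
    """Pick the simplest solution that the cluster's network setup permits.

    public_ips:       worker nodes have public IP addresses
    outgoing_allowed: worker nodes may open outgoing connections
                      to the public network
    """
    if public_ips:
        # Worker nodes can talk to the submit node directly.
        return Solution.GLIDEINS
    if outgoing_allowed:
        # Private addresses, but nodes can reach a GCB relay.
        return Solution.GLIDEINS_VIA_GCB
    # No connections except to boundary servers: use a brick there.
    return Solution.CONDOR_BRICK


if __name__ == "__main__":
    print(choose_solution(public_ips=False, outgoing_allowed=False).value)
```

Since the brick binds on all interfaces, it would also work in the first two cases; the sketch simply prefers the option with the fewest moving parts.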

13 Solution 3 : Condor Brick
[Diagram labels: Public Network; Private Network; GCB; Cluster on a private network with outgoing connection allowed; Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; GT4 PBS GRAM; PBS runs Glidein request; Cluster Worker Nodes; Connect to Collector; X.]

14 Condor Brick
[Diagram labels: Condor Master; Condor Collector; Condor Negotiator; Condor Schedd; VO specific Policies and Priorities (Condor Config); Private Network; Public Network.]

15 Solution 3: Condor Brick
A Condor Brick is a bunch of Condor daemons and configuration files. Condor Bricks bind to all the interfaces on the remote server. A brick sits on the boundary and talks to both the public and private networks. The brick is dynamically deployed on remote clusters, when needed, using a GT4 GRAM fork job-manager on any boundary machine. The required cluster nodes are dynamically glided in to the brick using a GT4 <sched> job-manager (schedd to startd communication over the LAN). Condor-C is used to submit jobs to remote Condor Bricks (schedd to schedd communication over the WAN).
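
To illustrate the last step, the sketch below renders a Condor-C submit description from Python. The `universe = grid` and `grid_resource = condor <schedd> <collector>` lines follow the Condor-C submit syntax as I understand it from the Condor manual. The function name, host names, and file names are hypothetical placeholders, and this is not the configuration used in the talk; a real deployment would likely need further attributes.

```python
def condor_c_submit(schedd_name: str, collector_host: str,
                    executable: str, tag: str = "job") -> str:
    """Render a minimal Condor-C submit description as text.

    schedd_name and collector_host identify the schedd and collector
    of the remote Condor Brick; both values are assumptions supplied
    by the caller, not names taken from the slides.
    """
    lines = [
        "universe = grid",
        # Condor-C: hand the job to a remote schedd over the WAN.
        f"grid_resource = condor {schedd_name} {collector_host}",
        f"executable = {executable}",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical brick on a boundary machine.
    print(condor_c_submit("brick-schedd@boundary.example.org",
                          "boundary.example.org",
                          "analysis.sh"))
```

The resulting text would be saved to a file and passed to `condor_submit` on the submit node.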

16 Solution 3 : Condor Brick
[Diagram labels: Public Network; Private Network; Cluster on a private network with no incoming or outgoing connections allowed; Submit Node (Collector, Negotiator, Master, Schedd); Deploy Brick; GT4 Fork GRAM; CONDOR BRICK; Execute Jobs via Condor-C; Glidein request; GT4 PBS GRAM; PBS runs Glidein request; Cluster Worker Nodes; Connect to Brick; Execute Jobs.]

17 Solution 3: Condor Brick
A Condor Brick at each cluster ensures that the job load is distributed from the submit node. A Condor Brick at each cluster enables VO-specific policies and priorities to be implemented at a per-cluster/site level. The scheduling system is uniform from the submit node to the cluster. Job throughput is high because grid overhead in each job submission is reduced. Condor Bricks bind on all interfaces, making this the most generic of the three solutions.

18 Dynamic Deployment of VO Condor Scheduler
Conclusion: Deploying a dynamic Condor scheduler for the VO on each cluster results in a robust, uniform scheduling environment. A VO can control the policies and priorities for all its members on the entire VO grid or on each cluster in the VO grid. Condor Brick eliminates some of the grid latency. Condor Brick allows using clusters with different network and firewall configurations.

19 Dynamic Deployment of VO Condor Scheduler
Questions? Thanks to Miron and the Condor team.

