
1 Condor Week 2005: Optimizing Workflows on the Grid
Optimizing workflow execution on the Grid
Gaurang Mehta - gmehta@isi.edu
Based on “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman. Submitted to HPDC-05.

2 Introduction
Use of workflows on the grid is becoming widespread in scientific applications.
- Astrophysics
- High Energy Physics
- Biology, etc.
Current focus is on
- GUIs for composing workflows
- Standardizing workflow specification languages
- Mapping tasks in the workflow to optimize a system metric
- Use of a workflow execution engine to execute the workflow (DAGMan, GRMS, Triana, Webflow, etc.)
Performance of the workflow execution engine has not received much attention.

3 Workflow Model
- Parse the workflow description
- Create a ready list of executable tasks
- Select tasks from the ready list
- Identify resources for the tasks
- Start the tasks on the resources
- Monitor task completion
- Dependency analysis
- Update the ready list
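The workflow model steps can be sketched as a small ready-list loop. This is a minimal illustration, not how DAGMan or any other engine is implemented: `run_workflow`, `pick_resource`, and `start_task` are hypothetical names, every task is assumed to appear as a key in `dependencies`, and each task is assumed to complete synchronously as soon as it is started.

```python
from collections import deque


def run_workflow(dependencies, pick_resource, start_task):
    """Run tasks in dependency order using a ready list.

    dependencies: dict mapping each task to the set of tasks it depends on.
    pick_resource, start_task: hypothetical callbacks standing in for
    resource matching and job dispatch.
    Returns the order in which tasks completed.
    """
    # Parse the workflow: count unmet dependencies and record each task's children.
    remaining = {task: len(parents) for task, parents in dependencies.items()}
    children = {task: [] for task in dependencies}
    for task, parents in dependencies.items():
        for parent in parents:
            children[parent].append(task)

    # Create the ready list: tasks with no unmet dependencies.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    completed = []

    while ready:
        task = ready.popleft()            # select a task from the ready list
        resource = pick_resource(task)    # identify a resource for the task
        start_task(task, resource)        # start the task (synchronous in this sketch)
        completed.append(task)            # monitor task completion
        for child in children[task]:      # dependency analysis
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)       # update the ready list

    if len(completed) != len(dependencies):
        raise ValueError("workflow contains a dependency cycle")
    return completed
```

Every pass through the loop pays for ready-list maintenance, resource matching, and dispatch, which is why these costs add up when a workflow has many short jobs.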

4 Workflow Model
The costs of workflow execution are in
- Creating and maintaining a ready list
- Resource matching
- Dispatching jobs to resources
These costs can become significant for a fine-granularity workflow (one whose jobs have short runtimes) due to
- Large number of jobs in the workflow
- Dependencies between jobs
- Distributed nature of resources

5 Condor as the Workflow Execution Engine
We use Condor as the workflow execution engine.
Condor-Glidein is used for provisioning the execution resources ahead of time.
- Provisioning resources in advance lets the experiments isolate and examine the workflow execution overheads
Based on the workflow execution costs described earlier, the following factors affect performance in the Condor system:
- Scheduling interval (schedd, negotiator)
- Job dispatch rate (schedd)
- Job submission rate (DAGMan, schedd)

6 Montage Workflow Structure
- 4500 total jobs
- 890 jobs at the top level
- 2600 jobs at the second level
- 10 minutes, 100 processors, 100% efficiency

7 Execution Environment
- 100 worker nodes from the NCSA TeraGrid cluster
- Condor pool diagram showing the submit host, DAGMan, SCHEDD, the central manager, COLLECTOR, NEGOTIATOR, and STARTD

8 Baseline Condor Performance

9 Scheduling Interval
The negotiation cycle is the process of identifying resources for jobs.
The interval between two successive negotiation cycles is the scheduling interval.
It can be controlled in a variety of ways:
- Fixed scheduling interval
- Starting a negotiation cycle at the submission of each job, but no more often than once every 20 seconds
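The two ways of controlling the scheduling interval can be compared with a small timing sketch. It is an illustrative model only: the function names are hypothetical, and the way a submission that arrives inside the 20-second window is deferred to the next allowed cycle is an assumption, not documented Condor behavior.

```python
def fixed_interval_cycles(interval, horizon):
    """Negotiation cycle start times (seconds) for a fixed scheduling interval."""
    return list(range(interval, horizon + 1, interval))


def submission_triggered_cycles(submit_times, min_gap=20):
    """Start a cycle at each job submission, but no more often than every min_gap seconds.

    Assumption: a submission that arrives too soon after the previous cycle
    is served by a cycle deferred until min_gap seconds have elapsed.
    """
    cycles = []
    for t in sorted(submit_times):
        if cycles and t <= cycles[-1]:
            continue  # already covered by a cycle scheduled at or after t
        start = t if not cycles else max(t, cycles[-1] + min_gap)
        cycles.append(start)
    return cycles


def wait_until_matched(ready_time, cycles):
    """Seconds a job that becomes ready at ready_time waits for the next cycle."""
    for start in cycles:
        if start >= ready_time:
            return start - ready_time
    return None  # no cycle within the modeled horizon
```

In this model, a job that becomes ready 10 seconds into a 300-second fixed interval waits 290 seconds before it can be matched, while a submission-triggered cycle serves it within at most `min_gap` seconds.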

10 Scheduling at Job Submission
Plot panels: 30 seconds, 5 minutes, 10 minutes

11 Fixed Scheduling Interval
Plot panels: 30 seconds, 5 minutes, 10 minutes

12 Effect of Scheduling Interval

13 Job Dispatch Rate
Dispatch rate is the rate at which the scheduler can start jobs on the remote resources.
It is throttled using the JOB_START_DELAY setting.
The default setting of 2 seconds limits the load on the submit machine and on the scheduler.
This artificial delay can be expensive if the workflow contains many small jobs.
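A rough lower bound shows why the delay matters for small jobs. The sketch below assumes a simplified serialized model in which the scheduler starts one job per delay period, so N jobs accrue about N times JOB_START_DELAY seconds of delay. The function name is hypothetical, the 4500-job count comes from the Montage workflow slide, and the figures are arithmetic from that model, not measurements.

```python
def dispatch_floor_seconds(num_jobs, job_start_delay):
    """Approximate time spent in start delays when jobs start one per delay period.

    Assumption: a serialized dispatch model; real schedulers may start
    several jobs per period or overlap dispatch with other work.
    """
    return num_jobs * job_start_delay


# 4500 jobs, with start delays of 0, 1, and 2 seconds.
for delay in (0, 1, 2):
    floor = dispatch_floor_seconds(4500, delay)
    print(f"delay {delay} s -> about {floor} s ({floor / 60:.0f} min) of start delay")
```

Under this model the 2-second default alone accounts for about 9000 seconds (150 minutes) across 4500 jobs, before any job runtime is counted.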

14 Job Dispatch Rate
Plot panels for a JOB_START_DELAY (JSD) of 0 seconds, 1 second, and 2 seconds

15 Job Submission Rate
The job submission rate is the rate at which DAGMan submits jobs to the Condor queue.
With a faster dispatch rate, the job submission rate becomes the limiting factor.
The submission rate depends on the dependencies in a workflow.
Restructuring a workflow to reduce dependencies can increase the submission rate.

16 Workflow Restructuring

17 DAGMan for each composite job
Plot panels: 1 cluster per level, 2 clusters per level
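One way to picture restructuring into composite jobs with a fixed number of clusters per level is the sketch below. It is an assumption about the general technique, not the authors' implementation: `task_levels` and `cluster_by_level` are hypothetical names, a task's level is taken to be one more than the deepest of its parents, and tasks at a level are split round-robin.

```python
def task_levels(dependencies):
    """Level of a task is 1 + the maximum level of its parents (roots are level 1)."""
    levels = {}

    def level(task):
        if task not in levels:
            parents = dependencies[task]
            levels[task] = 1 + max((level(p) for p in parents), default=0)
        return levels[task]

    for task in dependencies:
        level(task)
    return levels


def cluster_by_level(dependencies, clusters_per_level):
    """Group the tasks at each level into at most clusters_per_level composite jobs."""
    levels = task_levels(dependencies)
    by_level = {}
    for task in sorted(dependencies):
        by_level.setdefault(levels[task], []).append(task)

    composite = {}
    for lvl in sorted(by_level):
        tasks = by_level[lvl]
        k = min(clusters_per_level, len(tasks))
        # A round-robin split keeps the k clusters at a level similar in size.
        composite[lvl] = [tasks[i::k] for i in range(k)]
    return composite
```

For a seven-task workflow with four root tasks, two second-level tasks, and one final task, this yields three composite jobs with one cluster per level and five with two clusters per level, so the engine has far fewer jobs and dependencies to track.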

18 Condor cluster for each composite job

19 Conclusion
Condor is a high-throughput system, and the default configuration works well for long-running jobs.
We are interested in high performance using Condor for fine-granularity workflows.
It is possible to improve performance by modifying the configuration parameters and using Condor features such as clustering.
We obtained a 90% reduction in workflow completion time for the fine-granularity Montage workflow.
The achievable reduction depends on the workflow structure, granularity, and number of available resources.

20 Future Work
Investigate the tradeoff between the resource requirements and the workflow completion time.
Investigate the effect of granularity on workflow performance.
Read “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman, submitted to HPDC-05, at http://pegasus.isi.edu/publications.htm

