Optimizing workflow execution on the Grid (Condor Week 2005), Gaurang Mehta

Presentation transcript:

Condor Week 2005: Optimizing Workflows on the Grid
Slide 1: Optimizing workflow execution on the Grid
Gaurang Mehta
Based on “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman. Submitted to HPDC-05.

Slide 2: Introduction
The use of workflows on the grid is becoming widespread in scientific applications:
- Astrophysics
- High Energy Physics
- Biology, etc.
Current focus is on:
- GUIs for composing workflows
- Standardizing workflow specification languages
- Mapping the tasks in a workflow to optimize a system metric
- Using a workflow execution engine to execute the workflow (DAGMan, GRMS, Triana, Webflow, etc.)
The performance of the workflow execution engine itself has not received much attention.

Slide 3: Workflow Model
- Parse the workflow description
- Create a ready list of executable tasks
- Select tasks from the ready list
- Identify resources for the tasks
- Start the tasks on the resources
- Monitor task completion
- Dependency analysis
- Update the ready list (then return to task selection)
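The cycle on this slide can be sketched in a few lines of Python. This is only an illustrative model, not how DAGMan or any other engine is implemented: `run_workflow`, `deps`, and `execute` are hypothetical names, and the single `execute` callback stands in for resource matching, dispatch, and completion monitoring.

```python
from collections import deque

def run_workflow(deps, execute):
    """Run tasks in dependency order.

    deps maps every task name to the set of tasks it depends on
    (all tasks must appear as keys). execute is a callback that
    stands in for matching, dispatching, and monitoring one task.
    """
    remaining = {task: set(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Create the ready list: tasks with no unmet dependencies.
    ready = deque(sorted(t for t, p in remaining.items() if not p))
    order = []
    while ready:
        task = ready.popleft()          # select a task from the ready list
        execute(task)                   # start it and wait for completion
        order.append(task)
        for child in children[task]:    # dependency analysis
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)     # update the ready list
    if len(order) != len(deps):
        raise ValueError("workflow contains a dependency cycle")
    return order

if __name__ == "__main__":
    diamond = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(run_workflow(diamond, lambda task: None))
```

Every pass through the loop touches the ready list and the dependency sets, which is where the per-job engine overhead comes from when jobs are short.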

Slide 4: Workflow Model
The costs of workflow execution are in:
- Creating and maintaining a ready list
- Resource matching
- Dispatching jobs to resources
These costs can become significant for a fine-granularity workflow (one whose jobs have short runtimes) due to:
- Large number of jobs in the workflow
- Dependencies between jobs
- Distributed nature of resources

Slide 5: Condor as the Workflow Execution Engine
We use Condor as the workflow execution engine. Condor-Glidein is used to provision the execution resources ahead of time.
- Provisioning resources in advance lets the experiments isolate and examine the workflow execution overheads
Based on the workflow execution costs described earlier, the factors that affect performance in the Condor system are:
- Scheduling interval (schedd, negotiator)
- Job dispatch rate (schedd)
- Job submission rate (DAGMan, schedd)

Slide 6: Montage Workflow Structure
- 4500 total jobs
- 890 jobs in the top level
- 2600 jobs in the second level
- 10 minutes on 100 processors at 100% efficiency

Slide 7: Execution Environment
- Submit Host: DAGMan, SCHEDD
- Central Manager: COLLECTOR, NEGOTIATOR
- Condor Pool: 100 worker nodes from the NCSA TeraGrid cluster, each running STARTD

Slide 8: Baseline Condor Performance

Slide 9: Scheduling Interval
The negotiation cycle is the process of identifying resources for jobs. The interval between two successive negotiation cycles is the scheduling interval.
It can be controlled in a variety of ways:
- Fixed scheduling interval
- Starting a negotiation cycle when each job is submitted, but no more often than once every 20 seconds
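To see why the scheduling interval matters for short jobs, here is a rough back-of-the-envelope model in Python. It assumes a job becomes ready at a uniformly random moment within a fixed interval and must wait for the next negotiation cycle; the function names and the `levels` parameter are hypothetical, and the model ignores everything else Condor does.

```python
def fixed_interval_wait(interval_s):
    """Return (mean, worst-case) wait in seconds before the next cycle.

    Assumes the job becomes ready at a uniformly random time within
    a fixed scheduling interval of interval_s seconds.
    """
    return interval_s / 2.0, float(interval_s)

def worst_case_level_overhead(levels, interval_s):
    """Worst-case added delay if each dependency level waits one full interval."""
    return levels * interval_s

if __name__ == "__main__":
    for interval in (30, 300, 600):  # 30 seconds, 5 minutes, 10 minutes
        mean, worst = fixed_interval_wait(interval)
        print(f"interval={interval}s mean wait={mean}s worst wait={worst}s")
```

Under this model the waiting time grows linearly with the interval, and in the worst case it is paid once per dependency level, so a long interval can dominate the runtime of a workflow whose jobs take only seconds.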

Slide 10: Scheduling at Job Submission (figure panels: 30 seconds, 5 minutes, 10 minutes)

Slide 11: Fixed Scheduling Interval (figure panels: 30 seconds, 5 minutes, 10 minutes)

Slide 12: Effect of Scheduling Interval

Slide 13: Job Dispatch Rate
- The dispatch rate is the rate at which the scheduler can start jobs on the remote resources
- It is throttled using JOB_START_DELAY
- The default setting of 2 seconds limits the load on the submit machine and on the scheduler
- This artificial delay can be expensive if the workflow contains many small jobs
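A quick calculation shows how large the delay can be for the Montage workflow from slide 6. The sketch below is a simplified model, not a measurement: it assumes the scheduler starts exactly one job per JOB_START_DELAY period, and it reads the "10 minutes on 100 processors" figure as the ideal workflow runtime. The function name is hypothetical.

```python
def min_dispatch_time_s(num_jobs, job_start_delay_s):
    """Lower bound on the time needed just to start num_jobs jobs.

    Assumes one job start per delay period, so consecutive starts
    are separated by job_start_delay_s seconds.
    """
    return max(num_jobs - 1, 0) * job_start_delay_s

if __name__ == "__main__":
    total_jobs = 4500          # Montage workflow size from slide 6
    ideal_runtime_s = 10 * 60  # assumed ideal runtime on 100 processors
    for delay in (0, 1, 2):
        dispatch = min_dispatch_time_s(total_jobs, delay)
        print(f"delay={delay}s dispatch>={dispatch / 60:.1f} min "
              f"({dispatch / ideal_runtime_s:.1f}x the ideal runtime)")
```

With the default delay of 2 seconds, starting 4500 jobs takes roughly 150 minutes under this model, about fifteen times the assumed ideal runtime, regardless of how many processors are free.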

Slide 14: Job Dispatch Rate (figure panels: JSD of 0 seconds, 1 second, 2 seconds)

Slide 15: Job Submission Rate
- The rate at which DAGMan submits jobs to the Condor queue
- With a faster dispatch rate, the job submission rate becomes the limiting factor
- The submission rate depends on the dependencies in a workflow
- Restructuring a workflow to reduce dependencies can increase the submission rate

Slide 16: Workflow Restructuring

Slide 17: DAGMan for each composite job (figure panels: 1 cluster per level, 2 clusters per level)
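The idea of grouping the jobs of one workflow level into a small number of composite jobs can be sketched as follows. The slides do not say how jobs are assigned to clusters, so the round-robin split here is an assumption, and `cluster_level` and `restructure` are hypothetical helper names.

```python
def cluster_level(jobs, clusters_per_level):
    """Split the jobs of one level into at most clusters_per_level groups.

    Jobs are assigned round-robin (an assumption for illustration);
    empty groups are dropped when there are fewer jobs than clusters.
    """
    if clusters_per_level < 1:
        raise ValueError("clusters_per_level must be at least 1")
    groups = [jobs[i::clusters_per_level] for i in range(clusters_per_level)]
    return [group for group in groups if group]

def restructure(levels, clusters_per_level):
    """Apply cluster_level to every level of a level-ordered workflow."""
    return [cluster_level(level, clusters_per_level) for level in levels]

if __name__ == "__main__":
    levels = [["p1", "p2", "p3", "p4"], ["d1", "d2", "d3"], ["add"]]
    for n in (1, 2):
        print(n, restructure(levels, n))
```

After restructuring, dependencies only need to be tracked between composite jobs rather than between every pair of individual jobs, which is what lets the jobs inside a composite job be submitted without waiting on per-job dependency checks.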

Slide 18: Condor cluster for each composite job

Slide 19: Conclusion
- Condor is a high-throughput system, and the default configuration works well for long-running jobs
- We are interested in high performance when using Condor for fine-granularity workflows
- Performance can be improved by modifying the configuration parameters and by using Condor features such as clustering
- 90% reduction in the workflow completion time for the fine-granularity Montage workflow
- The achievable reduction depends on the workflow structure, the granularity, and the number of available resources

Slide 20: Future Work
- Investigate the tradeoff between the resource requirements and the workflow completion time
- Investigate the effect of granularity on workflow performance
Read “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman. Submitted to HPDC-05 at