Optimizing for Time and Space in Distributed Scientific Workflows

Presentation transcript:

Optimizing for Time and Space in Distributed Scientific Workflows
Ewa Deelman, University of Southern California, Information Sciences Institute

Generating mosaics of the sky (Bruce Berriman, Caltech)
*The full moon is 0.5 deg. sq. when viewed from Earth; the full sky is ~400,000 deg. sq.

Size of the mosaic (deg. sq.)* | Number of jobs | Number of input data files | Number of intermediate files | Total data footprint | Approx. execution time (20 procs)
1 | | | | GB | 40 mins
2 | 1,… | | …,906 | 5.5 GB | 49 mins
4 | 4,… | | …,061 | 20 GB | 1 hr 46 mins
6 | 8,586 | 1,444 | 22,850 | 38 GB | 2 hrs 14 mins
10 | 20,652 | 3,722 | 54,434 | 97 GB | 6 hours

Issue
- How to manage such applications across a variety of distributed resources?
Approach
- Structure the application as a workflow
- Allow the scientist to describe the application at a high level
- Tie the execution resources together, using Condor and/or Globus
- Provide an automated mapping and execution tool to map the high-level description onto the available resources

Specification: place Y = F(x) at L (a sketch follows below)
- Find where x is: {S1, S2, …}
- Find where F can be computed: {C1, C2, …}
- Choose c and s subject to constraints (performance, space availability, …)
- Move x from s to c
- Move F to c
- Compute F(x) at c
- Move Y from c to L
- Register Y in the data registry
- Record provenance of Y and the performance of F(x) at c
At each step something can go wrong:
- Error! c crashed!
- Error! x was not at s!
- Error! F(x) failed!
- Error! There is not enough space at L!
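To make the specification concrete, here is a minimal sketch (plain Python, not Pegasus code) of the placement decision and the steps it implies: pick a replica source s for x and a compute site c for F under simple space and software constraints, then list the data movement, computation, registration, and provenance steps at which the errors above can occur. All names (Site, pick_placement, free_space_gb, the bandwidth objective) are hypothetical.

```python
# Hypothetical sketch of "place Y = F(x) at L": choose (s, c), then emit the steps.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_space_gb: float      # available scratch space
    can_run_F: bool           # whether F can be executed (or staged) here
    bandwidth_to: dict        # site name -> MB/s, a crude performance constraint

def pick_placement(replicas, compute_sites, output_size_gb):
    """Choose (s, c): a source s holding x and a site c that can compute F(x)."""
    candidates = [
        (s, c)
        for s in replicas
        for c in compute_sites
        if c.can_run_F and c.free_space_gb >= output_size_gb
    ]
    if not candidates:
        raise RuntimeError("no feasible placement for F(x)")
    # crude objective: maximize bandwidth from s to c
    return max(candidates, key=lambda sc: sc[0].bandwidth_to.get(sc[1].name, 0.0))

def plan(x, F, L, s, c):
    """Emit the concrete steps; each is a point where an error can occur."""
    return [
        f"move {x} from {s.name} to {c.name}",
        f"stage executable {F} to {c.name}",
        f"compute {F}({x}) at {c.name} -> Y",
        f"move Y from {c.name} to {L}",
        "register Y in the data registry",
        f"record provenance of Y and performance of {F}({x}) at {c.name}",
    ]

if __name__ == "__main__":
    s1 = Site("S1", 100, False, {"C1": 50.0})
    c1 = Site("C1", 500, True, {})
    s, c = pick_placement([s1], [c1], output_size_gb=2.0)
    for step in plan("x", "F", "L", s, c):
        print(step)
```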

Pegasus Workflow Management System
- Leverages abstraction in the workflow description to obtain ease of use, scalability, and portability (a sketch of an abstract workflow description follows below)
- Provides a compiler to map from high-level descriptions to executable workflows
  - Correct mapping
  - Performance-enhanced mapping
- Provides a runtime engine to carry out the instructions in a scalable and reliable manner
- In collaboration with Miron Livny, UW Madison; funded under NSF OCI SDCI
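As an illustration of what "abstraction in the workflow description" means, the sketch below shows a toy abstract workflow: tasks refer only to logical transformations and logical file names, with no sites, physical paths, or executables, and dependencies are derived from the data flow. The Task/AbstractWorkflow classes are illustrative, not the Pegasus DAX format or API; mProject/mDiff are used only as example transformation names.

```python
# Hypothetical sketch of an abstract (resource-independent) workflow description.

from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    transformation: str                           # logical name, e.g. "mProject"
    inputs: list = field(default_factory=list)    # logical file names
    outputs: list = field(default_factory=list)

@dataclass
class AbstractWorkflow:
    tasks: dict = field(default_factory=dict)

    def add(self, task):
        self.tasks[task.id] = task

    def dependencies(self):
        """Derive edges from data flow: t1 -> t2 if t2 reads a file that t1 writes."""
        producers = {f: t.id for t in self.tasks.values() for f in t.outputs}
        return {(producers[f], t.id)
                for t in self.tasks.values()
                for f in t.inputs if f in producers}

wf = AbstractWorkflow()
wf.add(Task("t1", "mProject", inputs=["img1.fits"], outputs=["proj1.fits"]))
wf.add(Task("t2", "mProject", inputs=["img2.fits"], outputs=["proj2.fits"]))
wf.add(Task("t3", "mDiff", inputs=["proj1.fits", "proj2.fits"], outputs=["diff.fits"]))
print(wf.dependencies())   # e.g. {('t1', 't3'), ('t2', 't3')} (set order may vary)
```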

Pegasus Workflow Management System (input: an abstract workflow; output: results)
- Pegasus mapper: a decision system that develops strategies for reliable and efficient execution in a variety of environments
- DAGMan: reliable and scalable execution of dependent tasks
- Condor schedd: reliable, scalable execution of independent tasks (locally or across the network), priorities, scheduling
- Cyberinfrastructure: local machine, cluster, Condor pool, OSG, TeraGrid

Basic Workflow Mapping
- Select where to run the computations
  - Apply a scheduling algorithm: HEFT, min-min, round-robin, random
  - The quality of the schedule depends on the quality of the information available
- Transform task nodes into nodes with executable descriptions
  - Execution location, environment variables initialized, appropriate command-line parameters set
- Select which data to access
  - Add stage-in nodes to move data to the computations
  - Add stage-out nodes to transfer data out of remote sites to storage
  - Add data transfer nodes between computation nodes that execute on different resources
(A sketch of this mapping pass follows below.)
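A rough sketch of the mapping pass described above, under simplifying assumptions: tasks arrive in topological order, a trivial round-robin choice stands in for HEFT or min-min, and stage-in, inter-site transfer, stage-out, and registration nodes are inserted around each compute task. The map_workflow function and its data structures are hypothetical, not Pegasus internals.

```python
# Hypothetical sketch of turning an abstract workflow into an executable one.

from itertools import cycle

def map_workflow(tasks, sites, initial_inputs, final_outputs):
    """tasks: dict id -> {"inputs": [...], "outputs": [...]} in topological order.
    Returns a list of executable-level jobs (compute plus data movement)."""
    site_of_file = {}                    # logical file -> site where it will live
    jobs = []
    rr = cycle(sites)                    # stand-in for HEFT / min-min / random

    for tid, t in tasks.items():
        site = next(rr)                  # "scheduling" decision for this task
        for f in t["inputs"]:
            if f in initial_inputs:      # raw input: stage in from permanent storage
                jobs.append(("stage_in", f, "storage", site))
            elif site_of_file[f] != site:  # produced elsewhere: inter-site transfer
                jobs.append(("transfer", f, site_of_file[f], site))
        jobs.append(("compute", tid, site))
        for f in t["outputs"]:
            site_of_file[f] = site
            if f in final_outputs:       # requested product: stage out and register
                jobs.append(("stage_out", f, site, "storage"))
                jobs.append(("register", f))
    return jobs

tasks = {
    "t1": {"inputs": ["img.fits"], "outputs": ["proj.fits"]},
    "t2": {"inputs": ["proj.fits"], "outputs": ["mosaic.fits"]},
}
for job in map_workflow(tasks, ["siteA", "siteB"], {"img.fits"}, {"mosaic.fits"}):
    print(job)
```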

Basic Workflow Mapping (continued)
- Add nodes that register the newly created data products
- Add data cleanup nodes to remove data from remote sites when no longer needed
  - Reduces the workflow data footprint
- Provide provenance capture steps
  - Information about the source of the data, executables invoked, environment variables, parameters, machines used, and performance

Pegasus Workflow Mapping
- Original workflow: 15 compute nodes, devoid of resource assignment
- Resulting workflow mapped onto 3 Grid sites:
  - 11 compute nodes (4 removed because their intermediate data products were already available)
  - 12 data stage-in nodes
  - 8 inter-site data transfers
  - 14 data stage-out nodes to long-term storage
  - 14 data registration nodes (data cataloging)
  - Together, these are the jobs to execute

Time Optimizations during Mapping
- Node clustering for fine-grained computations (sketched below)
  - Can yield significant performance benefits for some applications (~80% in Montage, ~50% in SCEC)
- Data reuse when intermediate data products are already available
  - Performance and reliability advantages: a form of workflow-level checkpointing
- Workflow partitioning to adapt to changes in the environment
  - Map and execute small portions of the workflow at a time
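A small sketch of the node-clustering idea: tasks at the same workflow level that are mapped to the same site are grouped into clustered jobs, so scheduling and queuing overhead is paid per cluster rather than per task. The cluster_tasks function, the level/site fields, and the cluster size limit are assumptions for illustration, not the actual Pegasus clustering implementation.

```python
# Hypothetical sketch of clustering fine-grained tasks by (level, site).

from collections import defaultdict

def cluster_tasks(tasks, max_cluster_size=3):
    """tasks: list of dicts like {"id": ..., "level": ..., "site": ...}."""
    groups = defaultdict(list)
    for t in tasks:
        groups[(t["level"], t["site"])].append(t["id"])

    clustered_jobs = []
    for (level, site), ids in groups.items():
        # split large groups so a single cluster does not become too coarse
        for i in range(0, len(ids), max_cluster_size):
            clustered_jobs.append({"site": site, "level": level,
                                   "tasks": ids[i:i + max_cluster_size]})
    return clustered_jobs

tasks = [{"id": f"t{i}", "level": 1, "site": "siteA"} for i in range(7)]
for job in cluster_tasks(tasks):
    print(job)
```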

LIGO (Laser Interferometer Gravitational-Wave Observatory)
- Aims to detect gravitational waves predicted by Einstein's theory of relativity
- Can be used to detect binary pulsars, mergers of black holes, and "starquakes" in neutron stars
- Two installations: Louisiana (Livingston) and Washington State
- Other projects: Virgo (Italy), GEO (Germany), TAMA (Japan)
- Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum
- Data collected during experiments is a collection of (multi-channel) time series

Example of LIGO's computations: binary inspiral analysis
- Size of analysis for meaningful results:
  - at least 221 GB of gravitational-wave data
  - approximately 70,000 computational tasks
- Desired analysis (data from November to November):
  - terabytes of input data
  - approximately 185,000 computations
  - 1 TB of output data

Ewa Deelman, LIGO’s computational resources LIGO Data Grid Condor clusters managed by the collaboration ~ 6,000 CPUs Open Science Grid A US cyberinfrastructure shared by many applications ~ 20 Virtual Organizations ~ 258 GB of shared scratch disk space on OSG sites

Problem
- How to "fit" the computations onto the OSG?
- Take into account intermediate data products
- Minimize the data footprint of the workflow
- Schedule the workflow tasks in a disk-space-aware fashion (sketched below)
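One way to read "disk-space-aware scheduling" is sketched below: before a task is placed, check that a site's remaining scratch space can hold the task's data footprint, and debit that space once the task is assigned. The space_aware_assign function, the worst-fit heuristic, and the site capacities are hypothetical; as a later slide notes, getting good footprint and space estimates is itself hard.

```python
# Hypothetical sketch of disk-space-aware task placement with space reservation.

def space_aware_assign(tasks, site_space_gb):
    """tasks: list of {"id", "footprint_gb"}; site_space_gb: dict site -> free GB."""
    placement = {}
    for t in tasks:
        # among sites that can hold the task's footprint, pick the one with the
        # most remaining space (a simple worst-fit heuristic)
        feasible = {s: gb for s, gb in site_space_gb.items()
                    if gb >= t["footprint_gb"]}
        if not feasible:
            raise RuntimeError(f"no site has {t['footprint_gb']} GB free for {t['id']}")
        site = max(feasible, key=feasible.get)
        placement[t["id"]] = site
        site_space_gb[site] -= t["footprint_gb"]   # reserve the space
    return placement

tasks = [{"id": "t1", "footprint_gb": 120}, {"id": "t2", "footprint_gb": 100}]
print(space_aware_assign(tasks, {"osg_site_1": 258, "osg_site_2": 150}))
```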

Workflow Footprint
To reduce the workflow footprint, we need to determine when data are no longer needed:
- because the data have been consumed by the next component and no other component needs them
- because the data have been staged out to permanent storage
- because the data are no longer needed on a resource and have been staged out to the resource that needs them

Cleanup Disk Space as the Workflow Progresses (sketched below)
- For each node, add dependencies to clean up all the files used and produced by the node
- If a file is being staged in from r1 to r2, add a dependency between the stage-in node and the cleanup node
- If a file is being staged out, add a dependency between the stage-out node and the cleanup node
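A simplified sketch of this cleanup rule: for each file on each site, create a cleanup node whose parents are every job that still needs that copy, including the compute jobs that use it, a stage-out of it to permanent storage, and a stage-in or transfer that reads it as its source. The job records and the add_cleanup_nodes function are illustrative, not the actual Pegasus implementation.

```python
# Hypothetical sketch of adding per-file, per-site cleanup nodes to a mapped workflow.

from collections import defaultdict

def add_cleanup_nodes(jobs):
    """jobs: dicts with "id", "site", "uses" (files read/written on "site"), and
    optionally "moves" = (file, from_site, to_site) for stage-in/out/transfer jobs.
    Returns cleanup nodes with their parent dependencies."""
    needed_by = defaultdict(list)             # (file, site) -> job ids that need it there
    for j in jobs:
        for f in j.get("uses", []):
            needed_by[(f, j["site"])].append(j["id"])
        if "moves" in j:                      # the source copy is needed until the move ends
            f, src, dst = j["moves"]
            needed_by[(f, src)].append(j["id"])

    cleanup_nodes = []
    for (f, site), parents in needed_by.items():
        if site == "storage":                 # never clean up permanent storage
            continue
        cleanup_nodes.append({"id": f"cleanup_{f}_{site}",
                              "site": site, "file": f, "parents": parents})
    return cleanup_nodes

jobs = [
    {"id": "stage_in_x", "site": "r2", "uses": ["x"], "moves": ("x", "r1", "r2")},
    {"id": "compute_F", "site": "r2", "uses": ["x", "y"]},
    {"id": "stage_out_y", "site": "storage", "uses": ["y"], "moves": ("y", "r2", "storage")},
]
for node in add_cleanup_nodes(jobs):
    print(node)
```

In this toy run, the cleanup of x at r1 depends on the stage-in job, and the cleanup of y at r2 depends on both the compute job and the stage-out job, mirroring the two rules on the slide.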

Experiments on the Grid with LIGO and Montage

Cleanup on the Grid: Montage application (~1,200 nodes) on the Open Science Grid, 1.25 GB versus 4.5 GB of disk space used.

LIGO Inspiral Analysis Workflow
- Small test workflow: 166 tasks, 600 GB maximum total storage (including intermediate data products)
- LIGO workflow running on OSG

Opportunities for data cleanup in the LIGO workflow (assumes level-based scheduling: all nodes at a level must complete before the next level starts)

Montage Workflow

LIGO Workflows: 26% improvement in disk space usage, 50% slower runtime.

LIGO Workflows: 56% improvement in space usage, 3 times slower runtime.

Challenges in implementing disk-space-aware scheduling
- Difficult to get accurate performance estimates for tasks
- Difficult to get good estimates of the sizes of the output data
  - Errors compound through the workflow
- Difficult to get accurate estimates of available data storage space
  - Space is shared among many users
  - Hard to make allocation estimates available to users
  - Even if space is available when you schedule, it may not be there to receive all the data
- Need space allocation support

Conclusions
- Data are an important part of today's applications and need to be managed
- Optimizing workflow disk space usage:
  - the workflow data footprint concept applies within one resource
  - data-aware scheduling applies across resources
- Designed an algorithm that cleans up data as the workflow progresses
  - The effectiveness of the algorithm depends on the structure of the workflow and its data characteristics
  - Workflow restructuring may be needed to decrease the footprint

Acknowledgments
- Henan Zhao, Rizos Sakellariou (University of Manchester, UK)
- Kent Blackburn, Duncan Brown, Stephen Fairhurst, David Meyers (LIGO, Caltech, USA)
- G. Bruce Berriman, John Good, Daniel S. Katz (Montage, Caltech and LSU, USA)
- Miron Livny and Kent Wenger (DAGMan, UW Madison)
- Gurmeet Singh, Karan Vahi, Arun Ramakrishnan, Gaurang Mehta (USC Information Sciences Institute, Pegasus)

Relevant Links
- Condor
- Pegasus: pegasus.isi.edu
- LIGO
- Montage: montage.ipac.caltech.edu/
- Open Science Grid
- Workflows for e-Science, I. J. Taylor, E. Deelman, D. B. Gannon, M. Shields (Eds.), Springer, Dec.
- NSF Workshop on Challenges of Scientific Workflows: E. Deelman and Y. Gil (chairs)
- OGF Workflow research group