Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources
Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Karan Vahi, Jin-Soo Kim
Oracle US Inc.; Korea Advanced Institute of Science and Technology; Information Sciences Institute/University of Southern California; Sungkyunkwan University
Overview
Motivation
Background
  – Pegasus
  – Virtual Grid
Pegasus-VG Proxy
Conclusion
Discussion
Motivation
Challenges in scientific application development
  – Data/control flow, task scheduling, data replication, fault tolerance, etc.
Challenges in resource management
  – Availability, performance, cost, reliability, fault tolerance, etc.
How can we leverage existing cyberinfrastructure for easy and efficient scientific computing?
Separation of Concerns
Application domain
  – Workflow management: applications can be managed independently of the target execution environment.
  – E.g., Pegasus, Askalon, Triana
Resource domain
  – Resource provisioning: resource management can be encapsulated beneath abstractions or virtualization.
  – E.g., Virtual Grid, virtual clusters, clouds
Workflow planning and execution over provisioned resources
Pegasus
A framework for workflow planning and execution
Workflow lifecycle
  – Design: describe the application's data/control flows as an abstract workflow
  – Planning: map the workflow tasks onto physical resources
  – Execution: schedule and run the workflow tasks on the mapped resources
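To make the design/planning/execution split concrete, here is a minimal Python sketch under a toy in-memory representation: the abstract workflow names only logical tasks and files, control flow is derived from data flow, and a toy planner binds tasks to site names in dependency order. The dictionary layout, the task names findrange_1/findrange_2, the intermediate file names (f.b1, f.c1, and so on) and the round-robin site binding are all hypothetical; this is not Pegasus's API, which uses its own workflow format and catalogs.

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: each task lists the logical files it reads and writes.
# Only f.input and f.output come from the slides; intermediate names are made up.
ABSTRACT_WORKFLOW = {
    "preprocess":  {"in": ["f.input"], "out": ["f.b1", "f.b2"]},
    "findrange_1": {"in": ["f.b1"], "out": ["f.c1"]},
    "findrange_2": {"in": ["f.b2"], "out": ["f.c2"]},
    "analyze":     {"in": ["f.c1", "f.c2"], "out": ["f.output"]},
}


def derive_dependencies(workflow):
    """Design: infer control flow from data flow (a task depends on the producers of its inputs)."""
    producer = {f: t for t, spec in workflow.items() for f in spec["out"]}
    return {t: {producer[f] for f in spec["in"] if f in producer}
            for t, spec in workflow.items()}


def plan(workflow, sites):
    """Planning (toy): bind tasks to sites round-robin, in an order that respects dependencies."""
    order = list(TopologicalSorter(derive_dependencies(workflow)).static_order())
    return [(task, sites[i % len(sites)]) for i, task in enumerate(order)]


if __name__ == "__main__":
    # Execution would submit each task only after its predecessors finish.
    for task, site in plan(ABSTRACT_WORKFLOW, ["site_a"]):
        print(f"run {task} on {site}")
```

A real planner would consult site, replica, and transformation catalogs instead of a round-robin list; the point here is only that the abstract workflow carries no resource information until planning.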
Pegasus Workflow Management
[Figure: the Pegasus mapper turns the abstract workflow into executable workflow tasks; Condor DAGMan and Condor run them in the computing environment (a Condor pool); monitoring information and provenance are collected.]
Virtual Grid
A programmable, virtualized resource provisioning framework
Components
  – vgDL (Virtual Grid Description Language): specifies resource requirements
  – vgES (Virtual Grid Execution System): compiles requests and coordinates resources
  – PC (Personal Cluster): provides uniform job management
[Figure: Virtual Grid resource abstraction. An application (tasks A, B, C, D) issues a vgDL request such as vgdl = ClusterOf (node) [2] { node = [Processor == “P4”] }; the request passes through classification, selection, binding, and environment setup over timeshare, lease, and batch (PBS) resources, yielding a VG on which the program runs.]
Pegasus on Virtual Grid
Scope
  – A basic integration for workflow planning and execution over provisioned resources
Issues
  – Resource capacity estimation: synthesizing a resource specification (vgDL) for Virtual Grid
  – Resource information publication: generating a site catalog for Pegasus
Resource Capacity Estimation
What Virtual Grid expects from Pegasus
  – A vgDL description
Available information
  – Task execution time, data transfer time, performance metrics, minimum memory capacity, cost, deadline, etc.
Unknown information
  – Number of virtual processors
Resource capacity estimate
  – Find the minimum number of processors that can execute the workflow within the deadline
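Two necessary conditions bound this estimate from below. They use only quantities listed on this slide; the notation (ET_i for the execution time of task i, D for the deadline, L_cp for the longest dependency chain, p* for the minimum processor count) is introduced here for illustration. Data transfer times are ignored, so these are lower bounds, not the estimate itself.

```latex
p^{*} = \min \{\, p \in \mathbb{N} : \text{some schedule on } p \text{ processors finishes by } D \,\}

L_{\mathrm{cp}} = \max_{\text{paths } \pi} \sum_{i \in \pi} ET_i \le D,
\qquad
p^{*} \ge \left\lceil \frac{\sum_{i} ET_i}{D} \right\rceil
```

The first condition says that no number of processors helps if the longest dependency chain already exceeds D; the second says that p processors can deliver at most pD units of work before the deadline.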
BTS (Balanced Time Scheduling)
Ref: E-science’08, E.-K. Byun, Y.-S. Kee et al.
[Figure: example workflow tasks (ID, execution time ET) laid out over time on processors p1 and p2]
How many processors do we need to run this workflow within 7 time units?
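The question on this slide can be approached with a minimal Python sketch of a BTS-style estimate. It is a simplified greedy heuristic, not the published BTS algorithm (see the E-science’08 reference for that): it propagates each task's latest finish time back from the deadline, places tasks in topological order at the start time inside their slack window that keeps the peak concurrent load lowest, and reports that peak. It assumes integer time units and zero data transfer time; the function name and input layout are hypothetical. Because every placed task becomes a fixed interval, a peak load of k means k processors suffice (interval partitioning).

```python
from graphlib import TopologicalSorter


def estimate_processors(et, preds, deadline):
    """Greedy, BTS-style processor estimate (simplified sketch, not the published BTS).

    et:       task -> execution time in integer time units
    preds:    task -> set of predecessor tasks
    deadline: integer number of time units
    Returns the peak number of concurrently running tasks, or None if the
    deadline is shorter than some dependency chain. Transfer time is assumed zero.
    """
    order = list(TopologicalSorter(preds).static_order())
    succs = {t: set() for t in order}
    for t in order:
        for p in preds.get(t, ()):
            succs[p].add(t)

    # Latest finish time of each task, propagated backwards from the deadline.
    lft = {}
    for t in reversed(order):
        lft[t] = min((lft[s] - et[s] for s in succs[t]), default=deadline)

    load = [0] * deadline  # number of tasks running in each unit time slot
    finish = {}
    for t in order:  # all predecessors of t are already placed
        earliest = max((finish[p] for p in preds.get(t, ())), default=0)
        latest = lft[t] - et[t]
        if latest < earliest:
            return None
        # Start time in the slack window with the lowest peak load; earliest wins ties.
        start = min(range(earliest, latest + 1),
                    key=lambda s: (max(load[s:s + et[t]], default=0), s))
        for slot in range(start, start + et[t]):
            load[slot] += 1
        finish[t] = start + et[t]
    return max(load, default=0)


if __name__ == "__main__":
    # Hypothetical diamond workflow timings.
    et = {"A": 1, "B": 2, "C": 2, "D": 1}
    preds = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    print(estimate_processors(et, preds, deadline=4))  # 2
```

With these hypothetical timings, a 4-unit deadline forces the two 2-unit middle tasks to overlap, so two processors are needed; relaxing the deadline to 6 lets them run back to back on one processor.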
Example
Execution time of each task: measured on a Xeon processor
Data transfer time: over a network with 1 Gbps bandwidth
Deadline: 1 hour
Diamond = ClusterOf [2] (nd) [, 0:30:00] { nd = [Processor == “Xeon”] }
[Figure: diamond workflow with tasks preprocess, findrange, and analyze; input file f.input, output file f.output]
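Once a processor count has been estimated, synthesizing the vgDL request is mostly string assembly. A minimal Python sketch, assuming the ClusterOf (var) [count] { var = [Processor == "type"] } ordering used by the other vgDL examples in these slides; the helper name and its parameters are hypothetical, and the optional time field seen in the Diamond request ([, 0:30:00]) is left out because its semantics are not described here.

```python
def synthesize_vgdl(name, processors, processor_type, node_var="nd"):
    """Build a vgDL ClusterOf request (hypothetical helper; syntax follows the slide examples)."""
    if processors < 1:
        raise ValueError("a cluster needs at least one processor")
    # Produces e.g.: Diamond = ClusterOf (nd) [2] { nd = [Processor == "Xeon"] }
    return (f"{name} = ClusterOf ({node_var}) [{processors}] "
            f'{{ {node_var} = [Processor == "{processor_type}"] }}')


if __name__ == "__main__":
    # The count 2 would come from a capacity estimate such as BTS.
    print(synthesize_vgdl("Diamond", 2, "Xeon"))
```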
Resource Information Publication
What Pegasus expects from Virtual Grid
  – A site catalog
What Virtual Grid provides
  – A VG instance
Resource information publication
  – Devirtualize the VG instance and generate a site catalog for Pegasus
Personal Cluster
A partition of resources dedicated to a user, under the control of a user-level resource manager, for a limited time period
GT4/PBS
Ref: HCW’08, Y.-S. Kee and C. Kesselman
Site Catalog Publication
Excerpt of a generated site catalog (markup lost in extraction): … /home/globus/pegasus gt4 PBS $HOME/workdir …
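Publishing a VG instance to Pegasus amounts to rewriting each bound resource as a site catalog entry. A minimal Python sketch using the standard-library xml.etree.ElementTree, assuming a hypothetical BoundResource record per devirtualized resource. The defaults reuse the values visible on this slide (gt4, PBS, /home/globus/pegasus, $HOME/workdir), but the element and attribute names are illustrative and do not follow the exact Pegasus site catalog schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class BoundResource:
    """Hypothetical view of one devirtualized VG resource (fields are illustrative)."""
    handle: str
    gatekeeper: str
    grid_type: str = "gt4"
    scheduler: str = "PBS"
    pegasus_home: str = "/home/globus/pegasus"
    workdir: str = "$HOME/workdir"


def site_catalog_xml(resources):
    """Emit a simplified, site-catalog-like XML document (NOT the exact Pegasus schema)."""
    root = ET.Element("sitecatalog")
    for r in resources:
        site = ET.SubElement(root, "site", handle=r.handle)
        # Element/attribute names below are illustrative only.
        ET.SubElement(site, "grid", type=r.grid_type,
                      contact=r.gatekeeper, scheduler=r.scheduler)
        ET.SubElement(site, "pegasus_home").text = r.pegasus_home
        ET.SubElement(site, "workdir").text = r.workdir
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "head.example.org" is a placeholder host name.
    print(site_catalog_xml([BoundResource("vg_site", "head.example.org")]))
```

Keeping devirtualization as a pure data-to-document step means Pegasus needs no knowledge of Virtual Grid: it sees an ordinary site, much as it would for a statically configured cluster.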
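Workflow Planning over Provisioned Resources
[Figure: three phases (Creation, Planning, Scheduling/Execution). The VG-Pegasus Proxy applies BTS to the abstract workflow (tasks A, B, C, D) and synthesizes a vgDL request, e.g., vgdl = ClusterOf (nd) [2] { nd = [Proc == “Xeon”] }; Virtual Grid binds a VG (GT4+PBS), which is devirtualized into a site catalog; Pegasus then turns the abstract workflow into an executable workflow.]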
Conclusion
Pegasus on Virtual Grid
  – Implements workflow planning and execution over on-demand captive resources
  – Enables easy and efficient application development and execution
Issues
  – Resource capacity estimation
  – Site catalog publication
Discussion
Effective performance
  – What cost must a user pay to achieve a successful execution?
Ongoing studies
  – Fine-grain planning for resource provisioning: performance, cost, reliability
  – Workflow execution for virtualization: recovery of failed tasks
Need More Information?
Pegasus
VGrADS
  – Tuesday, 11:30 am, RENCI booth (2633)
  – Wednesday, noon, GCAS booth (285)
  – Wednesday, 2:00 pm, SDSC booth (568)
  – Wednesday, 4:00 pm, RENCI booth (2633)
Questions & Answers