Pegasus-a framework for planning for execution in grids Ewa Deelman USC Information Sciences Institute.


1 Pegasus - a framework for planning for execution in grids
Ewa Deelman, deelman@isi.edu
USC Information Sciences Institute

2 Outline
• Pegasus overview
• Components used by Pegasus
• Deferred planning
• Portal
Pegasus Acknowledgments: Carl Kesselman, Gaurang Mehta, Mei-Hui Su, Gurmeet Singh, Karan Vahi, Ewa Deelman
Ewa Deelman, pegasus.isi.edu

3 Pegasus
• Flexible framework, maps abstract workflows onto the Grid
• Possesses well-defined APIs and clients for:
  – Information gathering
    > Resource information
    > Replica query mechanism
    > Transformation catalog query mechanism
  – Resource selection
    > Compute site selection
    > Replica selection
  – Data transfer mechanism
• Can support a variety of workflow executors

4 Pegasus Components

5 Pegasus: A particular configuration
• Automatically locates physical locations for both components (transformations) and data
  – Uses Globus RLS and the Transformation Catalog
• Finds appropriate resources to execute the jobs
  – Via Globus MDS
• Reuses existing data products where applicable
  – Possibly reduces the workflow
• Publishes newly derived data products
  – RLS, Chimera virtual data catalog

6 Replica Location Service
• Pegasus uses the RLS to find input data
• Pegasus uses the RLS to register new data products
[Figure labels: LRC, RLI, Computation]

7 Use of MDS in Pegasus
• MDS provides up-to-date Grid state information
  – Total and idle job queue lengths on a pool of resources (Condor)
  – Total and available memory on the pool
  – Disk space on the pools
  – Number of jobs running on a job manager
• Can be used for resource discovery and selection
  – Developing various task-to-resource mapping heuristics (pluggable)
• Can be used to publish information necessary for replica selection
  – Developing replica selection components

8 Abstract DAG Reduction
• Pegasus queries the RLS and finds the data products of jobs d, e, f already materialized. Hence it deletes those jobs.
• On applying the reduction algorithm, additional jobs a, b, c are deleted.
[Figure: abstract DAG of jobs a to i. Key: the original node, pull transfer node, registration node, push transfer node]
Implemented by Karan Vahi
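The two reduction steps on this slide can be sketched roughly as follows. This is a minimal illustration, not Pegasus's actual implementation: the function name `reduce_workflow`, the DAG edges, and the file names are all hypothetical, since the slide's figure does not survive in the transcript. A plain Python set stands in for an RLS query.

```python
def reduce_workflow(parents, outputs, replica_catalog):
    """Return the set of jobs that do not need to run.

    parents: job -> set of parent jobs (hypothetical DAG encoding)
    outputs: job -> set of logical file names the job produces
    replica_catalog: set of logical file names already registered
                     (stands in for an RLS lookup)
    """
    # Build the child relation from the parent relation.
    children = {job: set() for job in parents}
    for job, job_parents in parents.items():
        for parent in job_parents:
            children[parent].add(job)

    # Step 1: delete jobs whose data products are already materialized.
    deleted = {
        job for job in parents
        if outputs[job] and outputs[job] <= replica_catalog
    }

    # Step 2: cascade upward. A job whose children have all been deleted
    # no longer has a consumer for its outputs, so it is deleted too.
    changed = True
    while changed:
        changed = False
        for job in parents:
            if job not in deleted and children[job] and children[job] <= deleted:
                deleted.add(job)
                changed = True
    return deleted


# Hypothetical edges: a -> d -> g, b -> e -> h, c -> f -> i.
PARENTS = {
    "a": set(), "b": set(), "c": set(),
    "d": {"a"}, "e": {"b"}, "f": {"c"},
    "g": {"d"}, "h": {"e"}, "i": {"f"},
}
OUTPUTS = {job: {job + ".out"} for job in PARENTS}
CATALOG = {"d.out", "e.out", "f.out"}

deleted_jobs = reduce_workflow(PARENTS, OUTPUTS, CATALOG)
remaining_jobs = sorted(set(PARENTS) - deleted_jobs)
```

With these assumed edges, jobs d, e, f are removed in the first step and a, b, c in the second, leaving g, h, i to be planned.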

9 Concrete Planner (1)
• Pegasus adds replica nodes for each job that materializes data (g, h, i).
• These three nodes are for transferring the output files of the leaf job (f) to the output pool, since job f has been deleted by the reduction algorithm.
• Pegasus schedules jobs g and h on pool X and job i on pool Y, hence adding an inter-pool transfer node.
• Pegasus adds transfer nodes for transferring the input files for the root nodes of the decomposed DAG (job g).
[Figure: DAG of jobs a to i. Key: the original node, pull transfer node, registration node, push transfer node, node deleted by the reduction algorithm, inter-pool transfer node]
Implemented by Karan Vahi

10 Pegasus Components
• Concrete Planner and submit file generator (gencdag)
  – The Concrete Planner of the VDS makes the logical-to-physical mapping of the DAX, taking into account the pool where the jobs are to be executed (execution pool) and the final output location (output pool).

11 Pegasus Components (cont'd)
• The following catalogs are looked up to make the translation
  – Transformation Catalog (tc.data) (also DB based)
  – Pool Config File
  – Replica Location Services
  – Monitoring and Discovery Services
• XML Pool Config generator (genpoolconfig)
  – The Pool Config generator queries the MDS as well as local pool config files to generate an XML pool config, which is used by Pegasus.
  – MDS is preferred for generating the pool configuration, as it provides much richer information about the pool, including queue statistics, available memory, etc.

12 Transformation Catalog
• Pegasus needs to access a catalog to determine the pools where it can run a particular piece of code.
• If a site does not have the executable, one should be able to ship the executable to the remote site.
  – Newer versions of Pegasus will prestage a statically linked executable
• Generic TC API for users to implement their own transformation catalog.
• Current implementations
  – File based
      #poolname  logical tr  physical tr              env
      isi        preprocess  /usr/vds/bin/preprocess  VDS_HOME=/usr/vds/;
  – Database based
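As an illustration of how a file-based catalog like the one above could answer "on which pools can this code run?", here is a small sketch. It is not the VDS TC API: the functions `parse_tc` and `pools_for` are hypothetical, and the parsing assumes whitespace-separated columns with the env column holding `NAME=value` pairs separated by `;`, which is inferred only from the single sample line on the slide.

```python
from collections import defaultdict


def parse_tc(text):
    """Map logical transformation -> list of (pool, physical path, env dict)."""
    catalog = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#poolname ...' header
        fields = line.split()
        pool, logical, physical = fields[:3]
        env = {}
        # Assumption: everything after the third column is the env list.
        for pair in "".join(fields[3:]).split(";"):
            if pair:
                name, _, value = pair.partition("=")
                env[name] = value
        catalog[logical].append((pool, physical, env))
    return catalog


def pools_for(catalog, logical):
    """Pools on which the given logical transformation is installed."""
    return [pool for pool, _, _ in catalog.get(logical, [])]


# The entry below is the sample line from the slide.
SAMPLE_TC = """\
#poolname logical tr physical tr env
isi preprocess /usr/vds/bin/preprocess VDS_HOME=/usr/vds/;
"""

tc = parse_tc(SAMPLE_TC)
preprocess_pools = pools_for(tc, "preprocess")
```

A planner could use `pools_for` to restrict the candidate execution pools before any resource selection heuristic is applied.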

13 Pool Config
• Pool Config is an XML file which contains information about the various pools on which DAGs may execute.
• Among other things, the Pool Config file:
  – Specifies the various job managers that are available on the pool for the different types of Condor universes.
  – Specifies the GridFTP storage servers associated with each pool.
  – Specifies the Local Replica Catalogs where data residing in the pool has to be cataloged.
  – Contains profiles, like environment hints, which are common site-wide.
  – Contains the working and storage directories to be used on the pool.

14 Gvds.Pool.Config
• This file is read by the information provider and published into MDS.
• Format
    gvds.pool.id :
    gvds.pool.lrc :
    gvds.pool.gridftp : @
    gvds.pool.gridftp : gsiftp://sukhna.isi.edu/nfs/asd2/gmehta@2.4.0
    gvds.pool.universe : @ @
    gvds.pool.universe : transfer@columbus.isi.edu/jobmanager-fork@2.2.4
    gvds.pool.gridlaunch :
    gvds.pool.workdir :
    gvds.pool.profile : @ @
    gvds.pool.profile : env@GLOBUS_LOCATION@/smarty/gt2.2.4
    gvds.pool.profile : vds@VDS_HOME@/nfs/asd2/gmehta/vds

15 Pool Config
• Two ways to construct the Pool Config file
  – Monitoring and Discovery Service
  – Local Pool Config file (text based)
• Client tool to generate the Pool Config file
  – The tool genpoolconfig is used to query the MDS and/or the local pool config file(s) to generate the XML Pool Config file.

16 Properties
• The properties file defines and modifies the behavior of Pegasus.
• Properties set in $VDS_HOME/properties can be overridden by defining them either in $HOME/.chimerarc or by giving them on the command line of any executable.
  – e.g. gendax -Dvds.home=path to vds home……
• Some examples follow, but for more details please read the sample.properties file in the $VDS_HOME/etc directory.
• Basic required properties
  – vds.home : This is set automatically by the clients from the environment variable $VDS_HOME
  – vds.properties : Path to the default properties file
    > Default : ${vds.home}/etc/properties
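The override behavior described above amounts to layering property sources. The sketch below shows one way to model it; the helper names are hypothetical and the example paths are made up. The slide does not say whether $HOME/.chimerarc or the command line wins when both define a property, so the sketch assumes the command line has the last word.

```python
def parse_cli_defines(argv):
    """Collect -Dname=value arguments from a command line."""
    props = {}
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            name, _, value = arg[2:].partition("=")
            props[name] = value
    return props


def resolve_properties(system_props, user_props, cli_props):
    """Layer the three sources; later layers override earlier ones.

    Assumed order: system properties file, then $HOME/.chimerarc,
    then -D definitions on the command line.
    """
    resolved = dict(system_props)
    resolved.update(user_props)
    resolved.update(cli_props)
    return resolved


# Hypothetical values, for illustration only.
system_props = {"vds.home": "/opt/vds", "vds.properties": "/opt/vds/etc/properties"}
user_props = {"vds.properties": "/home/user/conf/properties"}
cli_props = parse_cli_defines(["-Dvds.home=/tmp/vds-test", "--dax", "x.dax"])

effective = resolve_properties(system_props, user_props, cli_props)
```

Here `effective` takes `vds.home` from the command line and `vds.properties` from the user file, while untouched properties would fall through from the system defaults.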

17 Concrete Planner gencdag
• The Concrete Planner takes the DAX produced by Chimera and converts it into a set of Condor DAG and submit files.
• Usage : gencdag --dax --p [--dir ] [--o ] [--force]
• You can specify more than one execution pool. Execution will take place on the pools on which the executable exists. If the executable exists on more than one pool, then the pool on which the executable will run is selected randomly.
• The output pool is the pool to which you want all the output products to be transferred. If not specified, the materialized data stays on the execution pool.
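The pool selection rule stated above (restrict to pools that have the executable, then pick one at random) can be sketched as below. The function name and the pool names other than isi_condor are hypothetical, and this is only an illustration of the rule, not gencdag's code.

```python
import random


def select_execution_pool(requested_pools, pools_with_executable, rng=random):
    """Pick a pool at random among requested pools that have the executable."""
    candidates = [p for p in requested_pools if p in pools_with_executable]
    if not candidates:
        raise LookupError("executable is not available on any requested pool")
    return rng.choice(candidates)


# Hypothetical pool names; a seeded generator keeps the choice repeatable.
requested = ["isi_condor", "pool_x", "pool_y"]
installed_on = {"isi_condor", "pool_y"}
chosen = select_execution_pool(requested, installed_on, rng=random.Random(0))
```

Passing the random generator as a parameter makes it easy to swap in a different selection heuristic later, in the spirit of the pluggable task-to-resource mapping mentioned earlier in the deck.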

18 Full Ahead Planning
• At the time of submission of the workflow, decisions are made as to where to schedule the jobs in the workflow.
• Allows certain optimizations to be performed by looking ahead for bottleneck jobs and then scheduling around them.
• However, for large workflows the decisions made at submission time may no longer be valid or optimal at the point the job is actually run.

19 Deferred Planning
• Delay the decision of mapping the job to the site as late as possible.
• Involves partitioning the original DAX into smaller DAXes, each of which refers to a partition on which Pegasus is run.
• A Mega DAG is constructed. It ends up running Pegasus automatically on the partition DAXes, as each partition becomes ready to run.
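To make the idea concrete, the sketch below partitions a workflow by depth and plans each partition only when its turn comes. Level-based partitioning is an assumption chosen for simplicity; the slides do not say which partitioning scheme is used. The names `partition_by_level`, `run_deferred`, and `plan_partition` are hypothetical, with `plan_partition` standing in for a Pegasus run on one partition DAX.

```python
def partition_by_level(parents):
    """Group jobs by depth in the DAG; each level is one partition.

    parents: job -> set of parent jobs (hypothetical DAG encoding)
    """
    level = {}

    def depth(job):
        # Roots get depth 0; other jobs sit one level below their deepest parent.
        if job not in level:
            level[job] = 1 + max((depth(p) for p in parents[job]), default=-1)
        return level[job]

    partitions = {}
    for job in parents:
        partitions.setdefault(depth(job), []).append(job)
    return [sorted(partitions[d]) for d in sorted(partitions)]


def run_deferred(parents, plan_partition):
    """Plan partitions one at a time, in dependency order.

    The site mapping for a partition is decided when that partition is
    reached, not when the whole workflow is submitted.
    """
    return [plan_partition(jobs) for jobs in partition_by_level(parents)]


# A diamond-shaped workflow with hypothetical job names.
DIAMOND = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

levels = partition_by_level(DIAMOND)
sizes = run_deferred(DIAMOND, lambda jobs: len(jobs))
```

In a real deployment the loop in `run_deferred` corresponds to the Mega DAG, which invokes the planner on each partition as its predecessors finish.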

20 Deferred Planning through Partitioning
A variety of planning algorithms can be implemented

21 Mega DAG is created by Pegasus and then submitted to DAGMan

22
• Create workflow partitions
  – partitiondax --dax ./blackdiamond.dax --dir dax
• Create the MegaDAG (creates the DAGMan submit files)
  – gencdag -Dvds.properties=~/conf/properties --pdax ./dax/blackdiamond.pdax --pools isi_condor --o isi_condor --dir ./dags/
  Note the --pdax option instead of the normal --dax option.
• Submit the .dag file for the Mega DAG
  – condor_submit_dag black-diamond_0.dag

23 More info
• www.griphyn.org/chimera
• pegasus.isi.edu

