Presentation is loading. Please wait.

Presentation is loading. Please wait.

Jianwu Wang, Daniel Crawl, Ilkay Altintas San Diego Supercomputer Center, University of California, San Diego 9500 Gilman Drive, MC 0505 La Jolla, CA 92093-0505,

Similar presentations


Presentation on theme: "Jianwu Wang, Daniel Crawl, Ilkay Altintas San Diego Supercomputer Center, University of California, San Diego 9500 Gilman Drive, MC 0505 La Jolla, CA 92093-0505,"— Presentation transcript:

1 Jianwu Wang, Daniel Crawl, Ilkay Altintas San Diego Supercomputer Center, University of California, San Diego 9500 Gilman Drive, MC 0505 La Jolla, CA 92093-0505, U.S.A. {jianwu, crawl, altintas}@sdsc.edu Presentation by Woodrow H. Edwards

2 Kepler  Open source scientific workflow system  Executable model of the many stages transforming data into the desired result in a scientific domain  Scientific domains using Kepler Bioinformatics, Computational Chemistry, Ecoinformatics, and Geoinformatics  All have large data sets and require a lot of computation

3 Kepler  User friendly GUI to connect data sources to built-in procedures or independent applications with the ease of drag and drop  Promotes component reuse and sharing  Written in Java  Designed to run on clusters, grids, or the Web  A nice match to integrate with MapReduce

4 Kepler  Components of a Kepler workflow Actors ○ Independently process data ○ Atomic or composite ○ Ports input and ouput data (tokens) or signals ○ Could be R or MATLAB scripts or an outside application Channels ○ Link actors ○ Carry data or other signals Directors ○ Specify when actors run ○ Sequential (SPD) or parallel (PN)

5 Figure 1: Example Kepler workflow [2]

6 Hadoop  Open source implementation of MapReduce map(in_key, in_value)  (out_key, intermediate_value) list reduce(out_key, intermediate_value list)  out_value list  HDFS  Data partitioning, scheduling, load balancing, and fault tolerance  Also written in Java

7 Kepler + Hadoop  Implement a MapReduce composite actor Map actor ○ MapInputKey: in_key ○ MapInputValue: in_value ○ MapOutputList: (out_key, intermediate_value) list Reduce actor ○ ReduceInputKey: out_key ○ ReduceInputList: intermediate_value list ○ ReduceOutputValue: out_value list Figure 2: (a) MapReduce composite actor. (b) Map actor. (c) Reduce actor. [1]

8 Kepler + Hadoop Figure 3: Hierarchical execution of MapReduce composite actor with Hadoop [1]

9 Kepler + Hadoop Figure 4: (a) Word Count workflow. (b) Map actor. (c) Reduce actor. (d) IterateOverArray actor. [1]

10 Kepler + Hadoop  Takes 10 to 15% longer over native Hadoop MapReduce  Makes up for it in ease of implementation  Scientist can use MapReduce without needing to know the framework  They only need to know where they can benefit from parallelism in their workflow

11 References 1. J. Wang, D. Crawl, and I. Altintas. Kepler + Hadoop: A General Architecture Facilitating Data- Intensive Applications in Scientific Workflow Systems. In WORKS 09, ACM, Nov. 2009. 2. The Kepler Project. https://kepler-project.org.https://kepler-project.org 3. The Apache Hadoop Project. http://hadoop.apache.org. http://hadoop.apache.org


Download ppt "Jianwu Wang, Daniel Crawl, Ilkay Altintas San Diego Supercomputer Center, University of California, San Diego 9500 Gilman Drive, MC 0505 La Jolla, CA 92093-0505,"

Similar presentations


Ads by Google