Jianwu Wang, Daniel Crawl, Ilkay Altintas
San Diego Supercomputer Center, University of California, San Diego
9500 Gilman Drive, MC 0505, La Jolla, CA 92093-0505, U.S.A.
{jianwu, crawl, altintas}@sdsc.edu
Presentation by Woodrow H. Edwards

Kepler
• Open source scientific workflow system
• Executable model of the many stages that transform data into the desired result in a scientific domain
• Scientific domains using Kepler: bioinformatics, computational chemistry, ecoinformatics, and geoinformatics
• All have large data sets and require a lot of computation

Kepler
• User-friendly GUI that connects data sources to built-in procedures or independent applications by drag and drop
• Promotes component reuse and sharing
• Written in Java
• Designed to run on clusters, grids, or the Web
• A nice match to integrate with MapReduce

Kepler
• Components of a Kepler workflow
  Actors:
    ○ Independently process data
    ○ Atomic or composite
    ○ Ports input and output data (tokens) or signals
    ○ Could be R or MATLAB scripts or an outside application
  Channels:
    ○ Link actors
    ○ Carry data or other signals
  Directors:
    ○ Specify when actors run
    ○ Sequential (SDF) or parallel (PN)
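The three component roles above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Kepler's API: the names Channel, Actor, SequentialDirector, and runExample are hypothetical, and the director here simply fires each actor once, in the order it was added.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; none of these types come from Kepler itself.
class WorkflowSketch {
    /** A channel links two actors and carries tokens between their ports. */
    static class Channel<T> {
        private final Queue<T> tokens = new ArrayDeque<>();
        void put(T token) { tokens.add(token); }
        T take() { return tokens.remove(); }
        boolean hasToken() { return !tokens.isEmpty(); }
    }

    /** An actor processes data independently; it only touches its own channels. */
    interface Actor {
        void fire();
    }

    /** A director decides when actors run; this one fires each actor once, in order. */
    static class SequentialDirector {
        private final List<Actor> schedule = new ArrayList<>();
        SequentialDirector add(Actor actor) { schedule.add(actor); return this; }
        void run() { for (Actor actor : schedule) actor.fire(); }
    }

    /** Wires source -> square -> sink and returns what the sink collected. */
    public static List<Integer> runExample(List<Integer> input) {
        Channel<Integer> sourceToSquare = new Channel<>();
        Channel<Integer> squareToSink = new Channel<>();
        List<Integer> results = new ArrayList<>();

        Actor source = () -> input.forEach(sourceToSquare::put);
        Actor square = () -> {
            while (sourceToSquare.hasToken()) {
                int x = sourceToSquare.take();
                squareToSink.put(x * x);
            }
        };
        Actor sink = () -> {
            while (squareToSink.hasToken()) results.add(squareToSink.take());
        };

        new SequentialDirector().add(source).add(square).add(sink).run();
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runExample(java.util.Arrays.asList(1, 2, 3)));
    }
}
```

Swapping the director for one that runs each actor on its own thread would change when the actors fire without changing the actors themselves, which is the separation the Director role is meant to provide.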

Figure 1: Example Kepler workflow [2]

Hadoop
• Open source implementation of MapReduce
    map(in_key, in_value) -> (out_key, intermediate_value) list
    reduce(out_key, intermediate_value list) -> out_value list
• HDFS
• Data partitioning, scheduling, load balancing, and fault tolerance
• Also written in Java
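The map and reduce signatures above can be tried out without a cluster. The following is a minimal in-memory sketch in plain Java, not the Hadoop API: the class MapReduceSketch and its run and wordCount methods are hypothetical names, and the grouping step stands in for the shuffle that Hadoop performs between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical in-memory sketch; it does not use or imitate Hadoop's classes.
class MapReduceSketch {
    /** Applies map to every record, groups intermediate values by out_key, then reduces each group. */
    public static <K1, V1, K2 extends Comparable<K2>, V2, V3> Map<K2, List<V3>> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<V3>> reduce) {
        // Stand-in for the shuffle: collect intermediate values under their key.
        Map<K2, List<V2>> grouped = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            for (Map.Entry<K2, V2> pair : map.apply(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<K2, List<V3>> output = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> group : grouped.entrySet()) {
            output.put(group.getKey(), reduce.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Word count: map emits (word, 1) per word; reduce sums the ones for each word. */
    public static Map<String, List<Integer>> wordCount(List<String> lines) {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(new SimpleEntry<>(i, lines.get(i))); // in_key = line number
        }
        BiFunction<Integer, String, List<Map.Entry<String, Integer>>> map = (lineNo, line) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) out.add(new SimpleEntry<>(word, 1));
            }
            return out;
        };
        BiFunction<String, List<Integer>, List<Integer>> reduce = (word, ones) -> {
            int sum = 0;
            for (int one : ones) sum += one;
            List<Integer> result = new ArrayList<>();
            result.add(sum);
            return result;
        };
        return run(input, map, reduce);
    }

    public static void main(String[] args) {
        // prints {a=[2], b=[2], c=[1]}
        System.out.println(wordCount(java.util.Arrays.asList("a b a", "b c")));
    }
}
```

Everything here runs in one thread on one machine; the data partitioning, scheduling, and fault tolerance listed above are exactly what this sketch leaves out.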

Kepler + Hadoop
• Implement a MapReduce composite actor
  Map actor:
    ○ MapInputKey: in_key
    ○ MapInputValue: in_value
    ○ MapOutputList: (out_key, intermediate_value) list
  Reduce actor:
    ○ ReduceInputKey: out_key
    ○ ReduceInputList: intermediate_value list
    ○ ReduceOutputValue: out_value list
Figure 2: (a) MapReduce composite actor. (b) Map actor. (c) Reduce actor. [1]
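One way to picture the port layout above is as two small interfaces whose parameters carry the port names. This is a hypothetical plain-Java sketch, not the actor implemented in [1]: MapSubWorkflow, ReduceSubWorkflow, fireComposite, and the file-extension example are all assumptions made for illustration, and the composite runs in memory rather than handing work to Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; only the port names are taken from the slide.
class MapReduceActorSketch {
    /** One (out_key, intermediate_value) pair emitted on the MapOutputList port. */
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    /** Stand-in for the Map sub-workflow: MapInputKey, MapInputValue in; MapOutputList out. */
    interface MapSubWorkflow {
        List<Pair> fire(String mapInputKey, String mapInputValue);
    }

    /** Stand-in for the Reduce sub-workflow: ReduceInputKey, ReduceInputList in; ReduceOutputValue out. */
    interface ReduceSubWorkflow {
        List<Integer> fire(String reduceInputKey, List<Integer> reduceInputList);
    }

    /** Fires Map once per record, groups the pairs by key, then fires Reduce once per key. */
    public static Map<String, List<Integer>> fireComposite(
            Map<String, String> records, MapSubWorkflow map, ReduceSubWorkflow reduce) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> record : records.entrySet()) {
            for (Pair pair : map.fire(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(pair.key, k -> new ArrayList<>()).add(pair.value);
            }
        }
        Map<String, List<Integer>> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            output.put(group.getKey(), reduce.fire(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Example sub-workflows: total character count per file extension. */
    public static Map<String, List<Integer>> charsPerExtension(Map<String, String> files) {
        MapSubWorkflow map = (name, content) -> {
            List<Pair> out = new ArrayList<>();
            out.add(new Pair(name.substring(name.lastIndexOf('.') + 1), content.length()));
            return out;
        };
        ReduceSubWorkflow reduce = (extension, lengths) -> {
            int total = 0;
            for (int n : lengths) total += n;
            List<Integer> out = new ArrayList<>();
            out.add(total);
            return out;
        };
        return fireComposite(files, map, reduce);
    }
}
```

The point of the split is that whoever writes the two sub-workflows only sees the six ports; how the pairs are grouped and where each firing runs is the composite actor's concern.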

Kepler + Hadoop
Figure 3: Hierarchical execution of MapReduce composite actor with Hadoop [1]

Kepler + Hadoop
Figure 4: (a) Word Count workflow. (b) Map actor. (c) Reduce actor. (d) IterateOverArray actor. [1]

Kepler + Hadoop
• Takes 10 to 15% longer than native Hadoop MapReduce
• Makes up for it in ease of implementation
• Scientists can use MapReduce without needing to know the framework
• They only need to know where their workflow can benefit from parallelism

References
1. J. Wang, D. Crawl, and I. Altintas. Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems. In WORKS 09, ACM, Nov. 2009.
2. The Kepler Project. https://kepler-project.org
3. The Apache Hadoop Project. http://hadoop.apache.org
