EpiC: an Extensible and Scalable System for Processing Big Data.


1 epiC: an Extensible and Scalable System for Processing Big Data

2 Why do we need another MapReduce-like system?
MapReduce:
- The M/R framework cannot handle iterative processing efficiently.
- Every computation has to be transformed into map and reduce functions.
Pregel/GraphLab/Dryad:
- DAG-based data flow.
- The user has to design how the graph is constructed and how the different operators are linked.
Can we combine the advantages of both types of systems?

3 Overview of epiC
- Each unit works independently.
- Units communicate via “email”.
- The master works as a mail server that forwards the messages.
- epiC is based on the actor model.
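The mail-server idea on this slide can be sketched in a few lines of Python. This is only an illustration of the actor-style message passing described above, not epiC's actual API: the names `Master`, `RelayUnit`, `send`, `receive`, and `run` are all hypothetical, and everything runs in a single process.

```python
from collections import deque


class Master:
    """Hypothetical mail server: queues messages and forwards them to units."""

    def __init__(self):
        self.units = {}
        self.mailbox = deque()

    def register(self, unit):
        self.units[unit.name] = unit
        unit.master = self

    def send(self, recipient, payload):
        # Units never call each other directly; they only post messages here.
        self.mailbox.append((recipient, payload))

    def run(self):
        # Forward messages until no unit has anything left to say.
        while self.mailbox:
            recipient, payload = self.mailbox.popleft()
            self.units[recipient].receive(payload)


class RelayUnit:
    """Hypothetical unit: records each message and passes a counter to its peer."""

    def __init__(self, name, peer):
        self.name = name
        self.peer = peer
        self.master = None
        self.log = []

    def receive(self, payload):
        self.log.append(payload)
        if payload < 3:
            self.master.send(self.peer, payload + 1)


master = Master()
a = RelayUnit("a", peer="b")
b = RelayUnit("b", peer="a")
master.register(a)
master.register(b)
master.send("a", 0)  # the first message activates unit "a"
master.run()
```

Each unit only knows the name of its peer, so the master is the single place where routing happens.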

4 Compare epiC to MapReduce and Pregel
Using PageRank as an example.
MapReduce:
- Multiple iterations, each run as a separate job.
- The second job loads the output of the first job to continue the processing.
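A minimal sketch of the chained-job pattern, assuming a toy single-process stand-in for a MapReduce job. `run_job`, `pr_map`, `make_pr_reduce`, the damping factor of 0.85, and the three-vertex graph are assumptions made for illustration; no real MapReduce API is used. Each job has to re-emit the adjacency lists so that the next job can read them from the previous job's output.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor


def run_job(records, mapper, reducer):
    """Toy stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return [kv for key in sorted(groups) for kv in reducer(key, groups[key])]


def pr_map(vertex, state):
    rank, neighbors = state
    # Pass the graph structure along, otherwise the next job cannot see it.
    yield vertex, ("links", neighbors)
    for n in neighbors:
        yield n, ("rank", rank / len(neighbors))


def make_pr_reduce(num_vertices):
    def pr_reduce(vertex, values):
        neighbors, total = [], 0.0
        for tag, value in values:
            if tag == "links":
                neighbors = value
            else:
                total += value
        new_rank = (1 - DAMPING) / num_vertices + DAMPING * total
        yield vertex, (new_rank, neighbors)
    return pr_reduce


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
records = [(v, (1.0 / len(graph), ns)) for v, ns in graph.items()]
for _ in range(20):
    # The output of one job is the input of the next one.
    records = run_job(records, pr_map, make_pr_reduce(len(graph)))
ranks = {v: rank for v, (rank, _) in records}
```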

5 Compare epiC to MapReduce and Pregel
Pregel:
- In each super-step, a vertex computes its new PageRank value and broadcasts the value to its neighbors.
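The super-step loop can be sketched as follows. This is a single-process simulation of the vertex-centric model under assumptions (a damping factor of 0.85, a graph in which every vertex has at least one outgoing edge); `pregel_pagerank` is a hypothetical name and not part of any Pregel implementation.

```python
DAMPING = 0.85  # assumed damping factor


def pregel_pagerank(graph, supersteps):
    """Simulate vertex-centric PageRank; graph maps vertex -> out-neighbors."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(supersteps + 1):
        outbox = {v: [] for v in graph}
        for v, neighbors in graph.items():
            if step > 0:
                # Combine the values received from the previous super-step.
                rank[v] = (1 - DAMPING) / n + DAMPING * sum(inbox[v])
            if step < supersteps:
                # Send this vertex's share of rank to each neighbor.
                for u in neighbors:
                    outbox[u].append(rank[v] / len(neighbors))
        # Messages sent in this super-step are delivered in the next one.
        inbox = outbox
    return rank


ranks = pregel_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, 20)
```

Unlike the chained MapReduce jobs, the graph structure stays in place between super-steps and only rank values move.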

6 Compare epiC to MapReduce and Pregel
epiC:
0. Messages are sent to a unit to activate it.
1. The unit loads a partition of the graph data and the score vector based on the received message.
2. The unit computes the new score vector of its vertices.
3. The unit generates new score vector files.
4. The unit sends messages to the master network.
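Steps 0 to 4 can be sketched as below. Every name here is hypothetical (`PageRankUnit`, `on_message`, `run_master`, the message fields, the path layout), a plain dict stands in for the distributed file system, and a unit's reply is returned directly instead of travelling through a master network. The damping factor of 0.85 is also an assumption.

```python
DAMPING = 0.85  # assumed damping factor


class PageRankUnit:
    """Hypothetical unit; `dfs` is a dict standing in for a distributed FS."""

    def __init__(self, name, dfs, num_vertices):
        self.name = name
        self.dfs = dfs
        self.num_vertices = num_vertices

    def on_message(self, msg):
        # Step 1: load the graph partition and score vector named in the message.
        in_edges = self.dfs[msg["graph_path"]]
        scores = {}
        for path in msg["score_paths"]:
            scores.update(self.dfs[path])
        # Step 2: compute the new scores of the vertices in this partition.
        new_scores = {
            v: (1 - DAMPING) / self.num_vertices
            + DAMPING * sum(scores[src] / out_deg for src, out_deg in edges)
            for v, edges in in_edges.items()
        }
        # Step 3: write the new score vector as a new "file".
        out_path = f"scores/{msg['iteration'] + 1}/{self.name}"
        self.dfs[out_path] = new_scores
        # Step 4: tell the master where the output is.
        return {"unit": self.name, "score_path": out_path}


def run_master(dfs, units, iterations):
    score_paths = ["scores/0"]
    for iteration in range(iterations):
        # Step 0: an activation message tells each unit what to load.
        replies = [
            unit.on_message({
                "graph_path": f"graph/{unit.name}",
                "score_paths": score_paths,
                "iteration": iteration,
            })
            for unit in units
        ]
        score_paths = [reply["score_path"] for reply in replies]
    final = {}
    for path in score_paths:
        final.update(dfs[path])
    return final


# Graph a->b, a->c, b->c, c->a; each partition stores incoming edges
# as (source vertex, out-degree of the source).
dfs = {
    "graph/u1": {"a": [("c", 1)], "b": [("a", 2)]},
    "graph/u2": {"c": [("a", 2), ("b", 1)]},
    "scores/0": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
}
units = [PageRankUnit("u1", dfs, 3), PageRankUnit("u2", dfs, 3)]
ranks = run_master(dfs, units, iterations=20)
```

The messages carry only file locations, never the scores themselves; the data moves through the file system.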

7 Compare epiC to MapReduce and Pregel
Flexibility:
- MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel.
Optimization:
- Both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk.
Extensibility:
- MR and Pregel have predefined programming models, while in epiC users can create their own.

8 Using epiC to simulate MR
- Create two basic units: MapUnit and ReduceUnit.
- A MapUnit loads a partition of the data and sends messages to all ReduceUnits.
- A ReduceUnit gets its input from the DFS. The locations of the input are obtained from the messages sent by the MapUnits.
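A word-count sketch of this arrangement, under stated assumptions: the `MapUnit` and `ReduceUnit` classes below borrow only the names from the slide, their methods and the `shuffle/...` path layout are invented for illustration, a dict stands in for the DFS, and a small `mailbox` dict plays the role of the master forwarding messages.

```python
import zlib
from collections import defaultdict


class MapUnit:
    def __init__(self, name, dfs, map_fn, num_reducers):
        self.name, self.dfs = name, dfs
        self.map_fn, self.num_reducers = map_fn, num_reducers

    def run(self, input_path):
        buckets = [defaultdict(list) for _ in range(self.num_reducers)]
        for record in self.dfs[input_path]:
            for key, value in self.map_fn(record):
                # Deterministic partitioning of keys across ReduceUnits.
                r = zlib.crc32(key.encode()) % self.num_reducers
                buckets[r][key].append(value)
        messages = []
        for r, bucket in enumerate(buckets):
            path = f"shuffle/{self.name}/{r}"
            self.dfs[path] = dict(bucket)
            # The message carries only the location of the data.
            messages.append((f"reduce{r}", path))
        return messages


class ReduceUnit:
    def __init__(self, name, dfs, reduce_fn):
        self.name, self.dfs, self.reduce_fn = name, dfs, reduce_fn

    def run(self, paths):
        grouped = defaultdict(list)
        for path in paths:  # locations learned from MapUnit messages
            for key, values in self.dfs[path].items():
                grouped[key].extend(values)
        return {k: self.reduce_fn(k, vs) for k, vs in grouped.items()}


def split_words(line):
    return [(word, 1) for word in line.split()]


dfs = {"input/0": ["a b a"], "input/1": ["b c"]}
mappers = [MapUnit(f"map{i}", dfs, split_words, 2) for i in range(2)]
reducers = [ReduceUnit(f"reduce{r}", dfs, lambda k, vs: sum(vs)) for r in range(2)]

mailbox = defaultdict(list)  # stands in for the master forwarding messages
for i, mapper in enumerate(mappers):
    for recipient, path in mapper.run(f"input/{i}"):
        mailbox[recipient].append(path)

counts = {}
for reducer in reducers:
    counts.update(reducer.run(mailbox[reducer.name]))
```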

9 Using epiC to simulate Relational DB
Three units are created:
- SingleTableUnit: handles all processing on a single table.
- JoinUnit: joins two or more tables.
- AggregatUnit: applies the group-by operator and computes the aggregation results.
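To make the division of labour concrete, here is a small in-memory sketch. The class names follow the slide (including its spelling `AggregatUnit`), but the constructors, the `run` methods, and the sample rows are assumptions. The query is a simplified filter, join, and aggregate in the spirit of TPC-H Q3; it is not the five-step plan from the slides, and the JoinUnit shown joins only two inputs at a time.

```python
from collections import defaultdict


class SingleTableUnit:
    """Filter and project the rows of one table."""

    def __init__(self, predicate, columns):
        self.predicate, self.columns = predicate, columns

    def run(self, rows):
        return [{c: row[c] for c in self.columns}
                for row in rows if self.predicate(row)]


class JoinUnit:
    """Hash join of two inputs on one key column each."""

    def __init__(self, left_key, right_key):
        self.left_key, self.right_key = left_key, right_key

    def run(self, left, right):
        index = defaultdict(list)
        for row in right:
            index[row[self.right_key]].append(row)
        return [{**l, **r} for l in left for r in index.get(l[self.left_key], [])]


class AggregatUnit:
    """Group by the given columns and sum value_fn over each group."""

    def __init__(self, group_by, value_fn):
        self.group_by, self.value_fn = group_by, value_fn

    def run(self, rows):
        totals = defaultdict(float)
        for row in rows:
            totals[tuple(row[c] for c in self.group_by)] += self.value_fn(row)
        return dict(totals)


customer = [{"c_custkey": 1, "c_mktsegment": "BUILDING"},
            {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"}]
orders = [{"o_orderkey": 10, "o_custkey": 1, "o_orderdate": "1995-03-01"},
          {"o_orderkey": 11, "o_custkey": 2, "o_orderdate": "1995-03-02"},
          {"o_orderkey": 12, "o_custkey": 1, "o_orderdate": "1995-04-01"}]
lineitem = [
    {"l_orderkey": 10, "l_extendedprice": 100.0, "l_discount": 0.5, "l_shipdate": "1995-03-20"},
    {"l_orderkey": 10, "l_extendedprice": 50.0, "l_discount": 0.0, "l_shipdate": "1995-03-25"},
    {"l_orderkey": 11, "l_extendedprice": 200.0, "l_discount": 0.0, "l_shipdate": "1995-03-20"},
    {"l_orderkey": 10, "l_extendedprice": 30.0, "l_discount": 0.0, "l_shipdate": "1995-03-10"},
]

c = SingleTableUnit(lambda r: r["c_mktsegment"] == "BUILDING", ["c_custkey"]).run(customer)
o = SingleTableUnit(lambda r: r["o_orderdate"] < "1995-03-15",
                    ["o_orderkey", "o_custkey", "o_orderdate"]).run(orders)
l = SingleTableUnit(lambda r: r["l_shipdate"] > "1995-03-15",
                    ["l_orderkey", "l_extendedprice", "l_discount"]).run(lineitem)
co = JoinUnit("c_custkey", "o_custkey").run(c, o)
col = JoinUnit("o_orderkey", "l_orderkey").run(co, l)
revenue = AggregatUnit(
    ["l_orderkey", "o_orderdate"],
    lambda r: r["l_extendedprice"] * (1 - r["l_discount"]),
).run(col)
```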

10 Using epiC to simulate Relational DB
Example: TPC-H Q3. Five steps are required.

11 TPC-H Q3 (Step 1)

12 TPC-H Q3 (Step 2 and 3)

13 TPC-H Q3 (Step 4 and 5)

