epiC: an Extensible and Scalable System for Processing Big Data


1 epiC: an Extensible and Scalable System for Processing Big Data

2 Why do we need another MapReduce-like system?
The MapReduce framework cannot handle iterative processing efficiently, and everything must be transformed into map and reduce functions.
Pregel, GraphLab, and Dryad use a DAG-based data flow: the user must design how the graph is constructed and how the different operators are linked.
Can we combine the advantages of both types of systems?

3 Overview of epiC
Each unit works independently. Units communicate via messages, and the master works as a mail server that forwards the messages. epiC is based on the actor model.
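The mail-server idea can be illustrated with a minimal sketch. The class names `Master` and `EchoUnit` and their methods are hypothetical and are not epiC's actual API; the sketch only shows units that never call each other directly and exchange messages through a forwarding master.

```python
from collections import defaultdict, deque


class Master:
    """Hypothetical mail server: stores and forwards messages between units."""

    def __init__(self):
        self.mailboxes = defaultdict(deque)

    def send(self, recipient, message):
        # Queue the message in the recipient's mailbox.
        self.mailboxes[recipient].append(message)

    def fetch(self, recipient):
        # Hand over all pending messages and empty the mailbox.
        box = self.mailboxes[recipient]
        messages = list(box)
        box.clear()
        return messages


class EchoUnit:
    """Hypothetical unit: processes its own mail and replies via the master."""

    def __init__(self, name, master):
        self.name = name
        self.master = master

    def run(self):
        for sender, payload in self.master.fetch(self.name):
            self.master.send(sender, (self.name, payload.upper()))


master = Master()
unit = EchoUnit("unit-a", master)
master.send("unit-a", ("client", "hello"))  # activate the unit with a message
unit.run()
replies = master.fetch("client")
```

Because the unit only knows the master, units can be added or replaced without changing the others, which is the property the actor model is meant to provide.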

4 Compare epiC to MapReduce and Pregel
Using PageRank as an example. MapReduce: the computation takes multiple iterations, each run as a separate job. The second job loads the output of the first job to continue the processing.
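The chained-job pattern can be sketched in memory as follows. This is an illustration under assumptions, not code from the presentation: the function name `pagerank_job`, the toy graph, and the damping factor of 0.85 are all chosen for the example, and every vertex is assumed to have at least one outgoing edge.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumption)


def pagerank_job(graph, ranks):
    """One MapReduce-style job: map emits contributions, reduce sums them."""
    # Map phase: each vertex emits rank / out_degree to every neighbor.
    emitted = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            emitted[neighbor].append(ranks[vertex] / len(neighbors))
    # Reduce phase: sum the contributions per vertex and apply damping.
    n = len(graph)
    return {v: (1 - DAMPING) / n + DAMPING * sum(emitted[v]) for v in graph}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # toy graph (assumption)
ranks = {v: 1 / len(graph) for v in graph}
for _ in range(20):
    # Each iteration is a separate job that reads the previous job's output.
    ranks = pagerank_job(graph, ranks)
```

In a real MapReduce deployment the `ranks` dictionary would be written to and re-read from the distributed file system between jobs, which is where the iteration overhead comes from.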

5 Compare epiC to MapReduce and Pregel
Pregel: in each superstep, every vertex computes its new PageRank value and broadcasts the value to its neighbors.
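A vertex-centric version of the same computation might look like the sketch below. The `Vertex` class and `run_supersteps` driver are hypothetical names and do not reproduce Pregel's real API; the damping factor and toy graph are assumptions, and every vertex is assumed to have at least one neighbor.

```python
DAMPING = 0.85  # conventional PageRank damping factor (assumption)


class Vertex:
    def __init__(self, vid, neighbors, num_vertices):
        self.vid = vid
        self.neighbors = neighbors
        self.num_vertices = num_vertices
        self.rank = 1.0 / num_vertices

    def compute(self, superstep, incoming):
        """Update the rank from incoming messages and return outgoing ones."""
        if superstep > 0:
            self.rank = (1 - DAMPING) / self.num_vertices + DAMPING * sum(incoming)
        share = self.rank / len(self.neighbors)
        return [(neighbor, share) for neighbor in self.neighbors]


def run_supersteps(graph, supersteps):
    vertices = {v: Vertex(v, ns, len(graph)) for v, ns in graph.items()}
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = {v: [] for v in graph}
        for vertex in vertices.values():
            for target, value in vertex.compute(step, inbox[vertex.vid]):
                outbox[target].append(value)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return {v: vertex.rank for v, vertex in vertices.items()}


pregel_ranks = run_supersteps({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, 21)
```

The state stays with the vertices between supersteps, so nothing has to be reloaded from storage after each iteration.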

6 Compare epiC to MapReduce and Pregel
epiC:
0. Send messages to the unit to activate it.
1. The unit loads a partition of the graph data and the score vector based on the received message.
2. Compute the new score vector of the vertices.
3. Generate the new score vector files.
4. Send messages to the master network.
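The steps above can be sketched as a single unit driven by messages. Everything named here is an assumption made for illustration: the `dfs` dictionary stands in for a distributed file system, the message fields and file paths are invented, and the loop in `run_epic_pagerank` plays the role of the master. It is not epiC's actual interface.

```python
DAMPING = 0.85  # conventional damping factor (assumption, not from the slides)
dfs = {}        # hypothetical stand-in for a distributed file system


class PageRankUnit:
    def __init__(self, name):
        self.name = name

    def handle(self, message):
        # Step 1: load the graph partition and score vector named in the message.
        in_edges = dfs[message["graph_path"]]
        out_degree = dfs[message["out_degree_path"]]
        scores = {}
        for path in message["score_paths"]:
            scores.update(dfs[path])
        # Step 2: compute the new scores of this partition's vertices.
        n = len(scores)
        new_scores = {
            v: (1 - DAMPING) / n
            + DAMPING * sum(scores[u] / out_degree[u] for u in sources)
            for v, sources in in_edges.items()
        }
        # Step 3: write the new score vector file.
        out_path = f"scores/iter{message['iteration']}/{self.name}"
        dfs[out_path] = new_scores
        # Step 4: reply to the master with the location of the new file.
        return {"from": self.name, "score_path": out_path}


def run_epic_pagerank(iterations):
    # Toy graph a->b, a->c, b->c, c->a, stored as in-edges per partition.
    dfs["graph/out_degree"] = {"a": 2, "b": 1, "c": 1}
    dfs["graph/part0"] = {"a": ["c"], "b": ["a"]}
    dfs["graph/part1"] = {"c": ["a", "b"]}
    dfs["scores/iter0/all"] = {v: 1 / 3 for v in "abc"}
    units = {"graph/part0": PageRankUnit("unit0"),
             "graph/part1": PageRankUnit("unit1")}
    score_paths = ["scores/iter0/all"]
    for iteration in range(1, iterations + 1):
        replies = []
        for graph_path, unit in units.items():
            # Step 0: a message activates the unit and tells it what to load.
            replies.append(unit.handle({
                "iteration": iteration,
                "graph_path": graph_path,
                "out_degree_path": "graph/out_degree",
                "score_paths": score_paths,
            }))
        score_paths = [reply["score_path"] for reply in replies]
    final = {}
    for path in score_paths:
        final.update(dfs[path])
    return final


epic_ranks = run_epic_pagerank(20)
```

The messages carry only file locations, so the data itself moves through the file system while the master handles small control messages.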

7 Compare epiC to MapReduce and Pregel
Flexibility: MapReduce is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel.
Optimization: both MapReduce and epiC support customized optimizations, e.g., buffering the intermediate results on local disk.
Extensibility: MapReduce and Pregel have predefined programming models, while in epiC users can create their own.

8 Using epiC to simulate MR
Create two basic units: MapUnit and ReduceUnit. A MapUnit loads a partition of the data and sends messages to all ReduceUnits. A ReduceUnit gets its input from the DFS; the locations of the input are obtained from the messages sent by the MapUnits.
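A word-count sketch shows how the two units could cooperate. The class bodies, the `dfs` dictionary standing in for the DFS, the file paths, and the CRC-based partitioning are assumptions for illustration only; the slide gives just the unit names and the message flow.

```python
import zlib
from collections import Counter, defaultdict

dfs = {}  # hypothetical stand-in for the distributed file system


class MapUnit:
    def __init__(self, name, num_reducers):
        self.name = name
        self.num_reducers = num_reducers

    def run(self, input_path):
        # Load one partition and split its words into one bucket per reducer.
        buckets = defaultdict(list)
        for line in dfs[input_path]:
            for word in line.split():
                reducer_id = zlib.crc32(word.encode()) % self.num_reducers
                buckets[reducer_id].append(word)
        # Write each bucket to the DFS and message every ReduceUnit its location.
        messages = []
        for reducer_id in range(self.num_reducers):
            location = f"intermediate/{self.name}/r{reducer_id}"
            dfs[location] = buckets[reducer_id]
            messages.append((reducer_id, location))
        return messages


class ReduceUnit:
    def run(self, locations):
        # Read the input from the DFS at the locations named in the messages.
        counts = Counter()
        for location in locations:
            counts.update(dfs[location])
        return counts


def word_count(partitions, num_reducers=2):
    inbox = defaultdict(list)
    for i, lines in enumerate(partitions):
        path = f"input/part{i}"
        dfs[path] = lines
        for reducer_id, location in MapUnit(f"map{i}", num_reducers).run(path):
            inbox[reducer_id].append(location)
    result = {}
    for reducer_id in range(num_reducers):
        result.update(ReduceUnit().run(inbox[reducer_id]))
    return result


counts = word_count([["a b a"], ["b c"]])
```

Each word is routed to exactly one reducer, so merging the per-reducer results never overwrites a count.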

9 Using epiC to simulate Relational DB
Three units are created:
SingleTableUnit: handles all processing on a single table.
JoinUnit: joins two or more tables.
AggregateUnit: applies the group-by operator and computes the aggregation results.
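The division of labor among the three units can be sketched on in-memory rows. The method signatures, the two-table hash join, the name `AggregateUnit` for the aggregation unit, and the toy `orders` and `items` tables are all assumptions made for illustration; they are not taken from epiC or from the TPC-H schema.

```python
from collections import defaultdict


class SingleTableUnit:
    """Processing on one table: here, a filter followed by a projection."""

    def run(self, rows, predicate, columns):
        return [{c: row[c] for c in columns} for row in rows if predicate(row)]


class JoinUnit:
    """Hash equi-join of two tables on one key column (two tables only)."""

    def run(self, left, right, key):
        index = defaultdict(list)
        for row in left:
            index[row[key]].append(row)
        return [{**l, **r} for r in right for l in index.get(r[key], [])]


class AggregateUnit:
    """Group by one column and sum another."""

    def run(self, rows, group_col, value_col):
        totals = defaultdict(float)
        for row in rows:
            totals[row[group_col]] += row[value_col]
        return dict(totals)


# Toy data (assumption): two orders and their line items.
orders = [{"order_id": 1, "segment": "BUILDING"},
          {"order_id": 2, "segment": "AUTO"}]
items = [{"order_id": 1, "price": 10.0},
         {"order_id": 1, "price": 5.0},
         {"order_id": 2, "price": 7.0}]

building = SingleTableUnit().run(
    orders, lambda row: row["segment"] == "BUILDING", ["order_id"])
joined = JoinUnit().run(building, items, "order_id")
revenue = AggregateUnit().run(joined, "order_id", "price")
```

Filtering in the single-table step before the join keeps the joined intermediate result small, which is the usual reason for ordering the units this way.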

10 Using epiC to simulate Relational DB
Example: TPC-H Q3. Five steps are required.

11 TPC-H Q3 (Step 1)

12 TPC-H Q3 (Step 2 and 3)

13 TPC-H Q3 (Step 4 and 5)

