EpiC: an Extensible and Scalable System for Processing Big Data Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian Lee Tan, Sai Wu School of Computing, National.


1 epiC: an Extensible and Scalable System for Processing Big Data
Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian Lee Tan, Sai Wu
School of Computing, National University of Singapore
College of Computer Science and Technology, Zhejiang University

2 Why do we need another MapReduce-like system?
MapReduce
- The M/R framework cannot handle iterative processing efficiently
- Everything needs to be transformed into map and reduce functions
Pregel/GraphLab/Dryad
- DAG-based data flow
- Users must design how the graph is constructed and how the operators are linked
Can we combine the advantages of both types of systems?

3 Overview of epiC
- Units work independently
- Units communicate via "email"
- The master works as a mail server that forwards the messages
- epiC is based on the actor model
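The "email" style of communication above can be sketched as follows. This is a minimal illustration only: the class names `Master` and `EchoUnit` and their methods are hypothetical and are not epiC's actual API.

```python
from collections import defaultdict, deque


class Master:
    """Hypothetical mail server: keeps one mailbox per unit name."""

    def __init__(self):
        self.mailboxes = defaultdict(deque)

    def send(self, recipient, message):
        # Queue the message until the recipient asks for its mail.
        self.mailboxes[recipient].append(message)

    def fetch(self, recipient):
        # Hand over all pending messages and empty the mailbox.
        box = self.mailboxes[recipient]
        messages = list(box)
        box.clear()
        return messages


class EchoUnit:
    """Hypothetical unit: works on its own mail and replies via the master."""

    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.seen = []

    def run(self):
        for sender, body in self.master.fetch(self.name):
            self.seen.append(body)
            self.master.send(sender, (self.name, "ack:" + body))


master = Master()
unit_b = EchoUnit("b", master)
master.send("b", ("a", "hello"))  # unit "a" mails unit "b"
unit_b.run()
replies = master.fetch("a")       # the acknowledgement waiting for "a"
```

Units never call each other directly in this sketch; every message goes through the master, which is the property the slide describes.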

4 Compare epiC to MapReduce and Pregel
Using PageRank as an example
MapReduce:
- Multiple iterations
- The second job loads the output of the first job to continue the processing
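A rough sketch of the MapReduce formulation, with each iteration run as a separate job whose input is the previous job's output. Everything here runs in memory; the damping factor of 0.85, the toy graph, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the slide does not give one


def map_phase(graph, ranks):
    """Emit one (neighbor, contribution) pair per edge."""
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            yield neighbor, ranks[vertex] / len(neighbors)


def reduce_phase(graph, pairs):
    """Sum the contributions per vertex and apply the damping factor."""
    sums = defaultdict(float)
    for vertex, contribution in pairs:
        sums[vertex] += contribution
    n = len(graph)
    return {v: (1 - DAMPING) / n + DAMPING * sums[v] for v in graph}


def run_job(graph, ranks):
    """One MapReduce job, which is one PageRank iteration."""
    return reduce_phase(graph, map_phase(graph, ranks))


# Toy graph with no dangling vertices: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / len(graph) for v in graph}
for _ in range(20):
    # Each job starts from the output of the previous job.
    ranks = run_job(graph, ranks)
```

In a real deployment the hand-off between jobs goes through the distributed file system, which is the overhead the slide points at.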

5 Compare epiC to MapReduce and Pregel
Pregel:
- In each super-step, every vertex computes its new PageRank value and broadcasts that value to its neighbors
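The vertex-centric version can be sketched like this. It is a single-process simulation, not Pregel's API; the damping factor, the toy graph, and the choice to split a vertex's rank evenly across its out-edges are assumptions.

```python
DAMPING = 0.85  # assumed damping factor; the slide does not give one


def superstep(graph, inbox):
    """One synchronous super-step.

    Every vertex reads the messages sent to it in the previous super-step,
    computes its new rank, and sends rank / out_degree to each neighbor.
    """
    n = len(graph)
    new_ranks = {}
    outbox = {v: [] for v in graph}
    for vertex, neighbors in graph.items():
        new_ranks[vertex] = (1 - DAMPING) / n + DAMPING * sum(inbox[vertex])
        for neighbor in neighbors:
            outbox[neighbor].append(new_ranks[vertex] / len(neighbors))
    return new_ranks, outbox


# Toy graph with no dangling vertices: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / len(graph) for v in graph}

# Super-step 0: every vertex sends its initial rank share to its neighbors.
inbox = {v: [] for v in graph}
for vertex, neighbors in graph.items():
    for neighbor in neighbors:
        inbox[neighbor].append(ranks[vertex] / len(neighbors))

for _ in range(20):
    ranks, inbox = superstep(graph, inbox)
```

State stays with the vertices between super-steps, so nothing has to be reloaded per iteration.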

6 Compare epiC to MapReduce and Pregel
epiC:
0. Send messages to the unit to activate it
1. The unit loads a partition of the graph data and the score vector based on the received message
2. Compute the new score vector of the vertices
3. Generate the new score vector files
4. Send messages to the master network
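Steps 0 to 4 above can be sketched as a single message handler. The class `PageRankUnit`, the message fields, the file names, and the use of a dict as a stand-in for the distributed file system are all hypothetical; the damping factor is also an assumption.

```python
DAMPING = 0.85  # assumed damping factor; the slide does not give one


class PageRankUnit:
    """Hypothetical unit that handles one graph partition per message."""

    def __init__(self, dfs, master_inbox):
        self.dfs = dfs                    # dict standing in for the DFS
        self.master_inbox = master_inbox  # list standing in for the master network

    def on_message(self, msg):
        # Step 0 happened when this message arrived and activated the unit.
        # Step 1: load the partition and the score vector named in the message.
        partition = self.dfs[msg["graph_file"]]  # vertex -> [(source, out_degree)]
        scores = self.dfs[msg["score_file"]]
        n = len(scores)
        # Step 2: compute the new scores of the vertices in this partition.
        new_scores = {
            v: (1 - DAMPING) / n
            + DAMPING * sum(scores[src] / out_deg for src, out_deg in incoming)
            for v, incoming in partition.items()
        }
        # Step 3: write the new score vector file.
        self.dfs[msg["output_file"]] = new_scores
        # Step 4: tell the master that the output is ready.
        self.master_inbox.append({"done": msg["output_file"]})


# Toy graph a -> b, a -> c, b -> c, c -> a, stored as incoming edges.
dfs = {
    "graph/part-0": {"a": [("c", 1)], "b": [("a", 2)]},
    "graph/part-1": {"c": [("a", 2), ("b", 1)]},
    "scores/iter-0": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
}
master_inbox = []
unit = PageRankUnit(dfs, master_inbox)
for p in (0, 1):
    unit.on_message({
        "graph_file": f"graph/part-{p}",
        "score_file": "scores/iter-0",
        "output_file": f"scores/iter-1/part-{p}",
    })
new_scores = {**dfs["scores/iter-1/part-0"], **dfs["scores/iter-1/part-1"]}
```

A further iteration would be triggered by sending new messages that point at the `scores/iter-1` files.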

7 Compare epiC to MapReduce and Pregel
Flexibility:
- MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel
Optimization:
- Both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk
Extensibility:
- MR and Pregel have their pre-defined programming models, while in epiC users can create their own

8 Using epiC to simulate MR
- Create two basic units: MapUnit and ReduceUnit
- MapUnit loads a partition of data and sends messages to all ReduceUnits
- ReduceUnit gets its input from the DFS. The locations of the input are obtained from the messages of MapUnits
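A small word-count sketch of this arrangement. The class bodies, the path naming scheme, the partitioning function, and the dict used in place of the DFS are illustrative assumptions, not epiC code.

```python
from collections import defaultdict

dfs = {}  # dict standing in for the distributed file system


class MapUnit:
    """Hypothetical map unit: writes one intermediate file per reducer."""

    def __init__(self, unit_id, num_reducers):
        self.unit_id = unit_id
        self.num_reducers = num_reducers

    def run(self, lines):
        buckets = defaultdict(list)
        for line in lines:
            for word in line.split():
                # Deterministic partitioning so a word always meets one reducer.
                reducer_id = sum(map(ord, word)) % self.num_reducers
                buckets[reducer_id].append((word, 1))
        messages = []
        for reducer_id in range(self.num_reducers):
            path = f"map-{self.unit_id}/part-{reducer_id}"
            dfs[path] = buckets[reducer_id]
            # The message carries only the location, not the data itself.
            messages.append((reducer_id, path))
        return messages


class ReduceUnit:
    """Hypothetical reduce unit: reads its input from the DFS."""

    def run(self, paths):
        counts = defaultdict(int)
        for path in paths:
            for word, one in dfs[path]:
                counts[word] += one
        return dict(counts)


partitions = [["a b a"], ["b c"]]
mail = defaultdict(list)  # reducer id -> input locations received so far
for unit_id, lines in enumerate(partitions):
    for reducer_id, path in MapUnit(unit_id, 2).run(lines):
        mail[reducer_id].append(path)

result = {}
for reducer_id in range(2):
    result.update(ReduceUnit().run(mail[reducer_id]))
```

Each MapUnit messages every ReduceUnit, even when the file for that reducer is empty, which mirrors "sends messages to all ReduceUnits".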

9 Using epiC to simulate Relational DB
Three units are created:
- SingleTableUnit: handles all processing on a single table
- JoinUnit: joins two or more tables
- AggregateUnit: applies the group-by operator and computes the aggregation results
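The division of work among the three units can be sketched with plain functions over lists of rows. The function names, the toy tables, and the hash-join strategy are assumptions for illustration; the slides do not show how the real units are implemented.

```python
from collections import defaultdict


def single_table_unit(rows, predicate, columns):
    """Selection and projection on one table."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def join_unit(left, right, left_key, right_key):
    """Equi-join of two tables using a hash index on the right table."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


def aggregate_unit(rows, group_key, value_key):
    """Group by one column and sum another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Toy tables, not TPC-H data.
customers = [{"c_id": 1, "segment": "BUILDING"}, {"c_id": 2, "segment": "AUTO"}]
orders = [
    {"o_id": 10, "o_cust": 1, "price": 5.0},
    {"o_id": 11, "o_cust": 1, "price": 7.0},
    {"o_id": 12, "o_cust": 2, "price": 3.0},
]

building = single_table_unit(customers, lambda r: r["segment"] == "BUILDING", ["c_id"])
joined = join_unit(orders, building, "o_cust", "c_id")
revenue = aggregate_unit(joined, "o_cust", "price")
```

A multi-step query then becomes a chain of such units, with each unit's output feeding the next.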

10 Using epiC to simulate Relational DB
Example: TPC-H Q3
- 5 steps are required

11 TPC-H Q3 (Step 1)

12 TPC-H Q3 (Step 2 and 3)

13 TPC-H Q3 (Step 4 and 5)

14 Implementation Details

15 Fault Tolerance

16 Experiments
- 65 nodes
- Each node: quad-core Intel Xeon 2.4GHz CPU, 8GB memory and two 500GB SCSI disks
- Nodes connected by a 10 Gbps cluster switch

17 System Settings
- Hadoop settings
- epiC settings
- GPS settings

18 Benchmark Tasks and Datasets
- Grep
- TeraSort
- TPC-H Q3
- PageRank

19 The Grep Task

20 The TeraSort Task

21 The TPC-H Q3 Task

22 The PageRank Task

23 Comparison with In-memory Systems

