Presentation on theme: "epiC: an Extensible and Scalable System for Processing Big Data"— Presentation transcript:
1 epiC: an Extensible and Scalable System for Processing Big Data
2 Why do we need another MapReduce-like system? M/R framework: cannot handle iterative processing efficiently; everything needs to be transformed into map and reduce functions. Pregel/GraphLab/Dryad: DAG-based data flow; the user must design how the graph is constructed and how the operators are linked. Can we combine the advantages of both types of systems?
3 Overview of epiC Each unit works independently. Units communicate via messages. The master works as a mail server to forward the messages. epiC is based on the actor model.
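A minimal sketch of the actor-style layout described on this slide, assuming an in-memory mailbox per unit. The class names `Master`, `Unit`, and `EchoUnit` and the methods `register`, `send`, `run`, and `receive` are hypothetical and are not taken from epiC itself.

```python
from collections import defaultdict, deque


class Master:
    """Hypothetical 'mail server': queues messages per recipient and forwards them."""

    def __init__(self):
        self.units = {}
        self.mailboxes = defaultdict(deque)

    def register(self, unit):
        self.units[unit.name] = unit

    def send(self, recipient, message):
        self.mailboxes[recipient].append(message)

    def run(self):
        # Keep delivering until every mailbox is empty.
        delivered = 0
        while any(self.mailboxes.values()):
            for name in list(self.mailboxes):
                box = self.mailboxes[name]
                while box:
                    self.units[name].receive(box.popleft(), self)
                    delivered += 1
        return delivered


class Unit:
    """A unit only reacts to messages; it shares no state with other units."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def receive(self, message, master):
        self.log.append(message)


class EchoUnit(Unit):
    """Replies to every message through the master, never directly."""

    def __init__(self, name, reply_to):
        super().__init__(name)
        self.reply_to = reply_to

    def receive(self, message, master):
        super().receive(message, master)
        master.send(self.reply_to, f"ack:{message}")


def demo():
    master = Master()
    a = Unit("a")
    b = EchoUnit("b", reply_to="a")
    master.register(a)
    master.register(b)
    master.send("b", "hello")
    count = master.run()
    return a.log, b.log, count
```

In this sketch the units never call each other; every message passes through the master, which is the property the slide attributes to epiC.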
4 Compare epiC to MapReduce and Pregel Using PageRank as an example. MapReduce: multiple iterations; the second job loads the output of the first job to continue the processing.
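The multi-job pattern can be sketched as follows. This is an illustrative in-memory simulation, not Hadoop code: `map_phase`, `reduce_phase`, and `pagerank_jobs` are hypothetical names, the damping factor 0.85 is an assumption, and every vertex is assumed to have at least one outgoing edge.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed value, not stated on the slide


def map_phase(records):
    """records: iterable of (vertex, (rank, neighbors))."""
    for vertex, (rank, neighbors) in records:
        # Pass the adjacency list along so the next job can reuse it.
        yield vertex, ("links", neighbors)
        for n in neighbors:
            yield n, ("rank", rank / len(neighbors))


def reduce_phase(pairs, num_vertices):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    output = []
    for vertex in sorted(grouped):
        neighbors, total = [], 0.0
        for tag, payload in grouped[vertex]:
            if tag == "links":
                neighbors = payload
            else:
                total += payload
        rank = (1 - DAMPING) / num_vertices + DAMPING * total
        output.append((vertex, (rank, neighbors)))
    return output


def pagerank_jobs(graph, iterations):
    """Each loop pass stands for one job that reads the previous job's output."""
    n = len(graph)
    records = [(v, (1.0 / n, nbrs)) for v, nbrs in sorted(graph.items())]
    for _ in range(iterations):
        records = reduce_phase(map_phase(records), n)
    return {v: rank for v, (rank, _) in records}
```

Note that the whole graph structure is re-emitted by every job, which is one reason the iterative case is awkward in this model.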
5 Compare epiC to MapReduce and Pregel Pregel: in each super-step, each vertex computes its new PageRank value and broadcasts it to its neighbors.
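A vertex-centric version of the same computation might look like the sketch below. It simulates super-steps in a single process; the function name `pregel_pagerank`, the damping factor, and the in-memory inbox/outbox dictionaries are assumptions for illustration and do not reflect Pregel's actual API.

```python
def pregel_pagerank(graph, supersteps, damping=0.85):
    """graph: {vertex: [out-neighbors]}; returns {vertex: rank}."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(supersteps + 1):
        outbox = {v: [] for v in graph}
        for v, neighbors in graph.items():
            if step > 0:
                # Compute the new value from messages received last super-step.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if step < supersteps and neighbors:
                # Broadcast this vertex's share to each neighbor.
                for u in neighbors:
                    outbox[u].append(rank[v] / len(neighbors))
        # Messages become visible only in the next super-step.
        inbox = outbox
    return rank
```

Unlike the job-per-iteration version, the graph stays resident between super-steps and only rank messages move.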
6 Compare epiC to MapReduce and Pregel epiC: 0. Send messages to the unit to activate it. 1. The unit loads a partition of the graph data and the score vector based on the received message. 2. Compute the new score vector of the vertices. 3. Generate new score vector files. 4. Send messages to the master network.
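The steps above can be sketched as a single unit activation plus a small driver. Everything here is hypothetical: the DFS is modeled as a Python dict, the message fields (`graph_path`, `score_path`, `output_path`, `num_vertices`) and the path layout are invented for illustration, and each partition is assumed to store, per vertex, its in-neighbors together with their out-degrees.

```python
def pagerank_unit(dfs, msg, damping=0.85):
    """One activation of a hypothetical PageRank unit (steps 1 to 4)."""
    # Step 1: load the graph partition and score vector named in the message.
    partition = dfs[msg["graph_path"]]  # {vertex: [(in_neighbor, out_degree), ...]}
    scores = dfs[msg["score_path"]]     # {vertex: score}
    n = msg["num_vertices"]
    # Step 2: compute the new scores for the vertices in this partition.
    new_scores = {
        v: (1 - damping) / n + damping * sum(scores[u] / deg for u, deg in in_edges)
        for v, in_edges in partition.items()
    }
    # Step 3: write the new score vector file.
    dfs[msg["output_path"]] = new_scores
    # Step 4: report back with a message for the master.
    return {"to": "master", "type": "done", "output_path": msg["output_path"]}


def run_iteration(dfs, partition_paths, score_path, iteration, num_vertices):
    """Hypothetical master-side driver: step 0 activates each unit by message."""
    merged = {}
    for part_id, graph_path in enumerate(partition_paths):
        activation = {
            "graph_path": graph_path,
            "score_path": score_path,
            "output_path": f"scores/iter{iteration + 1}/part{part_id}",
            "num_vertices": num_vertices,
        }
        reply = pagerank_unit(dfs, activation)
        merged.update(dfs[reply["output_path"]])
    next_path = f"scores/iter{iteration + 1}"
    dfs[next_path] = merged
    return next_path
```

The unit learns where its data lives only from the message it receives, so the same unit code can be reused across partitions and iterations.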
7 Compare epiC to MapReduce and Pregel Flexibility: MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel. Optimization: both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk. Extensibility: MR and Pregel have their pre-defined programming models, while in epiC, users can create their own.
8 Using epiC to simulate MR Create two basic units: MapUnit and ReduceUnit. MapUnit loads a partition of data and sends messages to all ReduceUnits. ReduceUnit gets its input from the DFS. The locations of the input are obtained from the messages of the MapUnits.
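A rough sketch of this simulation, using word count as the job. The DFS is modeled as a dict, and the function names (`map_unit`, `reduce_unit`, `partition_of`, `word_count`), the `shuffle/...` path layout, and the message format are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_reducers):
    # Stable hash so the same key always goes to the same ReduceUnit.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_unit(dfs, input_path, map_id, num_reducers, map_fn):
    """Loads one partition, writes intermediate files, returns one message per reducer."""
    buckets = {r: [] for r in range(num_reducers)}
    for record in dfs[input_path]:
        for key, value in map_fn(record):
            buckets[partition_of(key, num_reducers)].append((key, value))
    messages = []
    for r, pairs in buckets.items():
        location = f"shuffle/map{map_id}/reduce{r}"
        dfs[location] = pairs
        # The message carries only the location, not the data itself.
        messages.append({"to": r, "location": location})
    return messages


def reduce_unit(dfs, messages, reduce_fn):
    """Reads its input from the DFS at the locations named in the messages."""
    grouped = defaultdict(list)
    for msg in messages:
        for key, value in dfs[msg["location"]]:
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def word_count(dfs, input_paths, num_reducers=2):
    inbox = defaultdict(list)
    for map_id, path in enumerate(input_paths):
        emit_words = lambda line: ((word, 1) for word in line.split())
        for msg in map_unit(dfs, path, map_id, num_reducers, emit_words):
            inbox[msg["to"]].append(msg)
    result = {}
    for r in range(num_reducers):
        result.update(reduce_unit(dfs, inbox[r], lambda key, values: sum(values)))
    return result
```

Keeping the data on the DFS and passing only locations in messages mirrors the slide's point that ReduceUnit obtains its input locations from MapUnit messages.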
9 Using epiC to simulate Relational DB Three units are created. SingleTableUnit: handles all processing on a single table. JoinUnit: joins two or more tables. AggregateUnit: applies the group-by operator and computes the aggregation results.
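The three unit types can be illustrated with plain functions over lists of row dictionaries. This is only a sketch under assumed names (`single_table_unit`, `join_unit`, `aggregate_unit`); it handles a two-table equi-join and a SUM aggregate, and the tiny tables are made-up sample rows, not TPC-H data.

```python
from collections import defaultdict


def single_table_unit(rows, predicate, columns):
    """Selection and projection on one table."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def join_unit(left, right, key):
    """Hash join of two tables on a shared column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate_unit(rows, group_by, agg_column):
    """GROUP BY group_by, SUM(agg_column)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[agg_column]
    return dict(totals)


def demo_query():
    customers = [
        {"cust_id": 1, "segment": "BUILDING"},
        {"cust_id": 2, "segment": "AUTOMOBILE"},
    ]
    orders = [
        {"order_id": 10, "cust_id": 1, "price": 100.0},
        {"order_id": 11, "cust_id": 1, "price": 50.0},
        {"order_id": 12, "cust_id": 2, "price": 70.0},
    ]
    building = single_table_unit(
        customers, lambda row: row["segment"] == "BUILDING", ["cust_id"]
    )
    joined = join_unit(building, orders, "cust_id")
    return aggregate_unit(joined, "cust_id", "price")
```

A query plan then becomes a chain of unit outputs feeding unit inputs: filter first, join the reduced table, and aggregate last.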
10 Using epiC to simulate Relational DB Example: TPC-H Q3. 5 steps are required.