Presentation on theme: "epiC: an Extensible and Scalable System for Processing Big Data"— Presentation transcript:
1 epiC: an Extensible and Scalable System for Processing Big Data
2 Why do we need another MapReduce-like system?
The M/R framework cannot handle iterative processing efficiently.
Everything needs to be transformed into map and reduce functions.
Pregel/GraphLab/Dryad:
  DAG-based data flow.
  Users must design how the graph is constructed and how the different operators are linked.
Can we combine the advantages of both types of systems?
3 Overview of epiC
Each unit works independently.
Units communicate via “ ”.
The master works as a mail server to forward the messages.
epiC is based on the actor model.
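
The unit/master relationship described on this slide can be sketched as a tiny in-memory mail server. This is only an illustration of the actor-style pattern, not epiC's actual API: the `Master` and `Unit` classes, their method names, and the message layout are all hypothetical.

```python
from collections import defaultdict, deque


class Master:
    """Hypothetical mail server: keeps one mailbox per unit name."""

    def __init__(self):
        self.mailboxes = defaultdict(deque)

    def send(self, recipient, message):
        # Queue the message until the recipient asks for its mail.
        self.mailboxes[recipient].append(message)

    def fetch(self, recipient):
        # Hand over all pending messages and empty the mailbox.
        box = self.mailboxes[recipient]
        messages = list(box)
        box.clear()
        return messages


class Unit:
    """Hypothetical unit: it only sees its own mail, never another unit's state."""

    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.received = []

    def send(self, recipient, body):
        self.master.send(recipient, {"from": self.name, "body": body})

    def run(self):
        # Process whatever the master has forwarded so far.
        for message in self.master.fetch(self.name):
            self.received.append(message)


master = Master()
unit_a = Unit("unit-a", master)
unit_b = Unit("unit-b", master)

unit_a.send("unit-b", "partition 0 ready")
unit_b.run()
unit_a.run()
```

The units never hold references to each other; all coordination goes through the master, which is the property the slide attributes to the design.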
4 Compare epiC to MapReduce and Pregel
Using PageRank as an example.
MapReduce:
  Multiple iterations.
  The second job loads the output of the first job to continue the processing.
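
As a rough sketch of the multi-iteration pattern, the loop below treats each PageRank iteration as a separate job whose input is the previous job's output. It runs in a single process on a toy graph; the function name `pagerank_job`, the damping factor of 0.85, and the three-vertex graph are assumptions made for illustration.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, not stated in the slides


def pagerank_job(graph, ranks):
    """One MapReduce-style job: map emits contributions, reduce sums them."""
    # Map phase: every vertex emits rank / out-degree to each neighbor.
    contributions = defaultdict(float)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            contributions[neighbor] += ranks[vertex] / len(neighbors)
    # Reduce phase: combine the contributions received by each vertex.
    total = len(graph)
    return {v: (1 - DAMPING) / total + DAMPING * contributions[v] for v in graph}


# Toy graph (hypothetical): a -> b, c; b -> c; c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / len(graph) for v in graph}

# Each iteration is a new job that reloads the previous job's output.
for _ in range(20):
    ranks = pagerank_job(graph, ranks)
```

In a real MapReduce deployment the hand-off between iterations goes through the distributed file system, which is the cost the slide is pointing at.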
5 Compare epiC to MapReduce and Pregel
Pregel:
  In each super-step, each vertex computes its new PageRank value and broadcasts the value to its neighbors.
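
A vertex-centric super-step can be sketched as follows. This is a single-process approximation of the Pregel style, not Pregel's real API; the `superstep` function, the inbox/outbox dictionaries, the damping factor, and the toy graph are hypothetical.

```python
def superstep(graph, inbox, damping=0.85):
    """One super-step: each vertex reads its inbox, updates its rank,
    and sends rank / out-degree to every neighbor."""
    total = len(graph)
    new_ranks = {}
    outbox = {v: [] for v in graph}
    for vertex, neighbors in graph.items():
        new_ranks[vertex] = (1 - damping) / total + damping * sum(inbox[vertex])
        for neighbor in neighbors:
            outbox[neighbor].append(new_ranks[vertex] / len(neighbors))
    return new_ranks, outbox


# Toy graph (hypothetical): a -> b, c; b -> c; c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / len(graph) for v in graph}

# Initial messages: every vertex broadcasts its starting rank share.
inbox = {v: [] for v in graph}
for vertex, neighbors in graph.items():
    for neighbor in neighbors:
        inbox[neighbor].append(ranks[vertex] / len(neighbors))

# The messages produced in one super-step are consumed in the next one.
for _ in range(20):
    ranks, inbox = superstep(graph, inbox)
```

Unlike the job-per-iteration approach, state stays with the vertices and only the messages move between super-steps.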
6 Compare epiC to MapReduce and Pregel
epiC:
  0. Send messages to the unit to activate it.
  1. The unit loads a partition of the graph data and the score vector based on the received message.
  2. Compute the new score vector of the vertices.
  3. Generate new score vector files.
  4. Send messages to the master network.
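
The numbered steps above can be sketched as a single unit function. Everything here is a stand-in: `dfs` is a plain dictionary playing the role of the distributed file system, `master_queue` plays the role of the master network, and the message fields (`graph_path`, `score_path`, `output_path`, `num_vertices`) are hypothetical names, not epiC's message format.

```python
from collections import deque

dfs = {}                # stand-in for the distributed file system: path -> data
master_queue = deque()  # stand-in for the master network


def pagerank_unit(message, damping=0.85):
    # Step 1: load the graph partition and score vector named in the message.
    partition = dfs[message["graph_path"]]
    scores = dfs[message["score_path"]]
    total = message["num_vertices"]

    # Step 2: compute the new score of every vertex in this partition.
    new_scores = {}
    for vertex, in_neighbors in partition.items():
        incoming = sum(scores[u] / out_degree for u, out_degree in in_neighbors)
        new_scores[vertex] = (1 - damping) / total + damping * incoming

    # Step 3: write the new score vector as a new "file".
    dfs[message["output_path"]] = new_scores

    # Step 4: tell the master where the result is.
    master_queue.append({"done": message["graph_path"],
                         "score_path": message["output_path"]})


# Partition layout (assumed): vertex -> list of (in-neighbor, its out-degree).
dfs["graph/part-0"] = {"a": [("c", 1)], "b": [("a", 2)], "c": [("a", 2), ("b", 1)]}
dfs["scores/iter-0"] = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

# Step 0: an activation message arrives at the unit.
pagerank_unit({"graph_path": "graph/part-0",
               "score_path": "scores/iter-0",
               "output_path": "scores/iter-1",
               "num_vertices": 3})
```

The message carries only locations, so the unit decides for itself how to load and process the data.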
7 Compare epiC to MapReduce and Pregel
Flexibility: MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel.
Optimization: Both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk.
Extensibility: MR and Pregel have their pre-defined programming models, while in epiC, users can create their own.
8 Using epiC to simulate MR
Create two basic units: MapUnit and ReduceUnit.
MapUnit loads a partition of data and sends messages to all ReduceUnits.
ReduceUnit gets its input from the DFS. The locations of the input are obtained from the messages of the MapUnits.
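
A word-count sketch of this arrangement is shown below. It is a single-process approximation under stated assumptions: `dfs` and `mailboxes` are plain dictionaries standing in for the DFS and the master, and the functions `map_unit`, `reduce_unit`, and `bucket` as well as the path names are hypothetical.

```python
from collections import Counter, defaultdict

dfs = {}                       # stand-in for the DFS: path -> data
mailboxes = defaultdict(list)  # stand-in for messages forwarded by the master
NUM_REDUCERS = 2


def bucket(word):
    # Deterministic partitioner (str hash() is salted per Python process).
    return sum(word.encode()) % NUM_REDUCERS


def map_unit(map_id, input_path):
    # Load one partition and split its (word, 1) pairs by target reducer.
    buckets = defaultdict(list)
    for line in dfs[input_path]:
        for word in line.split():
            buckets[bucket(word)].append((word, 1))
    # Write intermediate data to the DFS and message every ReduceUnit
    # with the location of its share.
    for r in range(NUM_REDUCERS):
        path = f"tmp/map-{map_id}/reduce-{r}"
        dfs[path] = buckets[r]
        mailboxes[f"reduce-{r}"].append(path)


def reduce_unit(reduce_id):
    # Input locations come from the MapUnits' messages; data comes from the DFS.
    counts = Counter()
    for path in mailboxes[f"reduce-{reduce_id}"]:
        for word, one in dfs[path]:
            counts[word] += one
    dfs[f"out/reduce-{reduce_id}"] = dict(counts)


dfs["in/part-0"] = ["a b a"]
dfs["in/part-1"] = ["b c"]
for i in range(2):
    map_unit(i, f"in/part-{i}")
for r in range(NUM_REDUCERS):
    reduce_unit(r)

result = {}
for r in range(NUM_REDUCERS):
    result.update(dfs[f"out/reduce-{r}"])
```

Only file locations travel in the messages; the intermediate data itself stays in the DFS, matching the split described on the slide.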
9 Using epiC to simulate Relational DB
Three units are created:
  SingleTableUnit: handles all processing on a single table.
  JoinUnit: joins two or more tables.
  AggregateUnit: applies the group-by operator and computes the aggregation results.
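
To make the division of labor among the three units concrete, here is a minimal sketch over in-memory rows. The function names, signatures, and sample tables are hypothetical and only mirror the roles listed above; they are not epiC's actual interfaces, and the query is a made-up example rather than a TPC-H query.

```python
from collections import defaultdict


def single_table_unit(rows, predicate, columns):
    """Selection and projection on one table."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def join_unit(left, right, key):
    """Hash join of two tables on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate_unit(rows, group_by, value):
    """Group by one column and sum another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


orders = [{"order_id": 1, "customer": "alice"},
          {"order_id": 2, "customer": "bob"}]
items = [{"order_id": 1, "price": 10.0},
         {"order_id": 1, "price": 5.0},
         {"order_id": 2, "price": 7.0},
         {"order_id": 3, "price": 1.0}]

# Filter one table, join it with another, then aggregate the joined rows.
expensive = single_table_unit(items, lambda r: r["price"] >= 5.0,
                              ["order_id", "price"])
joined = join_unit(orders, expensive, "order_id")
revenue = aggregate_unit(joined, "customer", "price")
```

Chaining the three functions this way corresponds to one unit's output becoming the next unit's input.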
10 Using epiC to simulate Relational DB
Example: TPC-H Q3
5 steps are required.