epiC: an Extensible and Scalable System for Processing Big Data
Why do we need another MapReduce-like system? The M/R framework cannot handle iterative processing efficiently, and everything needs to be transformed into map and reduce functions. Pregel, GraphLab, and Dryad use a DAG-based data flow, where the user must design how the graph is constructed and how the different operators are linked. Can we combine the advantages of both types of systems?
Overview of epiC: Each unit works independently. Units communicate via “email” (messages). The master works as a mail server that forwards the messages. epiC is based on the actor model.
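The mail-server analogy above can be sketched in a few lines. This is only an illustration of the actor-style message flow, not epiC's API: the class names `Master`, `Unit`, and `ForwardingUnit`, and all of their methods, are hypothetical.

```python
from collections import deque


class Master:
    """Hypothetical mail server: queues messages and forwards them by unit name."""

    def __init__(self):
        self.units = {}
        self.mailbox = deque()

    def register(self, unit):
        self.units[unit.name] = unit
        unit.master = self

    def send(self, recipient, payload):
        # Units never call each other directly; they hand mail to the master.
        self.mailbox.append((recipient, payload))

    def run(self):
        # Deliver queued messages until no mail is left.
        while self.mailbox:
            recipient, payload = self.mailbox.popleft()
            self.units[recipient].receive(payload)


class Unit:
    """Hypothetical unit: keeps its own state and only reacts to messages."""

    def __init__(self, name):
        self.name = name
        self.master = None
        self.log = []

    def receive(self, payload):
        self.log.append(payload)


class ForwardingUnit(Unit):
    """Processes a message, then mails the result to another unit."""

    def __init__(self, name, next_unit):
        super().__init__(name)
        self.next_unit = next_unit

    def receive(self, payload):
        super().receive(payload)
        self.master.send(self.next_unit, payload.upper())


master = Master()
a = ForwardingUnit("a", next_unit="b")
b = Unit("b")
master.register(a)
master.register(b)
master.send("a", "hello")
master.run()
print(b.log)  # ['HELLO']
```

The point of the sketch is that a unit only knows the names of other units; routing is left entirely to the master.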
Compare epiC to MapReduce and Pregel, using PageRank as an example. MapReduce: multiple iterations are needed; the second job loads the output of the first job to continue the processing.
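A minimal sketch of the chained-job pattern, assuming a toy three-vertex graph, a damping factor of 0.85, and a plain dictionary standing in for the DFS. The function `run_job` and the file names are hypothetical; the sketch only shows that every iteration re-reads the previous job's output.

```python
from collections import defaultdict


def run_job(graph, ranks, damping=0.85):
    """One MapReduce-style job: map emits contributions, reduce sums them."""
    # Map phase: every vertex emits rank / out-degree to each neighbor.
    emitted = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            emitted[neighbor].append(ranks[vertex] / len(neighbors))
    # Reduce phase: sum the contributions received by each vertex.
    n = len(graph)
    return {v: (1 - damping) / n + damping * sum(emitted[v]) for v in graph}


# Toy graph (assumption): every vertex has at least one outgoing edge.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# The dictionary plays the role of the DFS: each job writes a new "file".
dfs = {"iter0": {v: 1 / len(graph) for v in graph}}
for i in range(1, 21):
    # Job i loads the output of job i - 1 and writes its own output.
    dfs[f"iter{i}"] = run_job(graph, dfs[f"iter{i - 1}"])

final = dfs["iter20"]
print(sorted(final.items()))
```

Each loop iteration corresponds to a separate job, so the whole rank vector goes through the (simulated) file system once per iteration.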
Compare epiC to MapReduce and Pregel. Pregel: in each super-step, each vertex computes its new PageRank value and broadcasts it to its neighbors.
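The vertex-centric super-step can be sketched as follows. This is not Pregel's actual API: the `Vertex` class, its `compute` signature, and `run_supersteps` are hypothetical, and the toy graph and the damping factor of 0.85 are assumptions.

```python
class Vertex:
    """Hypothetical vertex holding its PageRank value and outgoing edges."""

    def __init__(self, vid, out_edges, n_vertices):
        self.vid = vid
        self.out_edges = out_edges
        self.n = n_vertices
        self.value = 1.0 / n_vertices

    def compute(self, messages, superstep, damping=0.85):
        # From the second super-step on, update the value from incoming messages.
        if superstep > 0:
            self.value = (1 - damping) / self.n + damping * sum(messages)
        # Broadcast the new value, split evenly, to all neighbors.
        share = self.value / len(self.out_edges)
        return [(dst, share) for dst in self.out_edges]


def run_supersteps(graph, supersteps):
    vertices = {v: Vertex(v, out, len(graph)) for v, out in graph.items()}
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = {v: [] for v in graph}
        for vertex in vertices.values():
            for dst, msg in vertex.compute(inbox[vertex.vid], step):
                outbox[dst].append(msg)
        # Barrier: messages sent in this super-step are visible in the next one.
        inbox = outbox
    return {v: vertex.value for v, vertex in vertices.items()}


# Toy graph (assumption): every vertex has at least one outgoing edge.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
final = run_supersteps(graph, supersteps=21)
print(sorted(final.items()))
```

Unlike the chained MapReduce jobs, the vertex state stays in memory between super-steps and only the messages move.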
Compare epiC to MapReduce and Pregel. epiC:
0. Send messages to the unit to activate it.
1. The unit loads a partition of the graph data and the score vector based on the received message.
2. Compute the new score vector of the vertices.
3. Generate new score vector files.
4. Send messages to the master network.
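Steps 0 to 4 above can be sketched as one function per unit activation plus a small driver that plays the role of the master. Everything here is an assumption for illustration: the message fields, the file names, the partition layout (each partition stores the incoming edges of its vertices together with the source's out-degree), and the damping factor. A dictionary stands in for the DFS, and units run one after another rather than in parallel.

```python
DAMPING = 0.85  # assumed damping factor


def pagerank_unit(dfs, message):
    """One activation of a hypothetical PageRank unit."""
    # Step 1: load the graph partition and score vector named in the message.
    in_edges = dfs[message["graph_file"]]
    scores = {}
    for name in message["score_files"]:
        scores.update(dfs[name])
    n = len(scores)
    # Step 2: compute the new scores of the vertices in this partition.
    new_scores = {
        v: (1 - DAMPING) / n
        + DAMPING * sum(scores[src] / out_degree for src, out_degree in edges)
        for v, edges in in_edges.items()
    }
    # Step 3: write a new score vector file.
    out_file = f"scores/iter{message['iteration'] + 1}/part{message['partition']}"
    dfs[out_file] = new_scores
    # Step 4: report the new file back to the master.
    return {"partition": message["partition"], "score_file": out_file}


def run_master(dfs, num_partitions, iterations):
    score_files = ["scores/iter0"]
    for iteration in range(iterations):
        replies = []
        for p in range(num_partitions):
            # Step 0: an activation message tells the unit what to load.
            activation = {
                "partition": p,
                "graph_file": f"graph/part{p}",
                "score_files": score_files,
                "iteration": iteration,
            }
            replies.append(pagerank_unit(dfs, activation))
        score_files = [reply["score_file"] for reply in replies]
    return score_files


# Toy graph a->b, a->c, b->c, c->a, stored as (source, out-degree) in-edges.
dfs = {
    "graph/part0": {"a": [("c", 1)], "b": [("a", 2)]},
    "graph/part1": {"c": [("a", 2), ("b", 1)]},
    "scores/iter0": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
}
final_files = run_master(dfs, num_partitions=2, iterations=20)
final = {}
for name in final_files:
    final.update(dfs[name])
print(sorted(final.items()))
```

The messages carry only file names and iteration numbers; the score data itself is exchanged through the files.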
Compare epiC to MapReduce and Pregel
Flexibility: MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel.
Optimization: Both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk.
Extensibility: MR and Pregel have predefined programming models, while in epiC users can create their own.
Using epiC to simulate MR: Create two basic units, MapUnit and ReduceUnit. A MapUnit loads a partition of the data and sends messages to all ReduceUnits. A ReduceUnit gets its input from the DFS; the locations of the input are obtained from the messages of the MapUnits.
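A word-count sketch of this simulation, under several assumptions: the functions `map_unit` and `reduce_unit`, the message fields, the file names, and the toy partitioning function are all hypothetical, and a dictionary stands in for the DFS.

```python
from collections import defaultdict


def map_unit(dfs, unit_id, input_file, num_reducers):
    """Hypothetical MapUnit: loads one partition and notifies every ReduceUnit."""
    buckets = defaultdict(list)
    for line in dfs[input_file]:
        for word in line.split():
            # Toy deterministic partitioner (assumption), one bucket per reducer.
            buckets[sum(map(ord, word)) % num_reducers].append((word, 1))
    messages = []
    for r in range(num_reducers):
        # Intermediate results go to the DFS; the message carries only the location.
        location = f"tmp/map{unit_id}/reduce{r}"
        dfs[location] = buckets[r]
        messages.append({"to": r, "location": location})
    return messages


def reduce_unit(dfs, messages):
    """Hypothetical ReduceUnit: reads its input from the locations it was mailed."""
    counts = defaultdict(int)
    for message in messages:
        for word, one in dfs[message["location"]]:
            counts[word] += one
    return dict(counts)


dfs = {"input/part0": ["a b a"], "input/part1": ["b c"]}
num_reducers = 2

mail = []
for unit_id, input_file in enumerate(["input/part0", "input/part1"]):
    mail.extend(map_unit(dfs, unit_id, input_file, num_reducers))

result = {}
for r in range(num_reducers):
    result.update(reduce_unit(dfs, [m for m in mail if m["to"] == r]))
print(result)
```

Here the shuffle phase of MapReduce is reduced to messages that name DFS locations, which matches the description of how a ReduceUnit finds its input.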
Using epiC to simulate Relational DB: Three units are created.
SingleTableUnit: handles all processing on a single table
JoinUnit: joins two or more tables
AggregateUnit: applies the group-by operator and computes the aggregation results
Using epiC to simulate Relational DB: Example: TPC-H Q3. Five steps are required.
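As a rough illustration of how the three unit types could cooperate on Q3, the sketch below filters each table, joins customer with orders and then with lineitem, and aggregates the revenue. The decomposition, the function names, and the toy rows are assumptions; it does not reproduce the five steps of the slides, and the final ordering of Q3 is omitted. The filter constants are the default Q3 parameters (segment BUILDING, date 1995-03-15).

```python
from collections import defaultdict


def single_table_unit(rows, predicate, columns):
    """Filter and project one table."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def join_unit(left, right, left_key, right_key):
    """Hash join of two inputs on an equality condition."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def aggregate_unit(rows, group_by, value):
    """Group rows by the given columns and sum value(row) per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[c] for c in group_by)] += value(row)
    return dict(totals)


# Toy tables (assumption); dates are ISO strings, so string comparison works.
customer = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": "1995-03-01", "o_shippriority": 0},
    {"o_orderkey": 11, "o_custkey": 2, "o_orderdate": "1995-03-02", "o_shippriority": 0},
    {"o_orderkey": 12, "o_custkey": 1, "o_orderdate": "1995-04-01", "o_shippriority": 0},
]
lineitem = [
    {"l_orderkey": 10, "l_extendedprice": 100.0, "l_discount": 0.5, "l_shipdate": "1995-03-20"},
    {"l_orderkey": 10, "l_extendedprice": 200.0, "l_discount": 0.0, "l_shipdate": "1995-03-25"},
    {"l_orderkey": 10, "l_extendedprice": 50.0, "l_discount": 0.0, "l_shipdate": "1995-03-10"},
    {"l_orderkey": 11, "l_extendedprice": 80.0, "l_discount": 0.0, "l_shipdate": "1995-03-20"},
]

# Single-table processing: apply each table's filter and keep the needed columns.
c = single_table_unit(customer, lambda r: r["c_mktsegment"] == "BUILDING", ["c_custkey"])
o = single_table_unit(
    orders,
    lambda r: r["o_orderdate"] < "1995-03-15",
    ["o_orderkey", "o_custkey", "o_orderdate", "o_shippriority"],
)
l = single_table_unit(
    lineitem,
    lambda r: r["l_shipdate"] > "1995-03-15",
    ["l_orderkey", "l_extendedprice", "l_discount"],
)

# Joins: customer with orders, then the result with lineitem.
co = join_unit(c, o, "c_custkey", "o_custkey")
col = join_unit(co, l, "o_orderkey", "l_orderkey")

# Aggregation: revenue per (l_orderkey, o_orderdate, o_shippriority).
revenue = aggregate_unit(
    col,
    ["l_orderkey", "o_orderdate", "o_shippriority"],
    lambda r: r["l_extendedprice"] * (1 - r["l_discount"]),
)
print(revenue)  # {(10, '1995-03-01', 0): 250.0}
```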
TPC-H Q3 (Step 1)
TPC-H Q3 (Steps 2 and 3)
TPC-H Q3 (Steps 4 and 5)