ReStore Implementation as an Extension to Pig. Vijay S.


Outline
Overview of the Pig Query Compiler
Implementation of ReStore
Experiments

Overview of the Pig Query Compiler
(1) A parser syntactically checks the input query and transforms it into a logical plan, which is a directed acyclic graph (DAG) of logical operators.
(2) A logical optimizer applies optimization rules to this logical plan.
(3) A MapReduce compiler transforms the logical plan into a physical plan and then compiles it into a series of MapReduce jobs, which form a workflow.

Overview of the Pig Query Compiler (Continued)
(4) A MapReduce optimizer applies rules to reduce the number of MapReduce jobs in the workflow.
(5) The Hadoop job manager submits the jobs in the workflow to Hadoop for execution, taking into account the dependencies between them.
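Step (5) amounts to running the workflow in dependency order. The sketch below is only an illustration of that idea, not Pig's actual job manager: the job names and the `submission_order` helper are hypothetical, and the ordering is delegated to Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def submission_order(dependencies):
    """Return job names so that every job comes after the jobs it depends on.

    `dependencies` maps a job name to the set of jobs whose output it reads.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: job2 reads job1's output, job3 reads both.
workflow = {"job1": set(), "job2": {"job1"}, "job3": {"job1", "job2"}}
order = submission_order(workflow)
print(order)
```

A real job manager would also submit independent jobs concurrently; the sketch only shows that no job is submitted before its prerequisites.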

Overview of the Pig Query Compiler (Continued)
The JobControlCompiler is a component of the Hadoop job manager of Pig.
Its input is a workflow of MapReduce jobs.
The jobs in a workflow pass intermediate outputs to one another. After all the MapReduce jobs in the workflow have finished executing, these intermediate outputs are deleted.

Implementation of ReStore
The input of ReStore is a workflow of MapReduce jobs. Every physical plan of these jobs passes through two stages: (1) matching with plans in the repository, and (2) generating candidate sub-jobs.
The repository is implemented as a table that contains in every record: (1) a physical plan of a MapReduce job, (2) the filename of the output of this job in HDFS, and (3) statistics about this job.
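The repository record and the matching stage can be pictured with a small sketch. Everything here is an assumption made for illustration: physical plans are simplified to linear sequences of operator strings (real plans are DAGs), matching is reduced to a longest-prefix comparison, and the names `RepositoryEntry`, `Repository`, `rewrite`, the paths, and the `runtime_s` statistic are hypothetical rather than taken from ReStore.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryEntry:
    plan: tuple          # physical plan, simplified to a tuple of operator strings
    output_file: str     # filename of the job's output in HDFS
    stats: dict = field(default_factory=dict)  # statistics about the job


class Repository:
    def __init__(self):
        self.entries = []

    def add(self, plan, output_file, **stats):
        self.entries.append(RepositoryEntry(tuple(plan), output_file, stats))

    def match(self, plan):
        """Return the entry with the longest stored plan that is a prefix of `plan`."""
        plan = tuple(plan)
        best = None
        for entry in self.entries:
            n = len(entry.plan)
            if plan[:n] == entry.plan and (best is None or n > len(best.plan)):
                best = entry
        return best


def rewrite(plan, repo):
    """Replace the matched prefix of `plan` with a load of the stored output."""
    hit = repo.match(plan)
    if hit is None:
        return list(plan)
    return [f"load {hit.output_file}"] + list(plan[len(hit.plan):])


repo = Repository()
repo.add(["load page_views", "project user"], "/restore/out_1", runtime_s=120)
new_plan = ["load page_views", "project user", "group by user"]
rewritten = rewrite(new_plan, repo)
print(rewritten)
```

Keeping the plan, the output filename, and the statistics in one record mirrors the three fields listed on the slide; how the statistics are used to decide what to keep is outside this sketch.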

Experiments
Reusing the Output of Whole Jobs (7.1)
Reusing the Output of Sub-Jobs (7.2)
Comparing the Heuristics for Generating Candidate Sub-Jobs (7.3)
Reusing Sub-Jobs vs. Whole Jobs (7.4)
Effect of Data Reduction (7.5)

Reusing the Output of Whole Jobs (7.1)
Job execution time for queries is much lower when the output of whole jobs is reused than with no data reuse (L3 and L11 from PigMix).
Example queries: L2-L8 and L11 (Join, Group, Co-Group, Filter, Distinct, and Union).

Reusing the Output of Sub-Jobs (7.2)
Job execution time for queries is further reduced by reusing the output of sub-jobs, compared with both no data reuse and the run that generates the sub-jobs.
Example queries: L2-L8 and L11 (Join, Group, Co-Group, Filter, Distinct, and Union); L3 and L11 are from PigMix.

Comparing the Heuristics for Generating Candidate Sub-Jobs (7.3)
Job execution time for queries is further reduced by reusing the output of sub-jobs, compared with both no data reuse and the run that generates the sub-jobs.
Example queries: L2-L8 and L11 (Join, Group, Co-Group, Filter, Distinct, and Union); L3 and L11 are from PigMix.

Comparing the Heuristics for Generating Candidate Sub-Jobs (7.3)
The table shows the total size of input data loaded by the different queries.
Q: I/P (GB), H_C (GB), H_A (GB), NH (GB), O/P
L2: 150.6, 3.1, 6.7, 1.1 MB
L3: 150.7, 3.2, 8.2, 22.1, 62.9 MB
L4: 150.6, 22.810.834.2 MB
L5: 150.7, 1.8, 4.6, 7.4, 2 B
L6: 150.6, 3.7, 10.1, 24.3, 92.7 MB
L7: 150.6, 2.2, 5.4, 1.5 MB
L8: 150.6, 3.3, 11.4, 27 B
L11: 173.6, 2.6, 2.7, 2.8, 1.6 GB

Reusing Sub-Jobs vs. Whole Jobs (7.4)
Field name    Cardinality    % Selected Data
field6        200            0.5%
field7        100            1%
field8        20             5%
field9        10             10%
field10       5              20%
field11       2              50%
field12       1.6            60%
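The numbers in this table are consistent with an equality filter on a uniformly distributed field selecting roughly 1/cardinality of the rows. That relationship is a reading of the figures, not something the slide states, so the sketch below only checks it against the table values; the helper name and the 3-point tolerance are assumptions.

```python
# (cardinality, % selected data) per field, as given in the table.
table = {
    "field6": (200, 0.5),
    "field7": (100, 1.0),
    "field8": (20, 5.0),
    "field9": (10, 10.0),
    "field10": (5, 20.0),
    "field11": (2, 50.0),
    "field12": (1.6, 60.0),
}


def expected_selected_pct(cardinality):
    """Percentage of rows matching one value if values are uniformly distributed."""
    return 100.0 / cardinality


# Absolute gap, in percentage points, between the table and the 1/cardinality estimate.
gaps = {
    name: abs(expected_selected_pct(card) - pct)
    for name, (card, pct) in table.items()
}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.2f} percentage points")
```

Only field12 deviates from the estimate (62.5% expected against 60% listed), which suggests its cardinality or percentage is rounded.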

Reusing Sub-Jobs vs. Whole Jobs (7.4)
Figure: overhead and speedup of different jobs (the dark line is the speedup).

Effect of Data Reduction (7.5)
Figure: overhead and speedup of different jobs with filter operators.

Effect of Data Reduction (7.5) (Continued)
Query Template QP
A = load '$synth_data' as (field1,..., field12);
B = foreach A generate field1,...;
C = group B by (field1,...);
D = foreach C generate COUNT($1);
store D into '$out';

Effect of Data Reduction (7.5) (Continued)
Query Template QF
A = load '$synth_data' as (field1,..., field12);
B = filter A by $fieldi == $val;
C = group B by field1;
D = foreach C generate COUNT($1);
store D into '$out';
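One way to turn the QF template into a concrete query per filter field is plain text substitution of the `$` parameters. The sketch below assumes that approach; the `instantiate` helper, the paths, and the chosen field and value are hypothetical, and the filter uses `==`, which is Pig Latin's equality operator. The `field1,..., field12` elision is kept exactly as it appears in the template.

```python
QF_TEMPLATE = (
    "A = load '$synth_data' as (field1,..., field12);\n"
    "B = filter A by $fieldi == $val;\n"
    "C = group B by field1;\n"
    "D = foreach C generate COUNT($1);\n"
    "store D into '$out';"
)


def instantiate(template, params):
    """Substitute each $name placeholder; positional refs such as $1 are left alone."""
    text = template
    # Longer names first, so a short name never clobbers a longer one sharing its prefix.
    for name in sorted(params, key=len, reverse=True):
        text = text.replace("$" + name, str(params[name]))
    return text


# Hypothetical parameter values for one run of the template.
query = instantiate(
    QF_TEMPLATE,
    {"synth_data": "/data/synth", "fieldi": "field8", "val": 7, "out": "/tmp/qf_field8"},
)
print(query)
```

Repeating the call with each field from field6 to field12 would give one query per selectivity level.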

Related Work
The paper addresses challenges posed by MapReduce, such as massive data sizes and the procedural nature of the query language.
Other work: materialized views and MRShare.
