
1 YSmart: Yet Another SQL-to-MapReduce Translator Rubao Lee 1, Tian Luo 1, Yin Huai 1, Fusheng Wang 2, Yongqiang He 3, Xiaodong Zhang 1 1 Dept. of Computer Sci. & Eng., The Ohio State University 2 Center for Comprehensive Informatics, Emory University 3 Data Infrastructure Team, Facebook

2 Data Explosion in Human Society
Source: "Exabytes: Documenting the 'digital age' and huge growth in computing capacity", The Washington Post
[Chart: the global storage capacity (analog vs. digital) and the amount of digital information created and replicated in a year; the figures did not survive transcription.]

3 Challenges of Big Data Management and Analytics
- Existing DB technology is not prepared for such huge volumes: storage and system scale (e.g. 70 TB/day and 3,000 nodes at Facebook); many applications in science, medicine and business follow the same trend
- Big data demands more complex analytics than data transactions: data mining, time series analysis, etc., to gain deep insights
- The conventional DB business model is not affordable: software licenses, maintenance fees, $10,000/TB
- The conventional DB processing model is to "scale up" by upgrading CPU/memory/storage/networks (the HPC model); the big data processing model is to "scale out" by continuously adding low-cost computing and storage nodes in a distributed way
The MapReduce (MR) programming model has become an effective data processing engine for big data analytics. Hadoop is the most widely used implementation of MapReduce, used in hundreds of society-dependent corporations/organizations for big data analytics: AOL, Baidu, EBay, Facebook, IBM, NY Times, Yahoo! ….

4 MapReduce Overview
- A simple but effective programming model designed to process huge volumes of data concurrently on large-scale clusters
- Key/value pair transitions:
  Map: (k1, v1) → (k2, v2)
  Reduce: (k2, a list of v2) → (k3, v3)
- Shuffle: the Partition Key (it could be the same as k2, or not) determines how a key/value pair in the map output is transferred to a reduce task
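These key/value transitions map directly onto Hadoop's Mapper and Reducer classes. The following word-count-style sketch is not part of the original slides (class and field names are illustrative); it only shows where k1/v1, k2/v2 and k3/v3 appear and where the partition key takes effect.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class KeyValueTransitionSketch {

    // Map: (k1 = byte offset, v1 = a line of text) -> list of (k2 = word, v2 = 1)
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                // The map output key acts as the partition key: it decides which
                // reduce task receives this pair during the shuffle phase.
                context.write(word, ONE);
            }
        }
    }

    // Reduce: (k2 = word, list of v2) -> (k3 = word, v3 = total count)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

By default Hadoop's HashPartitioner partitions on the map output key; when the partition key should differ from k2, a custom Partitioner can be set with job.setPartitionerClass(...).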

5 MR (Hadoop) Job Execution Overview
The execution of a MR job involves 6 steps. Data is stored in a distributed file system (e.g. the Hadoop Distributed File System). The master node does control-level work, e.g. job scheduling and task assignment; the worker nodes do the data processing work specified by the Map or Reduce function.
1: Job submission: the MR program (job) is submitted to the master node. 2: Assign tasks: the master assigns Map tasks and Reduce tasks to the worker nodes.

6 MR (Hadoop) Job Execution Overview (cont.)
3: Map phase: concurrent Map tasks process the input data. 4: Shuffle phase: the Map output is shuffled to different Reduce tasks based on Partition Keys (PKs), usually the Map output keys.

7 MR (Hadoop) Job Execution Overview (cont.)
5: Reduce phase: concurrent Reduce tasks. 6: Output: the Reduce output is stored back to the distributed file system.

8 MR (Hadoop) Job Execution Overview (cont.)
A MapReduce (MR) job is resource-consuming:
1: Scanning the input data in the Map phase => local or remote I/Os
2: Storing the intermediate results of the Map output => local I/Os
3: Transferring data across nodes in the Shuffle phase => network costs
4: Storing the final results of the MR job => local I/Os + network costs (the data is replicated)

9 The hand-written Hadoop MapReduce code for one job (TPC-H Q18, Job 1):

package tpch;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Q18Job1 extends Configured implements Tool {

    // Map: tag each record with its source table ("l" = lineitem, "o" = orders)
    // and emit the first column (the order key) as the map output key.
    public static class Map extends Mapper<Object, Text, IntWritable, Text> {
        private final static Text value = new Text();
        private IntWritable word = new IntWritable();
        private String inputFile;
        private boolean isLineitem = false;

        protected void setup(Context context) throws IOException, InterruptedException {
            inputFile = ((FileSplit) context.getInputSplit()).getPath().getName();
            if (inputFile.compareTo("lineitem.tbl") == 0) {
                isLineitem = true;
            }
            System.out.println("isLineitem:" + isLineitem + " inputFile:" + inputFile);
        }

        public void map(Object key, Text line, Context context) throws IOException, InterruptedException {
            String[] tokens = (line.toString()).split("\\|");
            if (isLineitem) {
                word.set(Integer.valueOf(tokens[0]));
                value.set(tokens[4] + "|l");
                context.write(word, value);
            } else {
                word.set(Integer.valueOf(tokens[0]));
                value.set(tokens[1] + "|" + tokens[4] + "|" + tokens[3] + "|o");
                context.write(word, value);
            }
        }
    }

    // Reduce: sum the lineitem quantities per order key, carry along the order's
    // columns, and discard groups whose total quantity does not exceed 314.
    public static class Reduce extends Reducer<IntWritable, Text, IntWritable, Text> {
        private Text result = new Text();

        public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            double sumQuantity = 0.0;
            IntWritable newKey = new IntWritable();
            boolean isDiscard = true;
            String thisValue = new String();
            int thisKey = 0;
            for (Text val : values) {
                String[] tokens = val.toString().split("\\|");
                if (tokens[tokens.length - 1].compareTo("l") == 0) {
                    sumQuantity += Double.parseDouble(tokens[0]);
                } else if (tokens[tokens.length - 1].compareTo("o") == 0) {
                    thisKey = Integer.valueOf(tokens[0]);
                    thisValue = key.toString() + "|" + tokens[1] + "|" + tokens[2];
                } else {
                    continue;
                }
            }
            if (sumQuantity > 314) {
                isDiscard = false;
            }
            if (!isDiscard) {
                thisValue = thisValue + "|" + sumQuantity;
                newKey.set(thisKey);
                result.set(thisValue);
                context.write(newKey, result);
            }
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            // expects three paths: two input tables and one output directory
            System.err.println("Usage: Q18Job1 ");
            System.exit(2);
        }
        Job job = new Job(conf, "TPC-H Q18 Job1");
        job.setJarByClass(Q18Job1.class);
        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new Q18Job1(), args);
        System.exit(res);
    }
}

MR programming is not that "simple"! This complex code is for a simple MR job. Do you miss something like … "SELECT * FROM Book WHERE price > "? Low productivity!

10 High-Level Programming and Data Processing Environment on top of Hadoop
A job description in a SQL-like declarative language is handed to a SQL-to-MapReduce translator, which writes the MR programs (jobs) that run on the workers over the Hadoop Distributed File System (HDFS). The translator is an interface between users and MR programs (jobs).

11 High-Level Programming and Data Processing Environment on top of Hadoop (cont.)
The SQL-to-MapReduce translator is an interface between users and MR programs (jobs) and improves productivity over hand-coding MapReduce programs.
Hive, a data warehousing system (Facebook): 95%+ of the Hadoop jobs in Facebook are generated by Hive.
Pig, a high-level programming environment (Yahoo!): 75%+ of the Hadoop jobs in Yahoo! are invoked by Pig.

12 Translating SQL-like Queries to MapReduce Jobs: the Existing Approach
- "Sentence by sentence" translation [C. Olston et al., SIGMOD 2008], [A. Gates et al., VLDB 2009] and [A. Thusoo et al., ICDE 2010]; implementations: Hive and Pig
- Three steps:
  1. Identify the major sentences with operations that shuffle the data, such as Join, Group by and Order by
  2. For every operation in a major sentence, generate a corresponding MR job, e.g. a join op. => a join MR job
  3. Add other operations, such as selection and projection, into the corresponding MR jobs
Can existing SQL-to-MapReduce translators provide performance comparable to optimized MR job(s)?

13 An Example: TPC-H Q21
- One of the most complex and time-consuming queries in the TPC-H benchmark for data warehousing performance
- Optimized MR jobs vs. Hive in a Facebook production cluster: the optimized jobs run 3.7x faster. What's wrong?

14 The Execution Plan of TPC-H Q21
[Plan tree over the tables lineitem, orders, supplier and nation: Join1, Join2, Join3, Join4, a Left-outer-Join, AGG1, AGG2, AGG3 and a final SORT.]
The sub-tree over lineitem and orders is the dominant part of the execution time (~90%). Hive handles this sub-tree in a different way from our optimized MR jobs.

15 Let's look at the Partition Key
The dominant sub-tree consists of JOIN, AGG and composite MR jobs J1 to J5 over lineitem and orders. J1, J2 and J4 all need the input table 'lineitem', and J1 to J5 all use the same partition key, 'l_orderkey'. Inter-job correlations exist. What's wrong with existing SQL-to-MR translators? They are correlation-unaware:
1. They ignore common data input
2. They ignore common data transition
3. They add unnecessary data re-partitioning

16 Approaches of Big Data Analytics in MR: the Landscape (productivity vs. performance)
Hand-coding MR jobs. Pro: high-performance MR programs. Con: 1: lots of coding even for a simple job; 2: redundant coding is inevitable; 3: hard to debug [J. Tan et al., ICDCS 2010]
Existing SQL-to-MR translators. Pro: easy programming, high productivity. Con: poor performance on complex queries (and complex queries are usual in daily operations)
A correlation-aware SQL-to-MR translator targets both high productivity and high performance.

17 Outline
- Background and Motivation
- Our Solution: YSmart, a correlation-aware SQL-to-MapReduce translator
- Performance Evaluation
- YSmart in Hive
- Conclusion

18 Our Approaches and Critical Challenges
A correlation-aware SQL-to-MR translator turns SQL-like queries into primitive MR jobs, identifies correlations, and merges correlated MR jobs into the MR jobs that give the best performance. Critical challenges:
1: Correlation possibilities and their detection
2: Rules for automatically exploiting correlations
3: Implementing high-performance and low-overhead MR jobs

19 Primitive MR Jobs
- Selection-Projection (SP) MR job: a simple query with only selection and/or projection
- Aggregation (AGG) MR job: group the input relation and apply aggregation functions, e.g. SELECT count(*) FROM faculty GROUP BY age. The partition key can be any column(s) in the GROUP BY clause
- Join (JOIN) MR job: an equi-join on two relations (inner or left/right/full outer join), e.g. SELECT * FROM faculty LEFT OUTER JOIN awards ON …. The partition key is in the JOIN predicate (see the map-side sketch after this list)
- Sort (SORT) MR job: sort the input relation; usually the last step in a complex query
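As a concrete illustration of the JOIN primitive's map side, here is a hedged sketch (not code from the paper) built around the faculty/awards example above: the join column from the JOIN predicate becomes the partition key, and each record is tagged with its source relation so the reduce side can tell the two inputs apart. The file names, the '|' field delimiter and the join column position are assumptions for this sketch.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Map side of a JOIN primitive job: the partition key is the join column from the
// JOIN predicate, and every record carries a tag naming its source relation so the
// reduce side can separate the two inputs before joining them.
public class JoinMapSketch extends Mapper<Object, Text, Text, Text> {
    private String sourceTag;                 // "F" = faculty, "A" = awards (hypothetical relations)
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void setup(Context context) {
        String file = ((FileSplit) context.getInputSplit()).getPath().getName();
        sourceTag = file.startsWith("faculty") ? "F" : "A";
    }

    @Override
    public void map(Object key, Text line, Context context)
            throws IOException, InterruptedException {
        String[] cols = line.toString().split("\\|");   // assumed '|'-delimited rows
        outKey.set(cols[0]);                            // assumed join column = first field
        outValue.set(sourceTag + "|" + line.toString());
        context.write(outKey, outValue);                // partitioned by the join key
    }
}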

20 Input Correlation (IC)
Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint, i.e. their Map functions read at least one shared input relation.
Example: J1 (over lineitem and orders) and J2 (over lineitem) share the input relation lineitem.

21 Transit Correlation (TC)
Multiple MR jobs have transit correlation (TC) if they have input correlation (IC) and they have the same Partition Key; two MR jobs must first have IC.
Example: J1 (over lineitem and orders) and J2 (over lineitem) share the input relation lineitem and the partition key l_orderkey.

22 Job Flow Correlation (JFC)
A MR job has Job Flow Correlation (JFC) with one of its child MR jobs if it has the same partition key as that child job (the child's output is an input of the parent).
Example: the output of J2 is an input of J1, and both jobs use the same partition key.

23 Query Optimization Rules for Automatically Exploiting Correlations
- JOB-Merging: exploiting both Input Correlation and Transit Correlation
- AGG-Pushdown: exploiting the Job Flow Correlation associated with aggregation jobs
- JOIN-Merging-Full: exploiting the Job Flow Correlation associated with JOIN jobs and their Transit-Correlated parent jobs
- JOIN-Merging-Half: exploiting the Job Flow Correlation associated with JOIN jobs

24 Rule 1: JOB-Merging
Condition: Job 1 and Job 2 have both IC and TC.
Action: merge the Map/Reduce functions of Job 1 and Job 2 into a single Map function and a single Reduce function.
The Reduce function of the merged job does all the work of Job 1's Reduce function and Job 2's Reduce function and generates two outputs. In the Q21 example, J1 (over lineitem and orders) and J2 (over lineitem) become one job that scans the shared input once; a sketch of the merged map function follows.
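The sketch below illustrates what a merged map function for this rule can look like for the example above (J1 joins lineitem with orders, J2 aggregates lineitem, and both partition on l_orderkey). It is a hedged sketch rather than YSmart's generated code; the column positions and tag strings are assumptions. The point is that the shared relation is scanned once and each emitted pair serves the reduce-side work of both jobs.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Merged map function for Rule 1 (JOB-Merging): J1 and J2 share the input 'lineitem'
// (input correlation) and the partition key l_orderkey (transit correlation), so one
// map scan and one shuffle serve both jobs.
public class MergedMapSketch extends Mapper<Object, Text, IntWritable, Text> {
    private boolean isLineitem = false;
    private final IntWritable outKey = new IntWritable();
    private final Text outValue = new Text();

    @Override
    protected void setup(Context context) {
        String file = ((FileSplit) context.getInputSplit()).getPath().getName();
        isLineitem = file.startsWith("lineitem");
    }

    @Override
    public void map(Object key, Text line, Context context)
            throws IOException, InterruptedException {
        String[] cols = line.toString().split("\\|");
        outKey.set(Integer.parseInt(cols[0]));     // l_orderkey / o_orderkey: the shared partition key
        if (isLineitem) {
            // Emitted once, consumed twice on the reduce side:
            // by J1's join work and by J2's aggregation work.
            outValue.set(cols[4] + "|L");          // assumed quantity column, tagged as lineitem
        } else {
            outValue.set(cols[1] + "|" + cols[3] + "|O");  // assumed orders columns, tagged as orders
        }
        context.write(outKey, outValue);
    }
}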

25 Rule 2: AGG-Pushdown
Condition: the AGG job and its parent Job 1 have JFC.
Action: merge the Map and Reduce functions of the AGG job into the Reduce function of Job 1.
The Reduce function of the merged job does all the work of Job 1's Reduce function, the AGG Map function and the AGG Reduce function.

26 Rule 3: JOIN-Merging-Full
Condition 1: Job 1 and Job 2 have Transit Correlation (TC).
Condition 2: the JOIN job has JFC with Job 1 and Job 2.
Action: merge the Reduce functions of Job 1 and Job 2 and the JOIN job into a single Reduce function (with a single merged Map function).
The Reduce function of the merged job does all the work of Job 1's Reduce function, Job 2's Reduce function, the JOIN Map function and the JOIN Reduce function.

27 Rule 4: JOIN-Merging-Half
Condition: Job 2 and the JOIN job have JFC.
Action: merge the Reduce function of Job 2 and that of the JOIN job into a single Reduce function.
In the merged job, data coming from Job 1 (which has already been executed) only needs the work of the JOIN Map and JOIN Reduce functions, while data coming from Job 2 needs all the work of Job 2's Reduce function, the JOIN Map function and the JOIN Reduce function. Job 1 must be executed before the merged job, so data dependency must be considered.

28 The Common MapReduce Framework
- Aim: provide a framework for executing merged tasks with low overhead, by minimizing the size of the intermediate data and minimizing data access in the reduce phase
- Common Map function: tag a record with the Job-IDs it does not belong to (fewer intermediate results)
- Common Reduce function: dispatch key/value pairs => post-job computations
- Refer to our paper for details
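The following sketch illustrates the dispatch step of a Common Reduce function, continuing the tagged records from the merged-map sketch under Rule 1. For readability it tags records with the job they do belong to ("L" for lineitem, "O" for orders), whereas the slide's optimization tags records with the Job-IDs they do not belong to; the per-job computations and the output layout are placeholders, not YSmart's implementation.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Common Reduce function sketch: values for one partition key are dispatched by tag
// to the computations of the jobs they feed, then each job's output is emitted.
public class CommonReduceSketch extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double sumQuantity = 0.0;                          // post-job computation for J2 (aggregation)
        List<String> orderRows = new ArrayList<String>();  // buffered orders side for J1 (join)
        List<String> lineitemRows = new ArrayList<String>();

        // Dispatch: route each value to the work of the job(s) it belongs to.
        for (Text val : values) {
            String[] cols = val.toString().split("\\|");
            String tag = cols[cols.length - 1];
            if (tag.equals("L")) {
                sumQuantity += Double.parseDouble(cols[0]); // J2: aggregate quantity
                lineitemRows.add(cols[0]);                  // J1: keep the join side
            } else if (tag.equals("O")) {
                orderRows.add(cols[0] + "|" + cols[1]);
            }
        }

        // Post-job outputs. A real merged job would route these to separate output
        // files (for example with MultipleOutputs) instead of prefixing a job id.
        context.write(key, new Text("J2|" + sumQuantity));
        for (String o : orderRows) {
            for (String l : lineitemRows) {
                context.write(key, new Text("J1|" + o + "|" + l));
            }
        }
    }
}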

29 Outline
- Background and Motivation
- Our Solution: YSmart, a correlation-aware SQL-to-MapReduce translator
- Performance Evaluation
- YSmart in Hive
- Conclusion

30 Exp 1: Four Cases of TPC-H Q21
1: Sentence-to-sentence translation: 5 MR jobs (Join1, Join2, AGG1, AGG2 and the Left-outer-Join over lineitem and orders)
2: Input Correlation + Transit Correlation: 3 MR jobs
3: Input Correlation + Transit Correlation + Job Flow Correlation: 1 MR job
4: Hand-coding (similar to case 3): in the reduce function, the code is optimized according to the query semantics

31 Breakdowns of Execution Time (sec)
[Chart comparing the four cases: No Correlation; Input Correlation + Transit Correlation; Input Correlation + Transit Correlation + Job Flow Correlation; Hand-Coding. Annotations: "from totally 888 sec to 510 sec", "from totally 768 sec to 567 sec", and only a 17% difference between the fully correlation-aware case and hand-coding.]

32 Exp 2: Clickstream Analysis
A typical query in production clickstream analysis: "what is the average number of pages a user visits between a page in category 'X' and a page in category 'Y'?"
In YSmart, JOIN1, AGG1, AGG2, JOIN2 and AGG3 are executed in a single MR job; YSmart outperforms Hive by 4.8x and Pig by 8.4x on this query.

33 YSmart in the Hadoop Ecosystem
YSmart runs on top of the Hadoop Distributed File System (HDFS) and is being integrated into Hive (Hive + YSmart); see patch HIVE-2206 at apache.org.

34 Conclusion
- YSmart is a correlation-aware SQL-to-MapReduce translator
- YSmart can outperform Hive by 4.8x and Pig by 8.4x
- YSmart is being integrated into Hive
- A stand-alone version of YSmart will be released soon


36 Backup Slides

37 Performance Evaluation
- Exp 1. How can correlations affect query execution performance? A small local cluster whose worker node is a quad-core machine; workload: TPC-H Q21, 10 GB dataset
- Exp 2. How much can YSmart outperform Hive and Pig? An Amazon EC2 cluster; workload: clickstream, 20 GB; compare the execution time of YSmart, Hive and Pig

38 Why is MapReduce an effective data processing engine for big data analytics?
- Two unique properties: minimum dependency among tasks (almost sharing nothing), and simple task operations in each node (low-cost machines are sufficient)
- Two strong merits for big data analytics: scalability (Amdahl's Law: increase throughput by increasing the number of nodes) and fault tolerance (quick and low-cost recovery of failed tasks)
- Hadoop is the most widely used implementation of MapReduce, used in hundreds of society-dependent corporations/organizations for big data analytics: AOL, Baidu, EBay, Facebook, IBM, NY Times, Yahoo! ….

39 An Example
A typical question in clickstream analysis: "what is the average number of pages a user visits between a page in category 'Travel' and a page in category 'Shopping'?"

40 An Example (cont.)
A typical question in clickstream analysis: "what is the average number of pages a user visits between a page in category 'Travel' and a page in category 'Shopping'?"
3 steps:
1: Identify the time window for every user-visit path from a page in category 'Travel' to a page in category 'Shopping';
2: Count the number of pages in each time window (excluding the start and end pages);
3: Calculate the average of the counts generated by step 2.

41 Step 1: Identify the time window for every user-visit path from a page in category 'Travel' to a page in category 'Shopping'.
Step 1-1: For each user, identify pairs of clicks from category 'Travel' to 'Shopping': Join1 over Clicks (C1) and Clicks (C2) on C1.user = C2.user.

42 Join1 over Clicks (C1) and Clicks (C2) on C1.user = C2.user.
[Example table of joined click rows with their timestamps; the values did not survive transcription.]

43 Step 1-2: Find the end timestamp for every time window: AGG1, group by user, TS_Travel.
Step 1-3: Find the start timestamp for every time window: AGG2, group by user, TS_End.

44 Step 2: Count the number of pages in each time window (excluding the start and end pages).
Step 2-1: Identify the clicks in each time window: Join2 with Clicks (C3) on user = C3.user.

45 Step 2-2: Count the number of pages in each time window (excluding the start and end pages): AGG3, group by user, TS_Start.

46 Step 3: Calculate the average of the counts generated by step 2: AGG4 produces the average number of pages a user visits between a page in category 'Travel' and a page in category 'Shopping'.

47 With sentence-by-sentence translation, the plan (Join1, AGG1, AGG2, Join2, AGG3, AGG4) needs 6 MR jobs. Join1, AGG1, AGG2, Join2 and AGG3 are correlated (they share the same Partition Key) and thus can be merged into 1 MR job, leaving 2 MR jobs in total (roughly a 3x reduction).

48 MR Job Execution Overview
A MapReduce (MR) job is resource-consuming:
1: Job-scheduling and task-startup costs are nontrivial;
2: The intermediate results of the Map output traverse the "slow" network;
3: Map tasks may need remote data access.
Hadoop is the most widely used implementation of MapReduce, used in hundreds of society-dependent corporations/organizations for big data analytics: AOL, Baidu, EBay, Facebook, IBM, NY Times, Yahoo! ….

49 Rule 1: Exploiting the Transit Correlation (TC)
Condition: Job 1 and Job 2 have TC.
Action: merge the Reduce functions of Job 1 and Job 2 into a single Reduce function; the merged job produces both Job 1's output and Job 2's output.

50 Rule 2: Exploiting the Job Flow Correlation (JFC) associated with Aggregation (AGG) jobs
Condition: the AGG job and its parent Job 1 have JFC.

51 Rule 2: Exploiting the Job Flow Correlation (JFC) associated with Aggregation (AGG) jobs (cont.)
Condition: the AGG job and its parent Job 1 have JFC.
Action: merge the Map and Reduce functions of the AGG job into the Reduce function of Job 1; the merged job also writes the AGG job's output.

52 Exp 2: Evaluation Environment
- Amazon EC2 cluster: 11 nodes; 20 GB clickstream dataset; compare the execution time of YSmart, Hive and Pig
- Facebook cluster: 747 nodes, 8 cores per node; 1 TB TPC-H dataset; compare the execution time of YSmart and Hive; every query was executed three times for both YSmart and Hive

53 Amazon EC2 11-node cluster
Query: "what is the average number of pages a user visits between a page in category 'X' and a page in category 'Y'?"
In YSmart, JOIN1, AGG1, AGG2, JOIN2 and AGG3 are executed in a single MR job; YSmart outperforms Hive by 4.8x and Pig by 8.4x.

54 Facebook Cluster
Query: TPC-H "Suppliers Who Kept Orders Waiting" (Q21). Speedups of YSmart over Hive: 3.1x, 3.7x and 3.3x.

55 Data Explosion in Human Society
Source: "Exabytes: Documenting the 'digital age' and huge growth in computing capacity", The Washington Post
[Chart: the global storage capacity (analog vs. digital) and the amount of digital information created and replicated in a year. In 1986: analog 2.62 billion GB, digital 0.02 billion GB; by 2007, PC hard disks alone held 123 billion GB (44.5%); the remaining 2007 figures did not survive transcription.]

56 Challenges of Big Data Management and Analytics
- Existing DB technology is not prepared for such huge volumes: storage and system scale (e.g. 70 TB/day and 3,000 nodes at Facebook); many applications in science, medicine and business follow the same trend
- Big data demands more complex analytics than data transactions: data mining, time series analysis, etc., to gain deep insights
- The conventional DB business model is not affordable: software licenses, maintenance fees, $10,000/TB
- The conventional DB processing model is to "scale up" by upgrading CPU/memory/storage/networks (the HPC model); the big data processing model is to "scale out" by continuously adding low-cost computing and storage nodes in a distributed way
The MapReduce (MR) programming model has become an effective data processing engine for big data analytics.

