Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net.


1 Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net

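2 Chaining MapReduce: Table join and sorting need two MapReduce jobs. Jobs can be chained sequentially, with the output of one job becoming the input of the next: mapreduce-1 | mapreduce-2 | mapreduce-3 | ... The driver starts each job in turn with JobClient.runJob(), which returns only after that job has finished.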
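The sequential pattern above can be illustrated without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class SequentialChainSketch and the stages countWords and sortByCount are hypothetical names invented for this illustration, and each stage stands in for one complete MapReduce job whose output is handed to the next.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java analogy of sequential job chaining; no Hadoop classes are used.
final class SequentialChainSketch {

    // Stage 1 (hypothetical): count each word and emit "word<TAB>count" records.
    static List<String> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    // Stage 2 (hypothetical): sort the "word<TAB>count" records, highest count first.
    static List<String> sortByCount(List<String> records) {
        List<String> out = new ArrayList<>(records);
        out.sort(Comparator.comparingInt(
                (String r) -> Integer.parseInt(r.split("\t")[1])).reversed());
        return out;
    }

    // Runs the stages strictly one after another, as successive runJob() calls would.
    static List<String> runChain(List<String> input,
                                 List<Function<List<String>, List<String>>> stages) {
        List<String> data = input;
        for (Function<List<String>, List<String>> stage : stages) {
            data = stage.apply(data); // output of one stage is the input of the next
        }
        return data;
    }

    public static void main(String[] args) {
        List<Function<List<String>, List<String>>> stages = new ArrayList<>();
        stages.add(SequentialChainSketch::countWords);
        stages.add(SequentialChainSketch::sortByCount);
        System.out.println(runChain(List.of("a b a", "c a b"), stages));
    }
}
```

In a real driver the hand-off happens through files: the output path of one job is set as the input path of the next.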

3 Chaining MapReduce jobs with complex dependency: mapreduce1 may process one data set, while mapreduce2 independently processes another data set. The third job, mapreduce3, performs an inner join of the first two jobs’ output.

4 Chaining MapReduce jobs with complex dependency: Hadoop has a mechanism to simplify the management of such (nonlinear) job dependencies via the Job and JobControl classes. For Job objects x and y, x.addDependingJob(y) means that x will not start until y has finished.
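To make the dependency rule concrete, here is a small plain-Java sketch of the ordering a dependency-aware controller has to respect. It does not use Hadoop's Job or JobControl classes: SketchJob and executionOrder are hypothetical names made up for this illustration, and only the meaning of addDependingJob is mirrored.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java analogy of nonlinear job dependencies; no Hadoop classes are used.
final class DependencySketch {

    // Hypothetical stand-in for a job that may depend on other jobs.
    static final class SketchJob {
        final String name;
        final List<SketchJob> dependsOn = new ArrayList<>();

        SketchJob(String name) {
            this.name = name;
        }

        // Mirrors the meaning of x.addDependingJob(y): this job waits for y.
        void addDependingJob(SketchJob y) {
            dependsOn.add(y);
        }
    }

    // Returns an order in which every job appears after all jobs it depends on.
    static List<String> executionOrder(List<SketchJob> jobs) {
        Set<SketchJob> done = new LinkedHashSet<>();
        while (done.size() < jobs.size()) {
            boolean progressed = false;
            for (SketchJob job : jobs) {
                // A job is ready once everything it depends on has completed.
                if (!done.contains(job) && done.containsAll(job.dependsOn)) {
                    done.add(job);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or unsatisfiable dependency");
            }
        }
        List<String> order = new ArrayList<>();
        for (SketchJob job : done) {
            order.add(job.name);
        }
        return order;
    }

    public static void main(String[] args) {
        SketchJob mapreduce1 = new SketchJob("mapreduce1");
        SketchJob mapreduce2 = new SketchJob("mapreduce2");
        SketchJob mapreduce3 = new SketchJob("mapreduce3");
        mapreduce3.addDependingJob(mapreduce1); // the join needs both inputs
        mapreduce3.addDependingJob(mapreduce2);
        System.out.println(executionOrder(List.of(mapreduce3, mapreduce1, mapreduce2)));
    }
}
```

Even though mapreduce3 is listed first, it is scheduled last because both of its prerequisites must complete before it can run.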

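5 Chaining preprocessing and postprocessing steps: MAP+ | REDUCE | MAP* (one or more mappers before the reducer, then zero or more mappers after it, all within a single MapReduce job)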
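The MAP+ | REDUCE | MAP* shape can be sketched in plain Java as a pipeline of per-record functions around a grouping step. This is only an analogy under simplifying assumptions: records are plain strings, the "reduce" step just counts identical keys, and the names ChainPatternSketch, applyAll, and run are hypothetical and not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Plain-Java analogy of MAP+ | REDUCE | MAP*; no Hadoop classes are used.
final class ChainPatternSketch {

    // Applies every mapper in order to a single record.
    static String applyAll(List<UnaryOperator<String>> mappers, String record) {
        String current = record;
        for (UnaryOperator<String> mapper : mappers) {
            current = mapper.apply(current);
        }
        return current;
    }

    static List<String> run(List<String> input,
                            List<UnaryOperator<String>> preMappers,
                            List<UnaryOperator<String>> postMappers) {
        if (preMappers.isEmpty()) {
            throw new IllegalArgumentException("MAP+ requires at least one mapper");
        }
        // MAP+: each record passes through all preprocessing mappers,
        // then records are grouped by the resulting key.
        Map<String, Integer> groups = new TreeMap<>();
        for (String record : input) {
            groups.merge(applyAll(preMappers, record), 1, Integer::sum);
        }
        // REDUCE: one "key=count" record per group.
        // MAP*: each reducer output passes through the postprocessing mappers.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : groups.entrySet()) {
            out.add(applyAll(postMappers, e.getKey() + "=" + e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> pre = new ArrayList<>();
        pre.add(s -> s.trim());
        pre.add(s -> s.toLowerCase());
        List<UnaryOperator<String>> post = new ArrayList<>();
        post.add(s -> "<" + s + ">");
        System.out.println(run(List.of(" Apple", "apple ", "Pear"), pre, post));
    }
}
```

Passing an empty postMappers list is valid, which corresponds to the "zero or more" meaning of MAP*.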

6 Driver for chaining mappers within a MapReduce job

    Configuration conf = getConf();
    JobConf job = new JobConf(conf);
    job.setJobName("ChainJob");
    job.setInputFormat(TextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);

    // Map1: (LongWritable, Text) -> (Text, Text)
    JobConf map1Conf = new JobConf(false);
    ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
                          Text.class, Text.class, true, map1Conf);

    // Map2: (Text, Text) -> (LongWritable, Text); its input types match Map1's output types
    JobConf map2Conf = new JobConf(false);
    ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
                          LongWritable.class, Text.class, true, map2Conf);

7 Driver for chaining mappers within a MapReduce job

    // Reduce: (LongWritable, Text) -> (Text, Text)
    JobConf reduceConf = new JobConf(false);
    ChainReducer.setReducer(job, Reduce.class, LongWritable.class, Text.class,
                            Text.class, Text.class, true, reduceConf);

    // Map3 and Map4 run after the reducer, as postprocessing steps
    JobConf map3Conf = new JobConf(false);
    ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
                           LongWritable.class, Text.class, true, map3Conf);

    JobConf map4Conf = new JobConf(false);
    ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
                           LongWritable.class, Text.class, true, map4Conf);

    JobClient.runJob(job);

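8 Driver for chaining mappers within a MapReduce job

    public static <K1, V1, K2, V2> void addMapper(
            JobConf job,
            Class<? extends Mapper<K1, V1, K2, V2>> klass,
            Class<? extends K1> inputKeyClass,
            Class<? extends V1> inputValueClass,
            Class<? extends K2> outputKeyClass,
            Class<? extends V2> outputValueClass,
            boolean byValue,
            JobConf mapperConf)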

9 DistributedCache: Use DistributedCache.addCacheFile() to specify the files to be disseminated to all nodes. On each node, DistributedCache.getLocalCacheFiles() returns the local paths of the cached copies.

10 DistributedCache

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, DataJoinDC.class);

        // Register the cache file on the job's own configuration: JobConf copies
        // conf when it is constructed, so later changes to conf do not reach the job.
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), job);

        Path in = new Path(args[1]);
        Path out = new Path(args[2]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
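The slides do not show what a task does with the cached file. The class name DataJoinDC suggests a join, so the following plain-Java sketch assumes a map-side inner join: a small "key,value" lookup file is loaded into memory once (as a mapper might do with a path from getLocalCacheFiles()), and each input record is matched against it. The class CacheJoinSketch, its methods, and the sample data are hypothetical, and no Hadoop classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of a join against a locally cached file; no Hadoop classes are used.
final class CacheJoinSketch {

    // Loads "key,value" lines from a node-local file into memory.
    static Map<String, String> loadLookup(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
        return lookup;
    }

    // Inner join: records whose key is missing from the lookup table are dropped.
    static List<String> join(List<String> records, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split(",", 2);
            if (parts.length == 2 && lookup.containsKey(parts[0])) {
                out.add(parts[0] + "," + parts[1] + "," + lookup.get(parts[0]));
            }
        }
        return out;
    }

    // Writes a temporary lookup file, joins sample records against it, and cleans up.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("lookup", ".txt");
            try {
                Files.write(tmp, List.of("1,Alice", "2,Bob"));
                return join(List.of("1,bookA", "3,bookC", "2,bookB"), loadLookup(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This approach only makes sense when the cached table is small enough to fit in each task's memory.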

