Cloud Computing: Project Tutorial Hadoop Map-Reduce Programming

1 Cloud Computing: Project Tutorial Hadoop Map-Reduce Programming
Wei Zhu, Department of Computer Science, University of Texas at Dallas

2 Agenda
  Map Reduce Environment Configuration
  Map Reduce Structure
  Mapper Configuration
  Combiner Configuration
  Partitioner Configuration
  Reducer Configuration
  Useful Logs
1/15/2019 Cloud Computing: Project Tutorial Hadoop Map-Reduce Programming

3 Map Reduce Environment Configuration
Configuration files for Ubuntu (/etc/hadoop/conf):
  hadoop-env.sh
  log4j.properties: where the logs are written
  hdfs-site.xml: Hadoop cluster configuration
  mapred-site.xml: information about the job
  ……
1/15/2019 Cloud Computing: Project One Tutorial Hadoop Map-Reduce

4 Map Reduce Structure
Programmers must specify:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
  All values with the same key are reduced together.
Optionally, also:
  partition (k’, number of partitions) → partition for k’
    Often a simple hash of the key, e.g., hash(k’) mod n
    Divides up the key space for parallel reduce operations
  combine (k’, v’) → <k’, v’>*
    Mini-reducers that run in memory after the map phase
    Used as an optimization to reduce network traffic
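To make the map, partition, and reduce signatures above concrete, here is a minimal word-count sketch in plain Python. It only simulates the programming model in a single process and does not use the Hadoop API; the names `map_fn`, `partition_fn`, `reduce_fn`, and `run_job` are hypothetical, and CRC32 is an arbitrary stand-in for "a simple hash of the key".

```python
import zlib
from collections import defaultdict


def map_fn(_key, line):
    """map(k, v) -> <k', v'>*: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def partition_fn(key, num_partitions):
    """partition(k', n): a deterministic hash of the key, mod n."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def reduce_fn(key, values):
    """reduce: all values that share a key are summed together."""
    yield key, sum(values)


def run_job(records, num_partitions=2):
    # Map phase plus shuffle: route every intermediate pair to a partition
    # and group the values by key inside that partition.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            index = partition_fn(out_key, num_partitions)
            partitions[index][out_key].append(out_value)

    # Reduce phase: each partition is processed independently, key by key.
    output = {}
    for partition in partitions:
        for key in sorted(partition):
            output.update(reduce_fn(key, partition[key]))
    return output
```

Calling `run_job([(0, "the cat"), (1, "The dog")])` counts "the" twice and the other words once, regardless of how many partitions are used, because a given key always lands in the same partition.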

5 Map Reduce Structure
(figure: Map Reduce structure)

6 Map Reduce Structure
Class Mapper:
  setup (Mapper.Context context)
    Called once at the beginning of the task.
  map (k, v) → <k’, v’>*
  cleanup (Mapper.Context context)
    Called once at the end of the task.
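The call order of setup, map, and cleanup can be sketched in plain Python as follows. This is a simulation, not the Hadoop Mapper class: `WordCountMapper` and `run_map_task` are hypothetical names, and a plain list stands in for the context object. The example aggregates counts in memory and emits them only in cleanup, which is one common reason to use these hooks.

```python
class WordCountMapper:
    """Hypothetical mapper that aggregates in memory across map() calls."""

    def setup(self, context):
        # Called once at the beginning of the task: initialise state.
        self.counts = {}

    def map(self, key, value, context):
        # Called once per input record: update the in-memory counts.
        for word in value.split():
            self.counts[word] = self.counts.get(word, 0) + 1

    def cleanup(self, context):
        # Called once at the end of the task: emit the aggregated pairs.
        for word in sorted(self.counts):
            context.append((word, self.counts[word]))


def run_map_task(mapper, split):
    """Drive one map task over an input split of (key, value) records."""
    context = []  # stand-in for the framework's output collector
    mapper.setup(context)
    for key, value in split:
        mapper.map(key, value, context)
    mapper.cleanup(context)
    return context
```

For the split `[(0, "a b"), (1, "a")]` the task emits one pair per distinct word, `("a", 2)` and `("b", 1)`, instead of three separate `(word, 1)` pairs.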

7 Mapper Configuration
How many maps?
Number of Maps
  The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
  The right level of parallelism for maps seems to be around 10-100 maps per node.
  setNumMapTasks(int), which only provides a hint to the framework, can be used to set the number even higher. It exists only in the old API (JobConf).
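As a rough worked example of "the total number of blocks of the input files", the sketch below counts blocks per file. The function name `estimate_num_maps` is hypothetical, and the calculation assumes one map per block with non-empty files, which is the default behaviour only when the split size equals the block size.

```python
def estimate_num_maps(file_sizes, block_size):
    """Approximate the number of map tasks as the total number of blocks.

    Each file is split on its own, so a small file still occupies one block.
    """
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // block_size) for size in file_sizes)


MB = 1024 ** 2
TB = 1024 ** 4

# One 10 TB input with 128 MB blocks.
large_input = estimate_num_maps([10 * TB], 128 * MB)

# Three 1 MB files: three maps, even though the data would fit in one block.
small_files = estimate_num_maps([1 * MB] * 3, 128 * MB)
```

Here `large_input` is 81920 and `small_files` is 3, which illustrates why many small files lead to many short-lived map tasks.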

8 Mapper Configuration
How many maps?
FileInputFormat:
  mapred.max.split.size, set by setMaxInputSplitSize(Job, long)
  mapred.min.split.size, set by setMinInputSplitSize(Job, long)
HDFS block: for small data, set the block size to a smaller value using dfs.block.size in hdfs-site.xml.
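How the minimum split size, maximum split size, and block size interact can be sketched as below. The rule `max(min_size, min(max_size, block_size))` mirrors how FileInputFormat in the new API picks a split size; the function names are hypothetical, and the split count is an approximation that ignores the small slack Hadoop allows on the last split of a file.

```python
def compute_split_size(block_size, min_size, max_size):
    """Split size is the block size clamped to the range [min_size, max_size]."""
    return max(min_size, min(max_size, block_size))


def approx_num_splits(file_size, split_size):
    """Approximate number of splits (and therefore maps) for one file."""
    return -(-file_size // split_size)  # integer ceiling division


MB = 1024 ** 2
GB = 1024 ** 3
block = 128 * MB

# Lowering the maximum split size below the block size yields more maps.
more_maps = approx_num_splits(1 * GB, compute_split_size(block, 1, 64 * MB))

# Raising the minimum split size above the block size yields fewer maps.
fewer_maps = approx_num_splits(1 * GB, compute_split_size(block, 256 * MB, 512 * MB))
```

With a 1 GB file and 128 MB blocks the default would be 8 splits; the two settings above give 16 and 4 respectively.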

9 Combiner Configuration
Class Combiner:
  A semi-reducer in MapReduce
  Has the same interface as Reducer: reduce()
  Processes the output of map tasks before it is sent to the reducers
  Works on the output of a single mapper
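The effect of a combiner on one mapper's output can be sketched in plain Python. This is a simulation rather than Hadoop code, and the function name `combine` is hypothetical. It assumes a sum, which is safe to pre-aggregate because addition is associative and commutative; Hadoop may run a combiner zero or more times, so the final result must not depend on it.

```python
from collections import defaultdict


def combine(map_output):
    """Locally pre-aggregate one mapper's (key, value) pairs by summing."""
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)
    return [(key, sum(values)) for key, values in sorted(grouped.items())]


# Output of a single word-count mapper before combining.
map_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
combined = combine(map_output)
```

The four pairs shrink to two, `("cat", 1)` and `("the", 3)`, so fewer records cross the network to the reducers while the final counts stay the same.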

10 Partitioner Configuration
Class Partitioner:
  Partitioning determines which reducer instance receives which intermediate keys and values.
  getPartition() receives a key, a value, and the number of partitions to split the data across.
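A hash-based getPartition can be sketched as follows. The arithmetic follows the rule used by Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numPartitions`, but this is a plain-Python simulation: `java_string_hash` is a hypothetical helper that imitates Java's String.hashCode, while real Hadoop key types such as Text define their own hashCode.

```python
def java_string_hash(s):
    """Imitate Java's String.hashCode as an unsigned 32-bit value.

    Matches Java for characters in the Basic Multilingual Plane.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def get_partition(key, value, num_partitions):
    """Return the reducer index for a pair; the value is ignored here."""
    # Masking with 0x7FFFFFFF clears the sign bit so the result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


assignments = {key: get_partition(key, None, 4) for key in ["a", "b", "c", "d"]}
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer. A custom partitioner would replace the hash with application logic, for example routing by a key prefix.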

11 Reducer Configuration
Class Reducer: Job.setReducerClass(YourReducer.class)
  setup
    Called once at the beginning of the task.
  reduce (k, v) → <k’, v’>*
  cleanup
    Called once at the end of the task.
Number of Reducers
  Job.setNumReduceTasks(int);
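What a single reduce task does with its share of the data can be sketched in plain Python. This is a simulation, not the Hadoop Reducer class: `MaxReducer` and `run_reduce_task` are hypothetical names, a list stands in for the context, and emitting the call count from cleanup is done only to make the lifecycle visible.

```python
from itertools import groupby
from operator import itemgetter


class MaxReducer:
    """Hypothetical reducer that keeps the maximum value per key."""

    def setup(self, context):
        # Called once at the beginning of the task.
        self.calls = 0

    def reduce(self, key, values, context):
        # Called once per distinct key, with all values for that key.
        self.calls += 1
        context.append((key, max(values)))

    def cleanup(self, context):
        # Called once at the end of the task.
        context.append(("reduce_calls", self.calls))


def run_reduce_task(reducer, pairs):
    """Sort the task's input by key, group it, and drive the reducer."""
    context = []
    reducer.setup(context)
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        reducer.reduce(key, [value for _, value in group], context)
    reducer.cleanup(context)
    return context
```

For the input `[("b", 2), ("a", 5), ("b", 7)]` the reducer is invoked twice, once for "a" and once for "b". With n reduce tasks, n such loops run in parallel, each over the keys assigned to it by the partitioner.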

12 Useful Logs
Node Resource Manager Logs
  /var/log/hadoop-yarn/yarn
  yarn-site.xml
  Application name, start date, user name, Hadoop queue, job outcome (success or failure), duration, maximum memory allocated, percent of cluster used by the job, details of the job executed ……

13 Useful Logs
Job History Logs
  /mr-history/done
  mapred-site.xml
  These files contain a wealth of performance data on the execution of Mappers and Reducers, including HDFS statistics, data volume processed, memory allocated, etc.

14 Useful Logs
See the logs from the GUI.

