A Hadoop Overview. Outline Progress Report MapReduce Programming Hadoop Cluster Overview HBase Overview Q & A.

1 A Hadoop Overview

2 Outline
- Progress Report
- MapReduce Programming
- Hadoop Cluster Overview
- HBase Overview
- Q & A

4 Progress
- Hadoop setup has been completed: version 0.19.0, running in standalone mode.
- HBase setup has been completed: version 0.19.3, running without HDFS.
- A simple MapReduce demonstration: a word count program.
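The word count demonstration mentioned above can be illustrated without a cluster. The following is a minimal single-process sketch of the map, shuffle, and reduce steps, not the Java program used in the demonstration; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello world"]))
```

In a real Hadoop job the framework performs the shuffle and runs many map and reduce tasks in parallel; only the map and reduce functions are written by the programmer.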

7 Hadoop
Full name: the Apache Hadoop project.
- An open-source implementation of reliable, scalable distributed computing.
- An aggregation of the following subprojects, together with Hadoop Core: Avro, Chukwa, HBase, HDFS, Hive, MapReduce, Pig, and ZooKeeper.

8 Virtual Machine (VM)
Virtualization
- All services are delivered through VMs.
- Allows dynamic configuration and management.
- Multiple VMs can run on a single commodity machine.
- VMware

9 HDFS (Hadoop Distributed File System)
The highly scalable distributed file system of Hadoop.
- Resembles the Google File System (GFS).
- Provides reliability through replication.
NameNode & DataNode
- NameNode
  - Maintains file system metadata and the namespace.
  - Provides management and control services.
  - Usually one instance.
- DataNode
  - Provides data storage and retrieval services.
  - Usually several instances.
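The division of labor between NameNode and DataNodes can be sketched as a toy model: one object keeps the metadata (which block lives on which nodes), and replication lets a block survive the loss of a node. This is only an illustration under simplifying assumptions. The class ToyNameNode, its methods, and the round-robin placement are hypothetical and do not reflect the actual HDFS block placement policy.

```python
class ToyNameNode:
    """Toy metadata service: maps each block of a file to DataNode names."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (path, block_index) -> list of DataNode names
        self._next = 0       # round-robin cursor (an assumption, not HDFS policy)

    def add_file(self, path, num_blocks):
        """Record where each block's replicas are stored."""
        n = len(self.datanodes)
        for index in range(num_blocks):
            replicas = [self.datanodes[(self._next + k) % n]
                        for k in range(self.replication)]
            self.block_map[(path, index)] = replicas
            self._next += 1

    def locate(self, path, block_index, failed=()):
        """Return the DataNodes that still hold a live replica of the block."""
        return [dn for dn in self.block_map[(path, block_index)]
                if dn not in failed]


if __name__ == "__main__":
    namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
    namenode.add_file("/logs/a.txt", num_blocks=2)
    print(namenode.locate("/logs/a.txt", 0, failed={"dn1"}))
```

The point of the sketch is that a client asks the single metadata service where a block lives and then reads it from any surviving DataNode.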

10 MapReduce
The sophisticated distributed computing service of Hadoop.
- A computation framework.
- Usually runs on top of HDFS.
JobTracker & TaskTracker
- JobTracker
  - Manages the distribution of tasks to the TaskTrackers.
  - Provides job submission, monitoring, and control.
- TaskTracker
  - Manages individual map or reduce tasks on a compute node.
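The JobTracker and TaskTracker roles described above can be sketched as a toy scheduler: jobs are split into tasks, and each TaskTracker receives tasks when it reports free capacity. This is a simplified illustration, not Hadoop's scheduler. The classes ToyJobTracker and ToyTaskTracker, the slot counts, and the task naming scheme are all hypothetical.

```python
from collections import deque


class ToyTaskTracker:
    """Toy worker: runs tasks in a fixed number of slots on one compute node."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = []

    def free_slots(self):
        return self.slots - len(self.running)


class ToyJobTracker:
    """Toy master: accepts jobs and hands their tasks to TaskTrackers."""

    def __init__(self):
        self.pending = deque()

    def submit_job(self, job_name, num_maps, num_reduces):
        """Split a job into named map and reduce tasks and queue them."""
        for i in range(num_maps):
            self.pending.append(f"{job_name}-map-{i}")
        for i in range(num_reduces):
            self.pending.append(f"{job_name}-reduce-{i}")

    def heartbeat(self, tracker):
        """Assign pending tasks until the reporting tracker has no free slots."""
        assigned = []
        while self.pending and tracker.free_slots() > 0:
            task = self.pending.popleft()
            tracker.running.append(task)
            assigned.append(task)
        return assigned


if __name__ == "__main__":
    job_tracker = ToyJobTracker()
    job_tracker.submit_job("wc", num_maps=3, num_reduces=1)
    for tracker in (ToyTaskTracker("tt1", 2), ToyTaskTracker("tt2", 2)):
        print(tracker.name, job_tracker.heartbeat(tracker))
```

The sketch ignores data locality, task failure, and the ordering constraint between map and reduce phases, all of which the real framework handles.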

11 Cluster Makeup
A Hadoop cluster is usually made up of:
- Real machines.
  - Not required to be homogeneous.
  - Homogeneity helps maintainability.
- Server processes.
  - Multiple processes can run on a single VM.
Master & Slave
- The node (machine) running the JobTracker or NameNode is the master node.
- The nodes running a TaskTracker or DataNode are slave nodes.

12 Cluster Makeup (cont.)

13 Administrator Scripts
The administrator can use the following script files, located in $HADOOP_HOME/bin, to start or stop server processes:
- start-all.sh / stop-all.sh
- start-mapred.sh / stop-mapred.sh
- start-dfs.sh / stop-dfs.sh
- slaves.sh
- hadoop
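As a small illustration of how the start and stop scripts listed above are named and located, the sketch below builds the full path of a script from an action and a target. The helper script_path and the example directory /opt/hadoop are hypothetical; the code only constructs the path and does not launch anything.

```python
import os
import posixpath

ACTIONS = ("start", "stop")
TARGETS = ("all", "dfs", "mapred")


def script_path(action, target, hadoop_home=None):
    """Return the path of e.g. start-dfs.sh under $HADOOP_HOME/bin."""
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if home is None:
        raise RuntimeError("HADOOP_HOME is not set")
    if action not in ACTIONS or target not in TARGETS:
        raise ValueError(f"unknown script: {action}-{target}.sh")
    # The scripts are shell scripts, so POSIX path rules are used here.
    return posixpath.join(home, "bin", f"{action}-{target}.sh")


if __name__ == "__main__":
    # /opt/hadoop is an assumed install location, used only for illustration.
    print(script_path("start", "dfs", hadoop_home="/opt/hadoop"))
```

On a machine with Hadoop installed, the returned path could be passed to subprocess.run([path], check=True) to start or stop the corresponding daemons.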

14 Configuration
By default, each Hadoop Core server loads its configuration from several files.
- These files are located in $HADOOP_HOME/conf.
- Identical copies of these files are usually maintained on every machine in the cluster.
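To make the configuration loading concrete, the sketch below parses two property files and lets site-specific values override defaults. It assumes the layout used by Hadoop releases of this era, in which conf/hadoop-default.xml supplies defaults and conf/hadoop-site.xml overrides them; the slides do not name the files, so treat that as an assumption. The functions parse_properties and load_configuration are hypothetical helpers, and the host name master:9000 is a made-up example value.

```python
import xml.etree.ElementTree as ET


def parse_properties(xml_text):
    """Read <property><name>/<value> pairs from a <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def load_configuration(default_xml, site_xml):
    """Overlay site-specific settings on top of the defaults."""
    config = parse_properties(default_xml)
    config.update(parse_properties(site_xml))
    return config


# Stand-in for conf/hadoop-default.xml (assumed file name).
DEFAULT_XML = """<configuration>
  <property><name>fs.default.name</name><value>file:///</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

# Stand-in for conf/hadoop-site.xml; the host and port are hypothetical.
SITE_XML = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
</configuration>"""

if __name__ == "__main__":
    print(load_configuration(DEFAULT_XML, SITE_XML))
```

Because every server reads the same files, keeping identical copies on all machines ensures that each daemon resolves the same NameNode and JobTracker addresses.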

17 Any questions?

