A Hadoop Overview

Outline
- Progress Report
- MapReduce Programming
- Hadoop Cluster Overview
- HBase Overview
- Q & A


Progress
- Hadoop setup has been completed.
  - Version 0.19.0, running in standalone mode.
- HBase setup has been completed.
  - Version 0.19.3, running without HDFS.
- Simple MapReduce demonstration.
  - A simple word count program.

Hadoop
- Full name: the Apache Hadoop project.
  - An open source implementation of reliable, scalable distributed computing.
  - An aggregation of the following projects (and the Hadoop core):
    - Avro
    - Chukwa
    - HBase
    - HDFS
    - Hive
    - MapReduce
    - Pig
    - ZooKeeper

Virtual Machine (VM)
- Virtualization
  - All services are delivered through VMs.
  - Allows dynamic configuration and management.
  - Multiple VMs can run on a single commodity machine.
  - VMware

HDFS (Hadoop Distributed File System)
- The highly scalable distributed file system of Hadoop.
  - Resembles the Google File System (GFS).
  - Provides reliability through replication.
- NameNode & DataNode
  - NameNode
    - Maintains file system metadata and namespace.
    - Provides management and control services.
    - Usually one instance.
  - DataNode
    - Provides data storage and retrieval services.
    - Usually several instances.
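The split between NameNode metadata and DataNode storage can be illustrated with a small single-process model. This is only a sketch: the class `ToyNameNode`, the round-robin placement, and the block size used in the example are assumptions made for illustration and do not reflect HDFS's real placement policy or API.

```python
class ToyNameNode:
    """Toy model of NameNode metadata: which DataNodes hold each block."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (path, block_index) -> list of DataNode names
        self._next = 0       # round-robin cursor (hypothetical placement policy)

    def add_file(self, path, size, block_size):
        """Split a file into blocks and assign each block to distinct DataNodes."""
        n_blocks = max(1, -(-size // block_size))  # ceiling division
        for i in range(n_blocks):
            replicas = [
                self.datanodes[(self._next + k) % len(self.datanodes)]
                for k in range(self.replication)
            ]
            self._next = (self._next + 1) % len(self.datanodes)
            self.block_map[(path, i)] = replicas
        return n_blocks

    def locate(self, path, block_index, failed=()):
        """Return the live DataNodes that still hold a replica of the block."""
        return [d for d in self.block_map[(path, block_index)] if d not in failed]
```

With four DataNodes and three replicas per block, losing one DataNode still leaves two copies of every block it held, which is the sense in which replication provides reliability.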

MapReduce
- The sophisticated distributed computing service of Hadoop.
  - A computation framework.
  - Usually runs on top of HDFS.
- JobTracker & TaskTracker
  - JobTracker
    - Manages the distribution of tasks to the TaskTrackers.
    - Provides job submission, monitoring, and control.
  - TaskTracker
    - Manages individual map or reduce tasks on a compute node.
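The word count program mentioned under Progress is the usual first MapReduce example. The sketch below imitates the three phases (map, shuffle, reduce) in a single Python process. It does not use Hadoop's Java API, and the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map over every line, shuffle by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

In a real cluster the JobTracker would hand the map and reduce calls to TaskTrackers on different nodes; here they simply run one after another.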

Cluster Makeup
- A Hadoop cluster is usually made up of:
  - Real machines.
    - Not required to be homogeneous.
    - Homogeneity helps maintainability.
  - Server processes.
    - Multiple processes can run on a single VM.
- Master & Slave
  - The node/machine running the JobTracker or NameNode is a master node.
  - The nodes running a TaskTracker or DataNode are slave nodes.

Cluster Makeup (cont.)

Administrator Scripts
- Administrators can use the following script files to start or stop server processes.
  - Located in $HADOOP_HOME/bin
    - start-all.sh / stop-all.sh
    - start-mapred.sh / stop-mapred.sh
    - start-dfs.sh / stop-dfs.sh
    - slaves.sh
    - hadoop
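As a rough illustration of how these scripts fit together, the helper below builds the script paths for bringing a cluster up or down. It assumes the HDFS daemons are started before the MapReduce daemons and stopped after them. The function name `script_paths` and the install path used in the example are hypothetical.

```python
import posixpath

# Assumed order: HDFS daemons first, then MapReduce daemons; stop in reverse.
START_ORDER = ["start-dfs.sh", "start-mapred.sh"]
STOP_ORDER = ["stop-mapred.sh", "stop-dfs.sh"]


def script_paths(hadoop_home, action):
    """Return the full paths of the scripts to run for 'start' or 'stop'."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    names = START_ORDER if action == "start" else STOP_ORDER
    # Scripts live under $HADOOP_HOME/bin, as stated on the slide.
    return [posixpath.join(hadoop_home, "bin", name) for name in names]
```

Each returned path could then be passed to `subprocess.run` on the master node; start-all.sh and stop-all.sh cover the same ground in a single call.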

Configuration
- By default, each Hadoop Core server loads its configuration from several files.
  - These files are located in $HADOOP_HOME/conf
  - Identical copies of those files are usually maintained on every machine in the cluster.
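Hadoop configuration files are XML lists of name/value properties, with site-specific values overriding the shipped defaults. The sketch below parses such a fragment and applies the override. The host name `master`, the port numbers, and the helper names are assumptions made for this example, not values taken from the presentation.

```python
import xml.etree.ElementTree as ET

# Illustrative defaults; a real installation ships its own default file.
DEFAULTS = {"fs.default.name": "file:///", "dfs.replication": "3"}

# Hypothetical site-specific overrides for a small cluster.
SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse <property> elements into a name -> value dictionary."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.findall("property")
    }


def effective_config(defaults, site_xml):
    """Overlay site-specific properties on top of the defaults."""
    merged = dict(defaults)
    merged.update(load_properties(site_xml))
    return merged
```

Because every node reads the same property names, keeping identical copies of these files across the cluster ensures that all daemons agree on where the NameNode and JobTracker live.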

Any questions?
