Progress
- Hadoop setup is complete.
  - Version 0.19.0, running in standalone mode.
- HBase setup is complete.
  - Version 0.19.3, running without HDFS support.
- Simple MapReduce demonstration.
  - A simple word count program.
Hadoop
- Full name: the Apache Hadoop project.
- An open-source implementation of reliable, scalable distributed computing.
- A collection of the following projects (together with the Hadoop core):
  - Avro
  - Chukwa
  - HBase
  - HDFS
  - Hive
  - MapReduce
  - Pig
  - ZooKeeper
Virtual Machine (VM)
- Virtualization
  - All services are delivered through VMs.
  - Allows dynamic configuration and management.
  - Multiple VMs can run on a single commodity machine.
- VMware
HDFS (Hadoop Distributed File System)
- The highly scalable distributed file system of Hadoop.
  - Resembles the Google File System (GFS).
  - Provides reliability through replication.
- NameNode & DataNode
  - NameNode
    - Maintains the file system metadata and namespace.
    - Provides management and control services.
    - Usually one instance.
  - DataNode
    - Provides data storage and retrieval services.
    - Usually several instances.
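The NameNode/DataNode split and the role of replication can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names mirror the HDFS roles, but the methods (`write_block`, `read_block`), the round-robin placement, and the node names are hypothetical and are not the HDFS API or its real placement policy.

```python
class DataNode:
    """Toy DataNode: stores block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Toy NameNode: holds metadata only (which nodes store which block)."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Cannot keep more replicas than there are DataNodes.
        self.replication = min(replication, len(self.datanodes))
        self.block_locations = {}  # block id -> list of DataNode names
        self._next = 0             # round-robin cursor (assumption, not HDFS policy)

    def write_block(self, block_id, data):
        """Copy one block to `replication` distinct DataNodes and record where."""
        count = len(self.datanodes)
        targets = [self.datanodes[(self._next + i) % count]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % count
        for node in targets:
            node.blocks[block_id] = data
        self.block_locations[block_id] = [node.name for node in targets]

    def read_block(self, block_id, failed=()):
        """Return the block from the first replica whose node is not in `failed`."""
        by_name = {node.name: node for node in self.datanodes}
        for name in self.block_locations[block_id]:
            if name not in failed:
                return by_name[name].blocks[block_id]
        raise IOError("all replicas of %s are unavailable" % block_id)


if __name__ == "__main__":
    nodes = [DataNode("dn%d" % i) for i in range(4)]
    namenode = NameNode(nodes, replication=3)
    namenode.write_block("blk_1", b"some data")
    # The block is still readable after two of its three replicas fail.
    print(namenode.read_block("blk_1", failed={"dn0", "dn1"}))
```

The point of the sketch is the division of labour: the single NameNode object never stores data, only locations, while any surviving DataNode replica can serve a read.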
MapReduce
- The sophisticated distributed computing service of Hadoop.
  - A computation framework.
  - Usually runs on top of HDFS.
- JobTracker & TaskTracker
  - JobTracker
    - Manages the distribution of tasks to the TaskTrackers.
    - Provides job submission, monitoring, and control.
  - TaskTracker
    - Manages individual map or reduce tasks on a compute node.
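The word count demonstration mentioned under Progress follows the usual map, shuffle, reduce pattern. The sketch below imitates that pattern in plain Python in a single process; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative only. In a real cluster, the JobTracker would hand the map and reduce tasks to TaskTrackers, and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map over every line, then shuffle, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts)
                for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["hello hadoop", "hello hbase hello"]
    print(word_count(sample))  # {'hello': 3, 'hadoop': 1, 'hbase': 1}
```

Because each `map_phase` call looks at one line only and each `reduce_phase` call looks at one word only, both steps could be spread across many nodes without changing the result.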
Cluster Makeup
- A Hadoop cluster is usually made up of:
  - Real machines.
    - Not required to be homogeneous.
    - Homogeneity helps maintainability.
  - Server processes.
    - Multiple processes can run on a single VM.
- Master & Slave
  - The node (machine) running the JobTracker or NameNode is the master node.
  - The nodes running a TaskTracker or DataNode are the slave nodes.
Administrator Scripts
- Administrators can use the following scripts to start or stop server processes.
- The scripts are located in $HADOOP_HOME/bin.
  - start-all.sh / stop-all.sh
  - start-mapred.sh / stop-mapred.sh
  - start-dfs.sh / stop-dfs.sh
  - slaves.sh
  - hadoop
Configuration
- By default, each Hadoop Core server loads its configuration from several files.
- These files are located in $HADOOP_HOME/conf.
- Identical copies of these files are usually maintained on every machine in the cluster.
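As a rough illustration of how such files are layered, the sketch below parses two property files in the name/value XML format used by Hadoop and lets site settings override defaults. It assumes the hadoop-default.xml and hadoop-site.xml pair of the 0.19 line; the slides do not name the files, the host name `master` and port numbers are hypothetical, and the helper functions are not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Stand-in for conf/hadoop-default.xml (values shown match standalone mode).
DEFAULTS_XML = """<configuration>
  <property><name>fs.default.name</name><value>file:///</value></property>
  <property><name>mapred.job.tracker</name><value>local</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

# Stand-in for conf/hadoop-site.xml; "master" is a hypothetical host name.
SITE_XML = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
  <property><name>mapred.job.tracker</name><value>master:9001</value></property>
</configuration>"""


def parse_properties(xml_text):
    """Return a {name: value} dict from one <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_config(defaults_xml, site_xml):
    """Start from the defaults, then apply site-specific overrides."""
    config = parse_properties(defaults_xml)
    config.update(parse_properties(site_xml))
    return config


if __name__ == "__main__":
    for name, value in sorted(effective_config(DEFAULTS_XML, SITE_XML).items()):
        print(name, "=", value)
```

Keeping identical copies of the site file on every machine means each server process resolves the same NameNode and JobTracker addresses.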