Introduction to HDFS: Hadoop Distributed File System


1 Introduction to HDFS: Hadoop Distributed File System
Dongjing Miao

2 Outline
Background of HDFS
Goals of HDFS design
Basic principle of HDFS
Practice at Yahoo! (year 2010)
Outline 1. What is it used for? 2. What features should it have? 3. How was it designed, and how does it work? 4. A look at practice at Yahoo!

3 Background of HDFS: Hadoop Project
Hadoop project components:
HDFS: Distributed file system (about 80% of the code from Yahoo!)
MapReduce: Distributed computation framework
HBase: Column-oriented table service
Pig: Dataflow language and parallel execution framework
Hive: Data warehouse infrastructure
ZooKeeper: Distributed coordination service
Chukwa: System for collecting management data
Avro: Data serialization system
Background of HDFS: Hadoop Project To introduce HDFS, we first need to know the Hadoop project. After Google presented MapReduce in 2004, the Hadoop project was founded in 2006. It has eight components; in the same year, Yahoo! started the first of them, HDFS, the Hadoop distributed file system, and contributed nearly 80% of its code.

4 Background of HDFS: Hadoop Project
In recent years, many enterprises have run their applications on Hadoop to deal with very large data sets, such as Yahoo!'s Web index service.

5 Background of HDFS: Hadoop Project
Provide a distributed file system plus a framework for the analysis and transformation of very large data sets (usually TB to PB) using the MapReduce paradigm. Background of HDFS: Hadoop Project So what does Hadoop want to do? Its goal is to provide a distributed file system and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. I suppose you all already know MapReduce; anyway, I'll give a quick look.

6 Background of HDFS: MapReduce
Hadoop needs to consider the partitioning of data and computation across many hosts, and the execution of application computations in parallel, close to their data. Background of HDFS: MapReduce Basically, in MapReduce, each input file is split, and each portion is sent to a machine for local computation; the local results are then sent to several other machines for the reduce operation. After several iterations, the final results are output.
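The split/map/reduce flow described in the notes can be sketched in a few lines of plain Python. This is only a single-process illustration of the paradigm (the classic word-count example), not Hadoop's actual API; in a real cluster each split would be mapped on a different host, close to its data.

```python
from collections import defaultdict

def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each "split" would be processed on a different machine in a real cluster.
splits = ["big data big compute", "big data"]
mapped = [pair for s in splits for pair in map_phase(s)]
result = reduce_phase(mapped)
# result == {"big": 3, "data": 2, "compute": 1}
```

The key property the slide relies on is that the map step is embarrassingly parallel: each split is processed independently, so the work can run wherever the data already lives.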

7 Background of HDFS: MapReduce
Hadoop need to consider the partitioning of data and computation across many hosts, and executing application computations in parallel close to their data. Background of HDFS: MapReduce So to support this paradigm, hadoop need to consider the partitioning of data and computation across many hosts, and executing application computations in parallel close to their data.

8 Background of HDFS: MapReduce
Hadoop need to consider the partitioning of data and computation across many hosts, and executing application computations in parallel close to their data. Background of HDFS: MapReduce Note that here, as a part of hadoop, the partitioning, the storage and the shipment of data is what HDFS wants to do. We does not care about the computation in HDFS.

9 HADOOP: partitioning of data and computation across many hosts; executing application computations in parallel, close to their data. HDFS: store very large data sets reliably; stream data sets at high bandwidth to users; support general networks. Goals of HDFS Design With this motivation, Yahoo! designed HDFS as follows: first, store very large data sets reliably; second, stream data sets at high bandwidth to users over a general network.

10 Goals of HDFS Design HDFS:
store very large data sets reliably; stream data sets at high bandwidth to users; support general networks. So the idea is: a cluster with a master node and many data nodes; replicas of large blocks (redundant storage for massive amounts of data, while using cheap, unreliable computers); the TCP protocol. Goals of HDFS Design

11 Cluster: master & slave nodes + replicas of large blocks
Architecture of HDFS 128MB This is an overview of the HDFS architecture. One cluster consists of one master node (the NameNode) and many slave nodes (DataNodes). The NameNode holds the metadata, which can be seen as a kind of in-memory index: it mainly records where the replicas of each block of each file are stored, plus some other meta information. Each DataNode stores data on its native Linux file system, but HDFS organizes files into large blocks of 128 MB. Different blocks of the same file are stored on different DataNodes, and each DataNode stores at most one replica of any given HDFS block; across the whole cluster, three replicas of each block are kept in order to guarantee the reliability of storage.
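The NameNode's index can be pictured as a small table mapping each (file, block) pair to the DataNodes holding its replicas. The sketch below is a toy model with hypothetical node names, not HDFS code; it only illustrates the 128 MB blocking and the three-replica bookkeeping described above.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # the 128 MB block size from the slide
REPLICATION = 3                 # replicas kept per block

def split_into_blocks(file_size):
    """Number of 128 MB blocks needed to store a file (ceiling division)."""
    return max(1, -(-file_size // BLOCK_SIZE))

# Toy NameNode index: (file, block number) -> DataNodes holding a replica.
# DataNode names "dn1".."dn8" are hypothetical.
namenode_index = {
    ("big.log", 0): ["dn1", "dn4", "dn7"],
    ("big.log", 1): ["dn2", "dn5", "dn8"],
}

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
# blocks == 3: two full 128 MB blocks plus one 44 MB tail block
```

Note how small the index is relative to the data: one entry per 128 MB block, which is why the whole index can live in the NameNode's memory.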

12 Cluster: master & slave nodes + replicas of large blocks
Architecture of HDFS 128MB These nodes are connected by a fully connected network using the TCP protocol, so that every pair of nodes, whether NameNode, DataNode, or client, can communicate with each other. Additionally, a very large cluster may be organized into several racks; a rack is a group of DataNodes connected by a switch, and different racks are typically located at different positions. The fully connected topology ensures that when a client wants to read a file, it can query the index in the NameNode's memory to find the replica address list for the file, and then download the blocks directly from the corresponding DataNodes. The more replicas there are, the higher the bandwidth and the better the reliability: because a client can download different blocks of a file from different nearby nodes at the same time, it achieves high aggregate bandwidth.

13 Overview of I/O processing
HDFS implements a single-writer, multiple-reader model. Read Write Overview of I/O processing I'll briefly show the processing steps of Read and Write, and the motivation for doing them this way.

14 Read Operation Recall the storage of each file in HDFS.

15 Read Operation When a client application wants to read a file:
– It communicates with the NameNode to determine which blocks make up the file, and which DataNodes those blocks reside on.
– It then communicates directly with the DataNodes to read the data.
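The two-step read protocol above can be sketched as a toy simulation. All names here (paths, DataNode ids, the dictionaries standing in for the NameNode and DataNode storage) are hypothetical; the point is only the control flow: metadata from the NameNode first, then data directly from the DataNodes.

```python
# Step-1 source: toy NameNode map from file path to its ordered block list.
namenode = {
    "/logs/a.txt": [
        {"block": 0, "datanodes": ["dn1", "dn3", "dn5"]},
        {"block": 1, "datanodes": ["dn2", "dn4", "dn6"]},
    ],
}
# Step-2 source: toy DataNode storage, keyed by (node, block number).
datanode_storage = {
    ("dn1", 0): b"hello ", ("dn3", 0): b"hello ", ("dn5", 0): b"hello ",
    ("dn2", 1): b"world",  ("dn4", 1): b"world",  ("dn6", 1): b"world",
}

def read_file(path):
    """Ask the NameNode for the block layout, then fetch each block
    directly from one of the DataNodes that holds a replica."""
    data = b""
    for entry in namenode[path]:
        dn = entry["datanodes"][0]  # a real client picks the closest replica
        data += datanode_storage[(dn, entry["block"])]
    return data

# read_file("/logs/a.txt") == b"hello world"
```

Because the data itself never flows through the NameNode, the NameNode stays a small metadata service and does not become a bandwidth bottleneck.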

16 Write Operation An HDFS client creates a new file by giving its path to the NameNode. For each block of the file, the NameNode returns a list of DataNodes to host its replicas, chosen by considering two factors: one is the distance cost, both to the client and between the chosen DataNodes; the other is reliability, meaning the replicas should be spread across different racks. Under that guarantee, the NameNode reduces the distance cost as much as possible. The client then pipelines data to the chosen DataNodes, which eventually confirm the creation of the block replicas to the NameNode.
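A minimal sketch of the placement trade-off described above, under simplifying assumptions (hypothetical rack and node names; one replica near the writer, the remaining replicas together on one other rack so a single rack failure cannot lose the block). This mirrors the slide's two factors in spirit only; it is not HDFS's actual placement code.

```python
# Toy topology: rack name -> DataNodes on that rack (names hypothetical).
racks = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
    "rack3": ["dn5", "dn6"],
}

def place_replicas(client_rack, n=3):
    """Pick n DataNodes for a new block: the first on the client's own
    rack (low distance cost), the rest on a single other rack
    (reliability: the block survives the loss of either rack)."""
    first = racks[client_rack][0]
    other_rack = next(r for r in racks if r != client_rack)
    return [first] + racks[other_rack][: n - 1]

chosen = place_replicas("rack1")
# chosen spans exactly two racks, e.g. ["dn1", "dn3", "dn4"]
```

The design choice here is the same one the slide describes: spreading replicas over racks buys reliability, while keeping most replicas close together keeps the write pipeline's distance cost low.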

17 Write Operation

18 We still have a problem
How do we deal with DataNode changes? Shutdown or failure: damage to reliability. A new node joins the cluster / an old node restarts. We still have a problem

19 Shutdown / failure How does the NameNode detect it?
DataNodes send heartbeats frequently; the NameNode removes a node from the cluster after one hour of absent heartbeats. How is the replica count (#replica) maintained? By re-replicating the affected blocks to other nodes that do not already hold a replica of the block. Shutdown / failure
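The heartbeat-based failure detection and the re-replication check can be modeled in a few lines. This is a toy simulation with hypothetical node and block names; the one-hour timeout is taken from the slide's figure, and real HDFS tracks considerably more state per node.

```python
DEAD_AFTER = 3600  # seconds without a heartbeat, per the slide's figure

# Toy NameNode state: last heartbeat time per DataNode (names hypothetical),
# and the replica locations of one block.
last_heartbeat = {"dn1": 100.0, "dn2": 4000.0, "dn3": 3990.0}
block_locations = {"blk_1": ["dn1", "dn2", "dn3"]}

def detect_dead(now):
    """Nodes whose last heartbeat is older than the timeout."""
    return {dn for dn, t in last_heartbeat.items() if now - t > DEAD_AFTER}

def under_replicated(dead, target=3):
    """Blocks whose live replica count fell below the target,
    with their surviving replica locations."""
    live = {b: [d for d in dns if d not in dead]
            for b, dns in block_locations.items()}
    return {b: dns for b, dns in live.items() if len(dns) < target}

dead = detect_dead(now=4000.0)   # dn1 has been silent for 3900 s -> dead
todo = under_replicated(dead)    # blk_1 is down to 2 live replicas
```

The NameNode would then copy each under-replicated block from a surviving holder to some node not in its location list, restoring the target count.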

20 A new node joins the cluster / an old node restarts
1. The NameNode connects with the new DataNode using a handshake protocol: it checks the software version and so on, and assigns an in-cluster id. 2. The DataNode sends heartbeats so that the NameNode learns its latest status, such as which replicas it currently stores. 3. The NameNode schedules the removal of any over-replicated blocks. New node joins the cluster / old node restarts
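Step 3 above is the mirror image of re-replication: when a restarted node reports replicas the cluster already has enough copies of, the surplus must be trimmed. A minimal sketch, with hypothetical block and node names:

```python
def schedule_removals(block_locations, target=3):
    """Return, per block, the surplus replicas to delete when the live
    replica count exceeds the target (e.g. after a node rejoins with
    copies that were already re-replicated elsewhere)."""
    return {blk: nodes[target:]
            for blk, nodes in block_locations.items()
            if len(nodes) > target}

# Hypothetical state after "dn4" restarted holding an old copy of blk_7:
state = {"blk_7": ["dn1", "dn2", "dn3", "dn4"],
         "blk_8": ["dn1", "dn2", "dn5"]}
surplus = schedule_removals(state)
# surplus == {"blk_7": ["dn4"]}
```

A real NameNode would choose which replica to drop using the same rack-awareness it uses for placement; here the sketch simply drops the ones beyond the target count.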

21 In practice, the NameNode sends operations to the DataNodes by replying to their heartbeats, such as:
Replicate blocks to other nodes Remove local block replicas Add / shut down the node Send an immediate block report Replies to Heartbeats

22 If a node fails, the master detects that failure and re-assigns its work to a different node in the system. Restarting a task does not require communication with nodes working on other portions of the data. If a failed node restarts, it is automatically added back to the system and assigned new tasks. If a node appears to be running slowly, the master can redundantly execute another instance of the same task. Goals achieved

23 Practice in Yahoo! (data collected in year 2010)
Large HDFS clusters at Yahoo! for many years:
3500 DataNodes, each with 2 quad-core Xeon 2.5 GHz / Red Hat Enterprise Linux Server Release 5.1 / Sun Java JDK _13-b03 / 4 directly attached SATA drives (4 TB) / 16 GB RAM / 1 Gbit Ethernet
NameNode: 64 GB RAM
3.3 PB for user applications (9.8 PB in total)
#Blocks = 60 million (#files) * 3
54,000 blocks per DataNode on average
Practice in Yahoo! (data collected in year 2010)

24 Practice in Yahoo!

25 Reference
[1] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler. The Hadoop Distributed File System. 2010.
[2]

