HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13.


1 HADOOP DISTRIBUTED FILE SYSTEM HDFS Reliability Based on “The Hadoop Distributed File System” K. Shvachko et al., MSST 2010 Michael Tsitrin 26/05/13

2 Topics
- Introduction
- HDFS Overview: Basics, Architecture
- Data reliability: Block replicas
- NameNode reliability: NameNode failure, Journal, Checkpoint
- Conclusion

3 INTRODUCTION

4 Introduction
- HDFS is a cloud-based file system that allows storage of large data sets on clusters of commodity hardware
- With a huge number of components, each having a non-trivial probability of failure, hardware failure is the norm rather than the exception
- The purpose of this presentation is to present the techniques used in HDFS to keep the system and its data fully reliable

5 HDFS OVERVIEW

6 HDFS Basics
- An open-source implementation of a distributed file system, based on the Google File System
- Designed to store very large data sets reliably across large clusters of computers
- Optimized for MapReduce applications:
  - Large files, some several GB in size
  - Reads are performed in large streaming fashion
  - Large throughput favored over low latency

7 HDFS Architecture
(Diagram: clients issue metadata ops to the NameNode and read/write blocks directly on DataNodes spread across Rack1 and Rack2; the NameNode keeps metadata (name, replicas, e.g. /home/foo/data, 6...) and issues block ops and replication commands to the DataNodes.)

8 HDFS NameNode
- The HDFS NameNode keeps the metadata for each data block in the system
- Implemented as a single master server for a cluster
- To achieve high performance, the entire namespace is kept in RAM
- Manages the replication logic for the DataNodes
- Serves clients with file block locations for reads
- Metadata includes:
  - Files and directories hierarchy
  - Permissions, modification time, etc.
  - Mapping of file blocks to DataNodes
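The in-memory metadata described above can be sketched as follows. This is a minimal illustration only; the class name `Namespace`, the dict shapes, and the method names are hypothetical, not HDFS's actual data structures.

```python
class Namespace:
    """Sketch of the in-memory metadata the NameNode keeps (hypothetical shapes)."""

    def __init__(self):
        self.inodes = {}     # path -> {"perm": ..., "mtime": ..., "blocks": [...]}
        self.block_map = {}  # block id -> list of DataNodes holding a replica

    def add_file(self, path, perm, mtime, blocks):
        self.inodes[path] = {"perm": perm, "mtime": mtime, "blocks": list(blocks)}

    def locate_blocks(self, path):
        # What the NameNode serves to a reading client:
        # each block of the file together with its replica locations.
        return [(b, self.block_map.get(b, [])) for b in self.inodes[path]["blocks"]]
```

Because everything lives in plain in-memory maps, lookups are fast, which is why the slide notes the whole namespace is kept in RAM.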

9 HDFS DataNode
- A cluster can contain thousands of DataNodes
- The DataNode is where the actual file blocks are kept
- User data is divided into blocks and replicated across DataNodes
- A DataNode identifies the block replicas in its possession to the NameNode by sending a block report
- DataNodes serve read and write requests, and perform block creation, deletion, and replication upon instruction from the NameNode

10 DATA RELIABILITY Block replicas

11 NameNode & Data Replication
- All data-replication information is stored and managed by the NameNode
- The NameNode makes all decisions regarding replication of blocks
- It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster
  - A Blockreport contains a list of all blocks on a DataNode
  - Receipt of a Heartbeat implies that the DataNode is functioning properly
- DataNodes without a recent Heartbeat are marked as dead
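The Heartbeat-based liveness tracking above can be sketched like this. `HeartbeatMonitor` and the 600-second expiry are illustrative assumptions, not HDFS's actual classes or defaults.

```python
import time

HEARTBEAT_EXPIRY_SECS = 600  # illustrative timeout; the real value is configurable


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each DataNode (hypothetical sketch)."""

    def __init__(self):
        self.last_seen = {}  # DataNode id -> timestamp of last heartbeat

    def record_heartbeat(self, datanode_id, now=None):
        self.last_seen[datanode_id] = now if now is not None else time.time()

    def dead_nodes(self, now=None):
        # DataNodes whose last heartbeat is older than the expiry are marked dead.
        now = now if now is not None else time.time()
        return [dn for dn, ts in self.last_seen.items()
                if now - ts > HEARTBEAT_EXPIRY_SECS]
```

Marking a node dead here is purely a bookkeeping decision on the NameNode side; the dead node itself is never contacted, which matches the passive Heartbeat model described above.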

12 Re-replication
- The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary
- The need for re-replication may arise for many reasons:
  - A DataNode may become unavailable
  - The replication factor of a file may be increased
  - A replica may become corrupted
  - A hard disk on a DataNode may fail
- Re-replication is fast because it is a parallel problem that scales with the size of the cluster
- This lowers the probability of block loss while replication is carried out
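Detecting which blocks need re-replication can be sketched as a scan over the block map; the function name and argument shapes below are hypothetical, chosen only to illustrate the idea.

```python
def under_replicated(block_locations, replication_factor, live_nodes):
    """Sketch: find blocks whose live replica count is below the target.

    block_locations: dict mapping block id -> list of DataNodes holding a replica.
    live_nodes: set of DataNodes currently considered alive.
    Returns dict mapping block id -> number of additional replicas needed.
    """
    needed = {}
    for block, nodes in block_locations.items():
        # Replicas on dead nodes don't count toward the target.
        live = [n for n in nodes if n in live_nodes]
        if len(live) < replication_factor:
            needed[block] = replication_factor - len(live)
    return needed
```

Each deficient block can then be re-replicated independently, which is why the work parallelizes across the cluster as the slide notes.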

13 Replica placement
- To protect against rack failure (e.g. a power outage), the NameNode can manage replicas so they are stored in different racks
- Besides data reliability, this can also improve network bandwidth utilization and client latency
- Common case (replication factor == 3):
  - Put one replica on one node in the local rack
  - Another on a different node in the local rack
  - The last on a different node in a different rack
- This doesn't compromise data reliability and availability
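The three-replica placement rule above can be sketched as follows. `place_replicas` and the `nodes_by_rack` layout are hypothetical names for illustration; real HDFS placement goes through a pluggable policy with more constraints (load, free space).

```python
def place_replicas(writer_node, nodes_by_rack, replication=3):
    """Sketch of the placement policy stated on the slide (hypothetical helper).

    nodes_by_rack: dict mapping rack id -> list of node ids.
    For replication == 3, returns: the writer's node, a second node in the
    writer's rack, and a node in a different rack.
    """
    local_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
    targets = [writer_node]
    # Second replica: a different node in the local rack.
    for n in nodes_by_rack[local_rack]:
        if n != writer_node:
            targets.append(n)
            break
    # Third replica: a node in a different rack, protecting against rack failure.
    for rack, ns in nodes_by_rack.items():
        if rack != local_rack and ns:
            targets.append(ns[0])
            break
    return targets[:replication]
```

Keeping two replicas in the local rack saves cross-rack bandwidth on write, while the off-rack replica preserves availability if the whole local rack fails.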

14 NAMENODE RELIABILITY

15 NameNode Failure
- The NameNode is a Single Point of Failure for an HDFS cluster
- If it becomes unavailable to clients, the whole cluster is unusable
- Corruption or loss of its metadata makes data blocks unavailable
- The NameNode keeps its data in RAM, so a power outage means full data loss
- A persistent solution is needed

16 NameNode Persistence
- The persistent record of the image stored in the local host's native file system is called a checkpoint
- The NameNode also stores the modification log of the image, called the journal, in the local host's native file system
- For improved durability, redundant copies of the checkpoint and journal can be made at other servers

17 Journal
- The journal persistently records every change that occurs to file system metadata (not including the block mapping)
- Implemented as a write-ahead commit log for changes to the file system that must be persistent
- To avoid becoming a bottleneck, several transactions are batched and committed together
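The write-ahead, batched-commit behavior above can be sketched like this. The `Journal` class, its `batch_size`, and the `committed` list standing in for disk are all illustrative assumptions, not HDFS's edits-log implementation.

```python
class Journal:
    """Sketch of a write-ahead log that batches transactions before flushing."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = []    # transactions logged but not yet durable
        self.committed = []  # stands in for records synced to disk

    def log(self, txn):
        # The change is recorded before the in-memory namespace is updated,
        # so it can be replayed after a crash (write-ahead property).
        self.pending.append(txn)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One disk sync covers the whole batch, amortizing the I/O cost
        # across several transactions -- the batching the slide describes.
        self.committed.extend(self.pending)
        self.pending.clear()
```

Batching trades a small window of latency per transaction for much higher overall throughput, which is why it keeps the journal from becoming a bottleneck.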

18 Checkpoint
- A checkpoint is a persistent record of the NameNode's state written to disk
- The checkpoint file is never changed by the NameNode: either a new checkpoint is created, or the namespace is loaded from a previous checkpoint
- When the NameNode starts, it performs the checkpoint process:
  - Reads the current checkpoint and journal from disk
  - Applies all the transactions from the journal to the in-memory representation of the namespace
  - Flushes out this new version as a new checkpoint on disk
  - Truncates the old journal
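The startup sequence above can be sketched as a pure function. The name `start_namenode`, the dict-based checkpoint, and the `(op, path, value)` transaction tuples are hypothetical simplifications for illustration only.

```python
def start_namenode(checkpoint, journal):
    """Sketch of the startup process on the slide (all names hypothetical).

    checkpoint: dict snapshot of the namespace, e.g. {path: metadata}
    journal: list of (op, path, value) transactions recorded since the snapshot
    Returns the rebuilt namespace and the new checkpoint.
    """
    namespace = dict(checkpoint)      # 1. read the current checkpoint
    for op, path, value in journal:   # 2. apply journaled transactions in order
        if op == "set":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
    new_checkpoint = dict(namespace)  # 3. flush a fresh checkpoint to disk
    journal.clear()                   # 4. the old journal can now be truncated
    return namespace, new_checkpoint
```

Replaying the journal on top of the last checkpoint reconstructs exactly the state the NameNode had in RAM before it went down, which is the point of the checkpoint/journal pair.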

19 Creating a Checkpoint
- A new checkpoint file can be created at startup only, or periodically
- Creating a checkpoint empties the journal:
  - A long journal increases the probability of loss or corruption of the journal file
  - A very large journal extends the time required to restart the NameNode
- To create periodic checkpoints, a dedicated server (Checkpoint Node) is required, since it has the same memory requirements as the NameNode

20 CONCLUSION

21 Conclusion
- HDFS has a good reliability model, which can handle the expected hardware failures
- While several techniques are used to achieve namespace fault tolerance, the NameNode is still a single point of failure in the system
- Many reliability parameters are configurable and can be changed to fit system demands:
  - Replica count
  - Rack scattering policy
  - Checkpoint and journal redundancy

