
GARRETT SINGLETARY.


1 GARRETT SINGLETARY

2 What is it? HDFS (Hadoop Distributed File System) is an open-source distributed file system designed to store big data reliably and to stream that data at high bandwidth to user applications.


4 History The original implementation, by Doug Cutting and Mike Cafarella (a UW graduate), was called the Nutch Distributed File System (NDFS). It was based on the specification outlined in the Google File System paper (2003). Cutting and Cafarella joined Yahoo! in 2006, and portions of NDFS were pulled out to begin development on Apache Hadoop (named after Cutting's son's toy elephant).

5 Architecture Hadoop utilizes a master/slave architecture.
The master node (NameNode) manages the filesystem namespace and regulates client access to files. The slaves (DataNodes) manage storage on the nodes they run on.


7 Client User applications access the file system via the HDFS client, a code library that exports the HDFS file system interface. Users reference files and directories by paths in the namespace. A user application generally does not need to know that the file system metadata and storage are on different servers.

8 NameNode Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, and namespace and disk-space quotas. The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes (the physical file locations). The Secondary NameNode periodically connects to the NameNode (by default, every hour) and grabs a copy of the NameNode's in-memory metadata and the files used to store that metadata (both of which may be out of sync). The Secondary NameNode combines this information into a fresh set of files and delivers them back to the NameNode, keeping a copy for itself. Should the NameNode die, the files retained by the Secondary NameNode can be used to recover it.

9 The NameNode holds all the file system metadata for the cluster, oversees the health of the DataNodes, and coordinates access to data. The NameNode is the central controller of HDFS; it does not hold any cluster data itself. It knows only which blocks make up a file and where those blocks are located in the cluster. The NameNode points clients to the DataNodes they need to talk to, and it keeps track of the cluster's storage capacity, the health of each DataNode, and whether each block of data meets the minimum defined replica policy. DataNodes send heartbeats to the NameNode every 3 seconds via a TCP handshake, using the port number defined for the NameNode daemon. Every tenth heartbeat is a Block Report, in which the DataNode tells the NameNode about all the blocks it has. The block reports allow the NameNode to build its metadata and ensure that three copies of each block exist on different nodes, in different racks.

10 DataNode DataNodes hold the block replicas (data).
Each block is represented by two files: one for the actual data and one for the block's metadata. The size of the data file equals the actual length of the block (extra space is not required to fill out a full block size, as in traditional file systems).
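The two-files-per-block layout can be sketched in a few lines. This is a toy illustration, not HDFS's real on-disk format: the data file holds the raw bytes at their actual length, and a sidecar metadata file holds a checksum (CRC32 stands in here for whatever checksum the real implementation uses). Paths and names are invented.

```python
import os
import zlib

def store_block(block_dir, block_id, data):
    """Store one block as a data file plus a metadata (checksum) file."""
    os.makedirs(block_dir, exist_ok=True)
    data_path = os.path.join(block_dir, f"blk_{block_id}")
    meta_path = data_path + ".meta"
    with open(data_path, "wb") as f:
        f.write(data)                    # file size == actual block length
    with open(meta_path, "w") as f:
        f.write(str(zlib.crc32(data)))   # block metadata: checksum
    return data_path, meta_path

data_path, meta_path = store_block("/tmp/dn1", 1001, b"hello hdfs")
print(os.path.getsize(data_path))  # 10 (no padding to a full block size)
```

Note that the data file is exactly 10 bytes: unlike fixed-size allocation in traditional file systems, no space is wasted padding the final block.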

11 Client Read The client first contacts the NameNode for the locations of the data blocks that make up the file. For each block, the NameNode returns a list of the DataNodes holding that block. The client picks a DataNode from each block list and reads one block at a time over TCP. DataNodes themselves can also read from HDFS if needed.
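The read path above can be sketched as a small simulation. The dictionaries play the roles of the NameNode's answer and the DataNodes' local storage; all names (dn1, blk_1, File.txt) are invented for illustration and are not real HDFS API.

```python
block_locations = {                 # the (toy) NameNode's answer for File.txt
    "blk_1": ["dn1", "dn5", "dn6"],
    "blk_2": ["dn2", "dn3", "dn7"],
}
datanodes = {                       # block replicas stored on each DataNode
    "dn1": {"blk_1": b"Hello, "}, "dn5": {"blk_1": b"Hello, "},
    "dn2": {"blk_2": b"HDFS!"},   "dn3": {"blk_2": b"HDFS!"},
}

def read_file(block_locations):
    data = b""
    for blk, nodes in block_locations.items():
        # pick the first DataNode in the list that actually holds the block
        node = next(n for n in nodes if blk in datanodes.get(n, {}))
        data += datanodes[node][blk]
    return data

print(read_file(block_locations))  # b'Hello, HDFS!'
```

The client reassembles the file one block at a time; if the first node in a list is unavailable, it simply moves on to the next replica.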

12 https://sites.google.com/site/ruanweizhang/hive-hadoop/arecenttalkonhadoopandhive

13 Client Write To write, the client asks the NameNode to nominate three DataNodes to host the block replicas; the client then writes to a single DataNode, which replicates the block to the other two DataNodes. The standard policy is to put two copies on different DataNodes in a single rack and the third copy on a DataNode in a different rack. This policy cuts inter-rack write traffic, which generally improves write performance. Because the chance of rack failure is far less than that of node failure, the policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data, since a block is placed in only two unique racks rather than three.
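The placement rule just described (two replicas on different nodes in one rack, the third in another rack) can be sketched as a selection function. The rack layout here is invented for illustration; real HDFS also weighs the writer's location and node load.

```python
import random

racks = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}

def choose_replica_nodes(racks):
    """Pick 3 nodes: two in one rack, one in a different rack."""
    rack_a, rack_b = random.sample(list(racks), 2)
    return random.sample(racks[rack_a], 2) + [random.choice(racks[rack_b])]

nodes = choose_replica_nodes(racks)  # e.g. ['dn2', 'dn1', 'dn6']
```

Every result uses exactly two racks: the first two nodes share one, the third sits in the other, matching the policy's balance of write cost against failure tolerance.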

14 The client is ready to load File.txt into the cluster and breaks it up into blocks, starting with Block A. The client tells the NameNode that it wants to write File.txt, gets permission from the NameNode, and receives a list of three DataNodes for each block, a unique list per block. The NameNode uses its Rack Awareness data to influence the choice of DataNodes in these lists. The key rule is that for every block of data, two copies will exist in one rack and another copy in a different rack, so the lists provided to the client follow this rule.

Before the client writes Block A of File.txt to the cluster, it wants to know that all DataNodes expected to have a copy of this block are ready to receive it. It picks the first DataNode in the list for Block A (DataNode 1), opens a TCP connection, and says, "Hey, get ready to receive a block, and here's a list of two DataNodes, DataNode 5 and DataNode 6. Go make sure they're ready to receive this block too." DataNode 1 then opens a TCP connection to DataNode 5 and says, "Hey, get ready to receive a block, and go make sure DataNode 6 is ready to receive this block too." DataNode 5 then asks DataNode 6, "Hey, are you ready to receive a block?" The acknowledgments of readiness come back along the same TCP pipeline, until DataNode 1 sends a "Ready" message back to the client. At this point the client is ready to begin writing block data into the cluster.

15 When all three nodes have successfully received the block, they send a "Block Received" report to the NameNode. They also send "Success" messages back up the pipeline and close the TCP sessions. The client receives a success message and tells the NameNode the block was successfully written. The NameNode updates its metadata with the node locations of Block A in File.txt. The client is then ready to start the pipeline process again for the next block of data.
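The pipelined write in slides 14 and 15 can be sketched as a recursive chain: the client hands the block to the first DataNode, each node stores it and forwards it downstream, every node reports "Block Received" to the (toy) NameNode, and the "Success" ack propagates back up. All names are invented for illustration; real HDFS streams the block in packets rather than whole.

```python
received_reports = []   # "Block Received" reports seen by the toy NameNode
storage = {}            # block replicas held by each DataNode

def write_block(pipeline, block_id, data):
    """Store on the head of the pipeline, forward to the rest, ack back up."""
    if not pipeline:
        return "Success"                    # end of chain: acks flow back
    head = pipeline[0]
    storage.setdefault(head, {})[block_id] = data
    received_reports.append((head, block_id))
    return write_block(pipeline[1:], block_id, data)

ack = write_block(["dn1", "dn5", "dn6"], "blk_A", b"block A bytes")
print(ack, len(received_reports))  # Success 3
```

After the ack reaches the client, all three replicas exist and the NameNode has heard from all three nodes, so the next block can start down a fresh pipeline.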

16 Replication Management
When a block becomes over-replicated, the NameNode chooses a replica to remove. It prefers not to reduce the number of racks that host replicas, and secondly prefers to remove the replica from the DataNode with the least amount of available disk space.
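The two removal preferences can be sketched as a selection function: first keep rack coverage intact, then among the safe candidates drop the replica on the node with the least free space. The replica data here is invented for illustration.

```python
replicas = [   # (datanode, rack, free_space_gb) for one over-replicated block
    ("dn1", "rack1", 120),
    ("dn2", "rack1", 40),
    ("dn3", "rack2", 300),
]

def choose_replica_to_remove(replicas):
    rack_counts = {}
    for _, rack, _ in replicas:
        rack_counts[rack] = rack_counts.get(rack, 0) + 1
    # Removing from a rack that holds >1 replica keeps the rack count intact.
    safe = [r for r in replicas if rack_counts[r[1]] > 1] or replicas
    return min(safe, key=lambda r: r[2])[0]   # least free space goes first

print(choose_replica_to_remove(replicas))  # dn2
```

Here rack1 holds two replicas, so either dn1 or dn2 can go without losing a rack; dn2 has less free space and is chosen.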

17 Replication Management
When a block becomes under-replicated, it is put in the replication priority queue. A block with only one replica has the highest priority; a block with a number of replicas greater than two-thirds of its replication factor has the lowest priority.
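The priority queue just described can be sketched with `heapq`: ordering by the live replica count means a single-replica block is repaired before one that is only slightly under-replicated. The block names and counts are invented.

```python
import heapq

REPLICATION_FACTOR = 3
queue = []
for block, live in [("blk_A", 2), ("blk_B", 1), ("blk_C", 2)]:
    if live < REPLICATION_FACTOR:
        # fewer live replicas = smaller key = higher repair priority
        heapq.heappush(queue, (live, block))

print(heapq.heappop(queue))  # (1, 'blk_B'), the most urgent block
```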

18 MapReduce Map: simultaneously ask the servers to run a computation on their local block of data. Reduce: the outputs of all completed Map tasks are gathered and further processed to produce the desired result.
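The classic illustration is word count, sketched here in plain Python: each "map" runs independently on one block of data (in real MapReduce, on the node holding that block), and "reduce" merges the partial counts into the final result.

```python
from collections import Counter
from functools import reduce

blocks = ["the quick brown fox", "the lazy dog", "the end"]

# Map: each block is counted independently (these could run in parallel).
mapped = [Counter(block.split()) for block in blocks]

# Reduce: merge all the partial counts into one result.
total = reduce(lambda a, b: a + b, mapped, Counter())

print(total["the"])  # 3
```

Because each map task needs only its own block, the computation moves to the data rather than the data to the computation.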

19 How to Detect Corrupted Data?
In large clusters (thousands of nodes), failure becomes more common. When a client creates an HDFS file, it computes a checksum for each block and sends it to the DataNode along with the data; the DataNode stores the checksum in a metadata file separate from the data. When a client reads, it computes the checksum again and compares it with the checksum sent by the DataNode. If corruption is detected, a different replica of the data is fetched from a different DataNode.
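The read-time check can be sketched as follows: recompute the checksum of the bytes received and, on mismatch, fall back to the next replica. CRC32 stands in for the actual checksum algorithm, and the corrupted replica is simulated.

```python
import zlib

def read_with_verify(replicas, stored_checksum):
    """Return the first replica whose checksum matches; skip corrupt ones."""
    for data in replicas:
        if zlib.crc32(data) == stored_checksum:
            return data                    # checksum matches: data is good
    raise IOError("all replicas corrupt")

good = b"block contents"
checksum = zlib.crc32(good)                # computed at write time
# First replica suffered a bit flip; the second is intact.
result = read_with_verify([b"blick contents", good], checksum)
print(result == good)  # True
```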

20 DataNode Heartbeats During normal operation, DataNodes send heartbeats (every 3 seconds by default) to the NameNode to confirm that the DataNode is operating and that the block replicas it hosts are available. Every 10th heartbeat is a block report containing the block ID, the generation stamp, and the length of each block replica the server hosts.

21 DataNode Heartbeats The NameNode does not directly call DataNodes; it uses replies to heartbeats to send instructions to them. Instructions include commands to: replicate blocks to other nodes, remove block replicas, re-register or shut down the node, or send an immediate block report.

22 Dead DataNodes If the NameNode doesn't receive a heartbeat from a DataNode for 10 minutes, it assumes the node is down and the blocks it contains are unavailable, and it then schedules creation of new replicas of those blocks on other DataNodes.
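This liveness rule reduces to comparing each node's last-heartbeat timestamp against a timeout. The sketch below uses invented node names and the 10-minute default from the slide.

```python
import time

HEARTBEAT_TIMEOUT = 600  # seconds: 10 minutes without a heartbeat = dead

def dead_datanodes(last_heartbeat, now):
    """last_heartbeat maps each DataNode to its most recent heartbeat time."""
    return [dn for dn, t in last_heartbeat.items()
            if now - t > HEARTBEAT_TIMEOUT]

now = time.time()
# dn1 heartbeated 3 seconds ago; dn2 has been silent for ~12 minutes.
print(dead_datanodes({"dn1": now - 3, "dn2": now - 700}, now))  # ['dn2']
```

Every block that listed dn2 among its replica locations would then drop below its replication factor and enter the repair queue from slide 17.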

23 Disadvantages Security - no encryption at the storage and network levels.
Java is a frequent target for exploits. Doesn't handle small files well.

24 Improvements Implement default security standards for outward-facing clusters. Implement support for SQL.

25 Questions?

