HDFS: The Hadoop Distributed File System
Garrett Singletary

What is it?
An open-source distributed file system designed to store big data reliably and to stream that data at high bandwidth to user applications.

[Image: companies using Hadoop — https://www.edureka.co/blog/wp-content/uploads/2013/03/companies-using-Hadoop1.jpg]

History
The original implementation, by Doug Cutting and Mike Cafarella (a University of Washington graduate), was called the Nutch Distributed File System (NDFS).
It was based on the specification outlined in the Google File System paper (2003).
Cutting joined Yahoo! in 2006, and portions of NDFS were split out to begin development on Apache Hadoop (named after Cutting's son's toy elephant).

Architecture
Hadoop utilizes a master/slave architecture.
The master node (NameNode) manages the filesystem namespace and regulates client access to files.
Slaves (DataNodes) manage the storage attached to the nodes they run on.

[Diagram: HDFS master/slave architecture — http://1.bp.blogspot.com/-vsbSMA-QVnU/VCqyJlqeNWI/AAAAAAAABMk/RRpysWsrErQ/s1600/arch.png]

Client
User applications access the file system via the HDFS client, a code library that exports the HDFS file system interface.
The user references files and directories by paths in the namespace.
The user application generally does not need to know that the file system metadata and storage live on different servers.
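For illustration, a minimal sketch of using that client library (the Java FileSystem API); the NameNode address hdfs://namenode:9000 and the directory /user/demo are hypothetical placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsClientExample {
        public static void main(String[] args) throws Exception {
            // The client addresses everything by paths in one namespace; it never
            // needs to know which DataNodes physically hold the blocks.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

            // List a directory by its path in the namespace.
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            }
            fs.close();
        }
    }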

NameNode
Files and directories are represented on the NameNode by inodes, which record attributes such as permissions, modification and access times, and namespace and disk space quotas.
Maintains the namespace tree and the mapping of file blocks to DataNodes (the physical locations of file data).
The Secondary NameNode periodically connects to the NameNode (by default, every hour) and grabs a copy of the NameNode's in-memory metadata and the files used to store metadata (both of which may be out of sync). The Secondary NameNode combines this information into a fresh set of files and delivers them back to the NameNode, keeping a copy for itself. Should the NameNode die, the files retained by the Secondary NameNode can be used to recover it.
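As a rough sketch, the metadata described above amounts to two in-memory structures; the field and class names here are illustrative, not the actual Hadoop classes:

    import java.util.List;
    import java.util.Map;

    // Simplified sketch of NameNode metadata; not the real Hadoop code.
    class Inode {
        short permissions;       // e.g. 0644
        long modificationTime;   // attributes recorded per file/directory
        long accessTime;
        long namespaceQuota;
        long diskSpaceQuota;
        List<Long> blockIds;     // for files: the blocks that make up the file
    }

    class NameNodeMetadata {
        Map<String, Inode> namespaceTree;        // path -> inode
        Map<Long, List<String>> blockLocations;  // block id -> DataNodes holding a replica
    }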

The NameNode holds all the file system metadata for the cluster, oversees the health of the DataNodes, and coordinates access to data. The NameNode is the central controller of HDFS; it does not hold any cluster data itself. It only knows which blocks make up a file and where those blocks are located in the cluster. The NameNode points clients to the DataNodes they need to talk to, keeps track of the cluster's storage capacity and the health of each DataNode, and makes sure each block of data meets the minimum defined replica policy.

DataNodes send heartbeats to the NameNode every 3 seconds via a TCP handshake, using the same port number defined for the NameNode daemon, usually TCP 9000. Every tenth heartbeat is a block report, in which the DataNode tells the NameNode about all the blocks it holds. The block reports allow the NameNode to build its metadata and ensure that three copies of each block exist on different nodes, in different racks.
(Source: https://sites.google.com/site/ruanweizhang/hive-hadoop/arecenttalkonhadoopandhive)

DataNode
Holds the block replicas (the actual data).
Each block replica is represented by two files: one for the actual data and one for the block's metadata.
The size of the data file equals the actual length of the block (extra space is not required to pad out to the full block size, as in traditional file systems).
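For illustration, a DataNode's storage directory contains pairs of files like the following (the block ids, generation stamps, and sizes are made-up values; the .meta file holds the block's checksums and version info):

    current/
      blk_1073741825            <- 61 MB of file data (a final, partial block)
      blk_1073741825_1001.meta  <- metadata for that block
      blk_1073741826            <- 128 MB of file data (a full block)
      blk_1073741826_1001.meta  <- metadata for that block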

Client Read
The client first contacts the NameNode for the locations of the data blocks that make up the file.
For each block, the NameNode returns a list of the DataNodes holding a replica of that block.
The client picks a DataNode from each block's list and reads one block at a time over TCP.
A DataNode may itself act as a client and read from HDFS in the same way if needed.
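A minimal read sketch, using the same hypothetical NameNode address and path as before; the open() call performs the NameNode lookup, and the returned stream fetches each block from a DataNode chosen from that block's list:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
            // open() asks the NameNode for the file's block locations; the stream
            // then reads each block in turn from one of its DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/user/demo/file.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
            fs.close();
        }
    }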

[Diagram — source: https://sites.google.com/site/ruanweizhang/hive-hadoop/arecenttalkonhadoopandhive]

Client Write
To write, the client asks the NameNode to nominate three DataNodes to host the block replicas; the client then writes to a single DataNode, which replicates the block to the other two.
The standard policy is to put two copies on different DataNodes in a single rack and the third copy on a DataNode in a different rack.
This policy cuts inter-rack write traffic, which generally improves write performance. Because the chance of rack failure is far less than that of node failure, the policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth available when reading the data, since a block is placed in only two unique racks rather than three.
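The corresponding write sketch (same hypothetical addresses); create() triggers the NameNode's nomination of DataNodes, and the client library streams the data into the replication pipeline:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "3"); // three replicas per block (the HDFS default)
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
            // create() asks the NameNode to nominate DataNodes for each block; the
            // client writes to the first one, which pipelines the data to the rest.
            try (FSDataOutputStream out = fs.create(new Path("/user/demo/file.txt"))) {
                out.writeUTF("hello, HDFS");
            }
            fs.close();
        }
    }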

The client is ready to load File.txt into the cluster and breaks it up into blocks, starting with Block A. The client tells the NameNode that it wants to write File.txt, gets permission, and receives a list of three DataNodes for each block (a unique list per block). The NameNode uses its rack awareness data to influence which DataNodes appear in these lists. The key rule is that for every block of data, two copies will exist in one rack and another copy in a different rack, so every list provided to the client follows this rule.

Before the client writes Block A of File.txt to the cluster, it wants to know that all the DataNodes expected to hold a copy of this block are ready to receive it. It picks the first DataNode in the list for Block A (DataNode 1), opens a TCP 50010 connection, and says, "Hey, get ready to receive a block, and here's a list of two DataNodes, DataNode 5 and DataNode 6. Go make sure they're ready to receive this block too." DataNode 1 then opens a TCP connection to DataNode 5 and says, "Hey, get ready to receive a block, and go make sure DataNode 6 is ready to receive this block too." DataNode 5 then asks DataNode 6, "Hey, are you ready to receive a block?" The acknowledgments of readiness come back along the same TCP pipeline, until the initial DataNode 1 sends a "Ready" message back to the client. At this point the client is ready to begin writing block data into the cluster.
(Source: https://sites.google.com/site/ruanweizhang/hive-hadoop/arecenttalkonhadoopandhive)

When all three nodes have successfully received the block, they send a "Block Received" report to the NameNode. They also send "Success" messages back up the pipeline and close down the TCP sessions. The client receives a success message and tells the NameNode the block was successfully written. The NameNode updates its metadata with the node locations of Block A in File.txt. The client is then ready to start the pipeline process again for the next block of data.
(Source: https://sites.google.com/site/ruanweizhang/hive-hadoop/arecenttalkonhadoopandhive)

Replication Management
When a block becomes over-replicated, the NameNode chooses a replica to remove.
The NameNode prefers not to reduce the number of racks that host replicas.
It next prefers to remove a replica from the DataNode with the least amount of available disk space.

Replication Management
When a block becomes under-replicated, it is put in the replication priority queue.
Highest priority: a block with only one replica.
Lowest priority: a block whose number of replicas is greater than ⅔ of its replication factor.
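A toy encoding of that priority rule (thresholds follow the slide; this is illustrative, not the actual Hadoop implementation):

    public class ReplicationPriority {
        // Lower number = higher priority in the replication queue.
        static int priorityOf(int liveReplicas, int replicationFactor) {
            if (liveReplicas == 1) {
                return 0; // highest priority: one lost node away from data loss
            } else if (liveReplicas * 3 > replicationFactor * 2) {
                return 2; // lowest priority: more than 2/3 of the target already exists
            } else {
                return 1; // everything in between
            }
        }

        public static void main(String[] args) {
            System.out.println(priorityOf(1, 3));  // 0
            System.out.println(priorityOf(7, 10)); // 2 (7 > 2/3 of 10)
            System.out.println(priorityOf(5, 10)); // 1
        }
    }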

MapReduce
Map: simultaneously ask the servers to run a computation on their local block of data.
Reduce: the data from all completed Map tasks is gathered and further processed to produce the desired result.
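The canonical word-count job shows both phases using the Hadoop MapReduce API; input and output paths are supplied on the command line:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map: runs on each node against its local block of the input.
        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (token.isEmpty()) continue;
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }

        // Reduce: gathers all counts for a word and sums them.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }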

How to Detect Corrupted Data?
In large clusters (thousands of nodes), failure becomes common.
When a client creates an HDFS file, it computes a checksum for each block and sends it to the DataNode along with the data. The DataNode stores the checksum in a metadata file separate from the data.
When a client reads, it computes the checksum again and compares it with the checksum stored by the DataNode.
If corruption is detected, a different replica of the data is fetched from a different DataNode.
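A simplified sketch of the verification idea using plain CRC-32 from the JDK (HDFS actually checksums fixed-size chunks within each block; the 512-byte chunk size here matches the HDFS default):

    import java.util.Arrays;
    import java.util.zip.CRC32;

    public class ChecksumSketch {
        static final int BYTES_PER_CHECKSUM = 512; // chunks are checksummed, not whole blocks

        // Compute one CRC-32 value per 512-byte chunk of a block.
        static long[] checksums(byte[] block) {
            int chunks = (block.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
            long[] sums = new long[chunks];
            CRC32 crc = new CRC32();
            for (int i = 0; i < chunks; i++) {
                crc.reset();
                int off = i * BYTES_PER_CHECKSUM;
                crc.update(block, off, Math.min(BYTES_PER_CHECKSUM, block.length - off));
                sums[i] = crc.getValue();
            }
            return sums;
        }

        // On read: recompute and compare. A mismatch means this replica is
        // corrupt and the client should fetch a different replica instead.
        static boolean verify(byte[] block, long[] stored) {
            return Arrays.equals(checksums(block), stored);
        }

        public static void main(String[] args) {
            byte[] block = new byte[1300];             // pretend block contents
            long[] stored = checksums(block);
            System.out.println(verify(block, stored)); // true
            block[0] ^= 1;                             // simulate a flipped bit
            System.out.println(verify(block, stored)); // false -> fetch another replica
        }
    }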

DataNode Heartbeats
During normal operation, DataNodes send heartbeats (every 3 seconds by default) to the NameNode to confirm that the DataNode is operating and that the block replicas it hosts are available.
Every 10th heartbeat is a block report containing the block id, the generation stamp, and the length of each block replica the server hosts.

DataNode Heartbeats
The NameNode never directly calls a DataNode; it uses the replies to heartbeats to send instructions to the DataNodes. These instructions include commands to:
Replicate blocks to other nodes
Remove block replicas
Re-register or shut down the node
Send an immediate block report
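An illustrative sketch of that exchange (the class and method names are made up, not the actual Hadoop protocol classes):

    // The DataNode always initiates contact; commands ride back on the reply.
    enum Command { REPLICATE_BLOCKS, REMOVE_REPLICAS, RE_REGISTER, SHUTDOWN, SEND_BLOCK_REPORT }

    interface NameNodeRpc {
        Command[] sendHeartbeat(String dataNodeId); // the reply piggybacks instructions
    }

    class DataNodeHeartbeatLoop {
        void run(NameNodeRpc nameNode, String myId) throws InterruptedException {
            while (true) {
                for (Command c : nameNode.sendHeartbeat(myId)) {
                    handle(c); // replicate, remove, re-register, shut down, or report
                }
                Thread.sleep(3000); // default heartbeat interval: 3 seconds
            }
        }
        void handle(Command c) { /* dispatch on the command type */ }
    }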

Dead DataNodes
If the NameNode doesn't receive a heartbeat from a DataNode for 10 minutes, it assumes the node is down and the blocks it hosts are unavailable, and it schedules the creation of new replicas of those blocks on other DataNodes.

Disadvantages
Security: no encryption at the storage or network levels.
Java is a heavily targeted language, which increases the attack surface.
Doesn't handle large numbers of small files well.

Improvements
Implement default security standards for outward-facing clusters.
Implement support for SQL.

Questions?