1 Lei Xu

Brief Introduction  Hadoop  An Apache project for data-intensive applications  Typical application: MapReduce (OSDI’04), a distributed programming model for massive-data computation  Crawl and index web pages (Y!)  Analyze popular topics and trends (Twitter)  Led by Yahoo!/Facebook/Cloudera 2

Brief Introduction (cont’d)  Hadoop Distributed File System (HDFS)  A scalable distributed file system that serves Hadoop MapReduce applications  Borrows its essential ideas from the Google File System  Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The Google File System. 19th ACM Symposium on Operating Systems Principles (SOSP’03)  Shares the same design assumptions 3

Google File System  A scalable distributed file system designed for:  Data-intensive applications (mainly MapReduce)  Web page indexing  It has since spread to other applications  E.g. Gmail, BigTable, App Engine  Fault-tolerant  Low-cost hardware  High throughput 4

Google File System (cont’d)  Departures from traditional file system assumptions  Runs on top of commodity hardware  Component failures are common  Files are huge  Basic block size 64~128 MB  vs. 1~64 KB in traditional file systems (Ext3/NTFS, etc.)  Massive-data/data-intensive processing  Large streaming reads and small random reads  Large, sequential writes  No (or rare) random writes 5

Hadoop DFS Assumptions  In addition to the Google File System assumptions, HDFS assumes:  Simple Coherency Model  Write-once-read-many  Once a file is created, written, and closed, it cannot be changed anymore.  Moving Computation Is Cheaper than Moving Data  “Semi-location-aware” computation  Tries its best to assign computations close to the related data  Portability Across Heterogeneous Hardware and Software Platforms  Written in Java, with multi-platform support  The Google File System is written in C++ and runs on Linux  Stores data on top of existing file systems (NTFS/Ext4/Btrfs…) 6

HDFS Architecture  Master/slave architecture  NameNode  Metadata server  File locations (file name -> the DataNodes holding its blocks)  File attributes (atime/ctime/mtime, size, number of replicas, etc.)  DataNode  Manages the storage attached to the node it runs on  Client  Producers and consumers of data 7

HDFS Architecture (cont’d) 8

NameNode  Metadata server  Only one NameNode per cluster  Single point of failure  Potential performance bottleneck  Manages the file system namespace  Traditional hierarchical namespace  Keeps all file metadata in memory for fast access  The NameNode’s memory size determines how many files can be supported  Executes file system namespace operations:  Open/close/rename/create/unlink…  Returns the locations of data blocks 9

NameNode (cont’d)  Maintains system-wide activities  E.g. creating new replicas of file data, garbage collection, load balancing, etc.  Periodically communicates with DataNodes to collect their status  Is the DataNode alive?  Is the DataNode overloaded? 10

DataNode  Storage server  Stores fixed-size data blocks on local file systems (ext4/zfs/btrfs)  Serves read/write requests from clients  Creates, deletes, and replicates data blocks upon instruction from the NameNode  Block size = 64 MB 11

Client  Application-level implementation  Does not provide a POSIX API  Hadoop has a FUSE interface  FUSE: Filesystem in Userspace  Has limited functionality (e.g., no random-write support)  Queries the NameNode for file locations and metadata  Contacts the corresponding DataNodes for file I/O 12
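
Since the client is an application-level library, a typical interaction looks like the following minimal Java sketch. The NameNode address (hdfs://namenode:9000) and the file path are illustrative assumptions; the calls (FileSystem.get, create, open) come from the standard Hadoop FileSystem API.

```java
// Minimal sketch of HDFS client usage via the Hadoop FileSystem API.
// The NameNode address and file path below are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");   // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/lei/example.txt");

        // Write once: the client asks the NameNode for target DataNodes,
        // then streams the data to them.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello hdfs");
        }

        // Read many: the client fetches block locations from the NameNode,
        // then reads the blocks directly from the DataNodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}
```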

Data Replication  Files are stored as a sequence of blocks  The blocks (typically 64 MB) are replicated for fault tolerance  The replication factor is configurable per file  It can be specified at creation time and changed later  The NameNode decides how to replicate blocks. It periodically receives:  Heartbeats, which imply the DataNode is alive  Blockreports, which contain a list of all blocks on a DataNode  When a DataNode goes down, the NameNode re-replicates all blocks that were on it to other active DataNodes to restore the configured number of replicas 13
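
As a concrete illustration of per-file replication control, the Hadoop FileSystem API lets a client pass a replication factor at creation time and change it afterwards. The path and factor values below are made-up examples.

```java
// Sketch: setting the replication factor per file (path and values are illustrative).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/user/lei/logs.dat");

        // Specify replication = 3 and a 64 MB block size at creation time.
        try (FSDataOutputStream out =
                 fs.create(p, true, 4096, (short) 3, 64L * 1024 * 1024)) {
            out.write(new byte[]{1, 2, 3});
        }

        // Ask the NameNode to change the replication factor later.
        fs.setReplication(p, (short) 5);
        fs.close();
    }
}
```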

Data Replication (cont’d) 14

Data Replication (cont’d)  Rack Awareness  A Hadoop instance runs on a cluster of computers spread across many racks:  Nodes in the same rack are connected by one switch  Communication between two nodes in different racks goes through additional switches  Slower than communication within the same rack  A whole rack may fail due to network/power issues  Rack awareness improves data reliability, availability and network bandwidth utilization 15

Data Replication (cont’d)  Rack Awareness (cont’d)  In the common case, the replication factor is three  Two replicas are placed on two different nodes in the same rack  The third replica is placed on a node in a remote rack  This improves write performance  2/3 of the writes stay within one rack, which is faster  Without compromising data reliability 16
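
A toy sketch of the placement rule described on this slide: two replicas in the writer's rack, the third in a remote rack. The rack/node representation is invented for illustration; this is not Hadoop's actual block placement policy code.

```java
// Toy sketch of the rack-aware placement described above; not Hadoop's real policy code.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RackAwarePlacementSketch {
    // racks: rack id -> nodes in that rack; localRack: the rack of the writing client.
    static List<String> chooseTargets(Map<String, List<String>> racks, String localRack) {
        List<String> targets = new ArrayList<>();
        List<String> localNodes = racks.get(localRack);
        targets.add(localNodes.get(0));            // replica 1: a node in the writer's rack
        targets.add(localNodes.get(1));            // replica 2: another node in the same rack
        for (Map.Entry<String, List<String>> e : racks.entrySet()) {
            if (!e.getKey().equals(localRack)) {   // replica 3: any node in a remote rack
                targets.add(e.getValue().get(0));
                break;
            }
        }
        return targets;
    }
}
```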

Replica Selection  For READ operations:  Minimize bandwidth consumption and latency  Prefer the nearest node:  If there is a replica on the same node as the reader, it is preferred  When the cluster spans multiple data centers, replicas in the same data center are preferred 17

Filesystem Metadata  HDFS stores all file metadata on the NameNode  An EditLog  Records every change that occurs to filesystem metadata  Used for failure recovery  Similar to journaling file systems (Ext3/NTFS)  An FSImage  Stores the mapping of blocks to files and the file attributes  The EditLog and FSImage are stored locally on the NameNode 18

Filesystem Metadata (cont’d)  A DataNode has no knowledge about HDFS files  It only stores data blocks as regular files on its local file system  With a checksum for data integrity  It periodically sends a Blockreport listing all blocks stored on this DataNode to the NameNode  Only the DataNode knows about the availability of a given block replica. 19

Filesystem Metadata (cont’d)  When the NameNode starts up  It loads the FSImage and EditLog from the local file system  Updates the FSImage by replaying the latest EditLog  Creates a new FSImage for the latest checkpoint and stores it permanently on the local file system 20
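
The startup procedure above can be pictured with a small conceptual sketch. This is not NameNode source code; the namespace representation and operation names are invented for illustration.

```java
// Conceptual sketch of NameNode startup: load FSImage, replay EditLog, write a new checkpoint.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {
    // Hypothetical in-memory namespace: file path -> list of block ids.
    static final Map<String, List<Long>> namespace = new HashMap<>();

    static class EditOp {
        String type; String path; long blockId;
        EditOp(String type, String path, long blockId) {
            this.type = type; this.path = path; this.blockId = blockId;
        }
    }

    static void startup(Map<String, List<Long>> fsImage, List<EditOp> editLog) {
        namespace.putAll(fsImage);                       // 1. load the last FSImage
        for (EditOp op : editLog) {                      // 2. replay every logged change
            switch (op.type) {
                case "create":   namespace.put(op.path, new ArrayList<>()); break;
                case "addBlock": namespace.get(op.path).add(op.blockId);    break;
                case "delete":   namespace.remove(op.path);                 break;
            }
        }
        saveNewFsImage(namespace);                       // 3. persist a fresh checkpoint
    }

    static void saveNewFsImage(Map<String, List<Long>> image) {
        // write the merged namespace back to the local file system (omitted)
    }
}
```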

Communication Protocol  A Hadoop-specific RPC on top of TCP/IP  The NameNode is simply a server that only responds to requests issued by DataNodes or clients  ClientProtocol.java – client protocol  DatanodeProtocol.java – datanode protocol 21
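
To give a flavor of these protocols, here is a hypothetical, heavily simplified Java interface in the spirit of the client-to-NameNode RPC. The real ClientProtocol.java has many more methods and different signatures; everything below is illustrative only.

```java
// Hypothetical, simplified sketch of a client-to-NameNode RPC interface.
import java.io.IOException;
import java.util.List;

public interface SimpleClientProtocol {
    // Ask the NameNode which DataNodes hold the blocks covering [offset, offset + length).
    List<String> getBlockLocations(String path, long offset, long length) throws IOException;

    // Namespace operations, executed entirely on the NameNode.
    void create(String path, short replication, long blockSize) throws IOException;
    boolean rename(String src, String dst) throws IOException;
    boolean delete(String path) throws IOException;
}
```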

Robustness  Primary objective of HDFS:  Remain reliable in the presence of component failures  In a typical large cluster (>1K nodes), component failures are common  Three common types of failures:  NameNode failures  DataNode failures  Network failures 22

Robustness (cont’d)  Heartbeats  Each DataNode periodically sends heartbeats to the NameNode  System status and block reports  The NameNode marks DataNodes without recent heartbeats as dead  Does not forward any I/O to them  Marks all data blocks on those DataNodes as unavailable  Re-replicates those blocks if necessary (according to the replication factor)  This detects both network failures and DataNode deaths 23
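
A minimal sketch of that heartbeat bookkeeping on the NameNode side, assuming a made-up 10-minute expiry interval (the real timeout is configurable):

```java
// Sketch of heartbeat tracking; the timeout value is an assumption.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitorSketch {
    static final long HEARTBEAT_TIMEOUT_MS = 10 * 60 * 1000L;   // assumed 10-minute expiry
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a DataNode's periodic heartbeat (status + block report) arrives.
    public void onHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // A DataNode without a recent heartbeat is treated as dead: no new I/O is sent to it
    // and its blocks are scheduled for re-replication elsewhere.
    public boolean isDead(String dataNodeId) {
        Long last = lastHeartbeat.get(dataNodeId);
        return last == null || System.currentTimeMillis() - last > HEARTBEAT_TIMEOUT_MS;
    }
}
```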

Robustness (cont’d)  Re-balancing  Automatically moves data from one DataNode to another  If the free space falls below a threshold  Data integrity  A block of data may be corrupted  Disk faults, network faults, buggy software  The client computes checksums for each block and stores them in a separate hidden file in the HDFS namespace  Data is verified against these checksums before it is read 24
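
A small sketch of the per-block checksumming idea, using CRC32 over fixed-size chunks. The chunk size and the verification flow here are assumptions for illustration, not the exact HDFS implementation.

```java
// Sketch: per-chunk CRC32 checksums for a data block; chunk size is an assumption.
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ChecksumSketch {
    // Compute one CRC32 value per chunk of the block when it is written.
    static List<Long> checksum(byte[] block, int chunkSize) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < block.length; off += chunkSize) {
            CRC32 crc = new CRC32();
            crc.update(block, off, Math.min(chunkSize, block.length - off));
            sums.add(crc.getValue());
        }
        return sums;
    }

    // On read, recompute and compare; a mismatch means this replica is corrupt
    // and the client should fetch the block from another replica.
    static boolean verify(byte[] block, int chunkSize, List<Long> expected) {
        return checksum(block, chunkSize).equals(expected);
    }
}
```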

Robustness (cont’d)  Metadata failures  The FSImage and EditLog are the central data structures  If they are corrupted, HDFS cannot rebuild the namespace or access the data  The NameNode can be configured to keep multiple copies of the FSImage and EditLog  E.g. one FSImage/EditLog on the local machine, another stored on a mounted remote NFS server  Reduces update performance  If the NameNode goes down, the cluster must be restarted manually 25

Data Organization  Data Blocks  HDFS is designed to support very large files and streaming I/O  A file is chopped up into 64 MB blocks  Reduces the number of connection establishments and keeps TCP transfers long  If possible, each block of a file resides on a different DataNode  Enables parallel I/O and computation (MapReduce) 26

Data Organization (cont’d)  Staging  When writing a new file  The client first caches the file data in a temporary local file until it has accumulated at least one HDFS block  Then the client contacts the NameNode to assign a DataNode  The client flushes the cached data to the chosen DataNode  This fully utilizes the network bandwidth 27
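
A minimal sketch of that staging behaviour, with the actual flush left abstract. The in-memory buffer and method names are invented for illustration; HDFS stages to a temporary local file rather than RAM.

```java
// Sketch of client-side staging: buffer writes locally, flush once a full block accumulates.
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class StagingWriterSketch {
    static final int BLOCK_SIZE = 64 * 1024 * 1024;           // one 64 MB HDFS block
    private final ByteArrayOutputStream localBuffer = new ByteArrayOutputStream();

    public void write(byte[] data) throws IOException {
        localBuffer.write(data);                               // stage locally, no network traffic yet
        if (localBuffer.size() >= BLOCK_SIZE) {
            flushBlock(localBuffer.toByteArray());             // a block's worth: ship it out
            localBuffer.reset();
        }
    }

    private void flushBlock(byte[] block) {
        // 1. contact the NameNode to get a target DataNode (and replication pipeline)
        // 2. stream the 64 MB block to the chosen DataNode (omitted)
    }
}
```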

Data Organization (cont’d)  Replication Pipeline  A client obtains a list of DataNodes to flush one block to  The client first flushes the data to the first DataNode  The first DataNode receives the data in small portions (4 KB), writes each portion to local storage, and immediately forwards it to the next DataNode in the list  The second DataNode behaves the same way as the first  The total transfer time for one block (64 MB) is:  T(64MB) + 2 * T(4KB) with the pipeline  3 * T(64MB) without the pipeline 28
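
As a rough worked example (assuming, purely for illustration, a 100 MB/s link per hop): T(64MB) ≈ 0.64 s and T(4KB) ≈ 0.00004 s, so the pipelined write of three replicas finishes in roughly 0.64 s, while sending the full block to each of the three DataNodes in turn would take about 1.92 s.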

Replication Pipeline  The client asks the NameNode where to put the data  The client pushes data to the DataNodes linearly, along a chain, to fully utilize each node’s network bandwidth  The secondary replicas reply to the primary; the primary then reports success back to the client. 29 * This figure is from the “The Google File System” paper

See also  HBase – a BigTable implementation on Hadoop  Key-value storage  Pig – a high-level language for running data analysis on Hadoop  ZooKeeper  “ZooKeeper: Wait-free Coordination for Internet-scale Systems”, ATC’10, Best Paper  CloudStore (KFS, previously Kosmosfs)  A C++ implementation of the Google File System  Parallels the Hadoop project 30

Google vs. Y!/Facebook/Amazon…  Google: Google File System / MapReduce / BigTable  Hadoop: Hadoop DFS / Hadoop MapReduce / HBase 31

Known Issues and Research Interests  The NameNode is a single point of failure  It also limits the total number of files HDFS can support  RAM limitation  Google has changed the one-master architecture to a cluster with multiple masters  However, the details have not been published 32

Known Issues and Research Interests (cont’d)  Uses replication to provide data reliability  Same problems as RAID-1?  Apply RAID techniques to HDFS?  “DiskReduce: RAID for Data-Intensive Scalable Computing”, PDSW’09 33

Known Issues and Research Interests (cont’d)  Energy Efficiency  DataNodes must stay powered on for data availability  However, there may be no MapReduce computations running on them  A waste of energy 34

Conclusion  The Hadoop Distributed File System is designed to serve MapReduce computations  Provides highly reliable storage  Supports massive amounts of data  Optimizes data placement based on data-center topology  Large companies build their core businesses on top of these infrastructures  Google: GFS/MapReduce/BigTable  Yahoo!/Facebook/Amazon/Twitter/NY Times: Hadoop/HBase/Pig 35

Reference  HDFS Architecture Guide: hdfs_design.html 36

Thank you ! 37 Questions?