The Google File System 【 Ghemawat, Gobioff, Leung 】 Presenter: Seikwon (KAIST), 2014. 3. 24

Contents
 Introduction
 Ⅰ. Design Overview
 Ⅱ. System Interactions
 Ⅲ. Master Operation
 Ⅳ. Fault Tolerance
 Ⅴ. Conclusion

【 Introduction 】
• What is GFS?
  - A distributed file system
  - Goals: performance, scalability, reliability, availability
• Why GFS?
  - To meet Google's data processing needs
  - Built on different design assumptions:
    · Component failures are the norm, not the exception
    · Files are huge
    · Files grow mostly by appending, with little overwriting
    · The client applications and the file system are co-designed

Ⅰ. Design Overview
 1.1 Assumptions
 1.2 Interface
 1.3 Architecture
 1.4 Chunk
 1.5 Metadata
 1.6 Consistency Model

【 Design Overview 】 1.1 Assumptions
• Built from many inexpensive commodity components that often fail
• Optimized for large (multi-GB) files; small files are supported but are not the optimization target
• Workloads
  - Large streaming reads
  - Small random reads
  - Large, sequential appends
• Sustained high bandwidth matters more than low latency

【 Design Overview 】 1.2 Interface
• Familiar operations: create, delete, open, close, read, write
• snapshot: copies a file or directory tree at low cost
• record append: lets multiple clients append data to the same file concurrently
• POSIX-like API, but not POSIX-compliant
• Files are organized hierarchically in directories and identified by pathnames
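
To make this interface concrete, below is a minimal, hypothetical client API sketch in Python; the class and method names are illustrative assumptions, not Google's actual client library.

```python
# Hypothetical sketch of a GFS-style client API (names are illustrative).
class GFSClient:
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> "GFSFile": ...

class GFSFile:
    def read(self, offset: int, length: int) -> bytes: ...
    def write(self, offset: int, data: bytes) -> None: ...
    def record_append(self, data: bytes) -> int:
        """Append atomically at an offset chosen by GFS; return that offset."""
    def snapshot(self, dst_path: str) -> None: ...
    def close(self) -> None: ...
```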

【 Design Overview 】 1.3 Architecture
• A single master, multiple chunkservers, multiple clients
• Files are divided into fixed-size 64 MB chunks
• Each chunk is replicated on multiple chunkservers (3 by default)
• The master exchanges HeartBeat messages with the chunkservers
• The master maintains all file system metadata
• Clients cache metadata, but neither clients nor chunkservers cache file data
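
A minimal sketch of the read path implied by this architecture, assuming hypothetical master and chunkserver RPC names (lookup, read_chunk) and a read that stays within one chunk:

```python
# Illustrative read path: client -> master for metadata, then directly to a chunkserver.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size

def read(master, filename: str, offset: int, length: int) -> bytes:
    # 1. The client translates the byte offset into a chunk index.
    chunk_index = offset // CHUNK_SIZE
    # 2. The master returns the chunk handle and replica locations;
    #    the client caches this mapping for subsequent reads.
    handle, replicas = master.lookup(filename, chunk_index)
    # 3. The client reads the byte range directly from one replica
    #    (e.g. the closest), assuming the range does not cross a chunk boundary.
    chunkserver = replicas[0]
    return chunkserver.read_chunk(handle, offset % CHUNK_SIZE, length)
```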

【 Design Overview 】 1.4 Chunk
• The unit of data storage in GFS
• A large chunk size (64 MB) is a key design parameter
• Each chunk replica is stored as a plain Linux file on a chunkserver
• Pros of a large chunk size
  - Minimizes interaction between client and master
  - Reduces network overhead (a client can keep a persistent connection to one chunkserver)
  - Reduces the amount of metadata the master must store
• Cons of a large chunk size
  - A chunkserver holding a popular one-chunk file can become a hot spot
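
To make the "reduce metadata size" point concrete, a back-of-the-envelope calculation: the paper states the master keeps less than 64 bytes of metadata per 64 MB chunk, and the rest below is simple arithmetic.

```python
# Rough metadata cost for one large file under the 64 MB chunk size.
CHUNK_SIZE = 64 * 1024 * 1024          # 64 MB
METADATA_PER_CHUNK = 64                # bytes, upper bound stated in the paper

file_size = 1 * 1024**4                # a 1 TB file
chunks = file_size // CHUNK_SIZE       # number of chunks
metadata = chunks * METADATA_PER_CHUNK # master memory for this file
print(chunks, metadata)                # -> 16384 chunks, ~1 MB of metadata
```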

【 Design Overview 】 1.5 Metadata
• Three types of metadata
  - File and chunk namespaces: persistent (logged)
  - Mapping from files to chunks: persistent (logged)
  - Locations of chunk replicas: not persistent
• All metadata is kept in the master's memory
• Why keep it in memory?
  - Master operations are fast
  - It is efficient to periodically scan the entire state in the background, for:
    · Chunk garbage collection
    · Re-replication after chunkserver failures
    · Chunk migration for load and disk-space balancing
  - Adding extra memory to the master is cheap
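
A sketch of how these three tables might look in the master's memory; the field names and layout are illustrative assumptions, not the real implementation.

```python
# Illustrative in-memory metadata tables held by the master.
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    # 1. File namespace: full pathname -> file record (persistent, logged)
    namespace: dict = field(default_factory=dict)
    # 2. File -> ordered list of chunk handles (persistent, logged)
    file_chunks: dict = field(default_factory=dict)     # path -> [chunk_handle, ...]
    # 3. Chunk handle -> current replica locations (NOT persistent;
    #    rebuilt by polling chunkservers at startup and kept fresh via HeartBeats)
    chunk_locations: dict = field(default_factory=dict)  # handle -> {chunkserver_id, ...}
```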

【 Design Overview 】 1.5 Metadata (cont.)
• Chunk locations
  - Not stored persistently
  - The master polls each chunkserver at startup
  - Kept up to date through HeartBeat messages
• Operation log
  - A historical, persistent record of critical metadata changes
  - Defines the logical timeline that orders concurrent operations
  - If the master fails, it recovers by replaying the operation log
  - Periodic checkpoints keep the log, and replay time, small
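
A sketch of recovery by checkpoint plus log replay, under assumed on-disk formats (a pickled checkpoint and a JSON-lines log); the real GFS formats and record types differ.

```python
# Illustrative master recovery: load the latest checkpoint, then replay the tail of the log.
import json
import pickle

def recover_master_state(checkpoint_path: str, log_path: str) -> dict:
    # 1. Load the latest checkpoint: a compact snapshot of the metadata.
    with open(checkpoint_path, "rb") as f:
        state = pickle.load(f)          # e.g. {"namespace": {...}, "file_chunks": {...}}
    # 2. Replay the log records appended after that checkpoint, in order,
    #    so concurrent operations share a single logical timeline.
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)   # e.g. {"op": "create", "path": "/home/user/foo"}
            if record["op"] == "create":
                state["namespace"][record["path"]] = {}
                state["file_chunks"][record["path"]] = []
            elif record["op"] == "add_chunk":
                state["file_chunks"][record["path"]].append(record["handle"])
    return state
```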

【 Design Overview 】 1.6 Consistency Model
• GFS has a relaxed consistency model
• File-namespace mutations (e.g. file creation) are guaranteed to be atomic
• States of a file region after a data mutation
  - Inconsistent: different clients may see different data
  - Consistent: all clients see the same data, whichever replica they read from
  - Defined: consistent, and clients see what the mutation wrote in its entirety
• What applications have to do
  - Rely on appends rather than overwrites
  - Write self-validating, self-identifying records (e.g. checksums and unique record IDs)
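
A sketch of the application-side self-validation technique: each record carries a checksum and a unique ID, so a reader can skip padding or corruption and drop duplicates left by at-least-once appends. The record framing below is an assumption for illustration only.

```python
# Illustrative self-validating record format: [length][crc32][16-byte id][payload].
import struct, uuid, zlib

def encode_record(payload: bytes) -> bytes:
    rid = uuid.uuid4().bytes                      # unique ID for duplicate detection
    body = rid + payload
    return struct.pack(">II", len(body), zlib.crc32(body)) + body

def decode_records(data: bytes):
    seen, pos = set(), 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from(">II", data, pos)
        body = data[pos + 8 : pos + 8 + length]
        if length < 16 or len(body) != length or zlib.crc32(body) != crc:
            pos += 1                              # padding or corruption: resynchronize
            continue
        pos += 8 + length
        rid, payload = body[:16], body[16:]
        if rid not in seen:                       # drop duplicates from retried appends
            seen.add(rid)
            yield payload
```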

Ⅱ. System Interactions
 2.1 Write Control
 2.2 Data Flow
 2.3 Atomic Record Append
 2.4 Snapshot

【 System Interactions 】 2.1 Write Control
• Mutation: an operation that changes the contents or the metadata of a chunk (a write or an append)
• Lease: granted by the master to one of the chunk's replicas, which becomes the primary
• The primary picks a serial order for all mutations to the chunk; all replicas apply them in that order
• Write process (figure): the client asks the master for the primary and the other replica locations, pushes the data to all replicas, then sends the write request to the primary, which forwards it to the secondary replicas
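
A sketch of the lease-based control flow, in which the primary serializes mutations and forwards them to the secondaries; the class and method names are illustrative, not the real RPC interface.

```python
# Illustrative primary-replica write control under a chunk lease.
class PrimaryReplica:
    """The replica currently holding the chunk lease granted by the master."""

    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.next_serial = 0

    def apply_write(self, chunk_handle, data_id, offset):
        # data_id refers to data the client already pushed to every replica's
        # buffer (see 2.2 Data Flow); this control message only orders it.
        serial = self.next_serial
        self.next_serial += 1
        self._write_buffered(chunk_handle, serial, data_id, offset)
        # Forward the write request to all secondary replicas, which apply
        # the mutation in the same serial order.
        for secondary in self.secondaries:
            secondary.apply_mutation(chunk_handle, serial, data_id, offset)

    def _write_buffered(self, chunk_handle, serial, data_id, offset):
        pass  # write the buffered data into the local Linux file for this chunk (omitted)
```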

【 System Interactions 】 2.2 Data Flow
• Data is pushed linearly along a chain of chunkservers, decoupled from the control flow
• Each machine forwards the data to the closest machine that has not yet received it
  - Distances are estimated from IP addresses
• The linear (chain) topology lets each machine use its full outbound bandwidth
• Pipelining over the chain minimizes latency and maximizes throughput
• Ideal elapsed time to transfer B bytes to R replicas:
  elapsed time = B/T + R·L
  where T is the network throughput and L is the latency between two machines
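
A worked example of the formula, plugging in the figures used in the paper (100 Mbps links, roughly 1 ms inter-machine latency, three replicas):

```python
# Pipelined transfer time estimate: B/T + R*L.
B = 1 * 1000**2          # 1 MB to transfer, in bytes
T = 100e6 / 8            # 100 Mbps network throughput -> 12.5 MB/s
R = 3                    # number of replicas in the chain
L = 1e-3                 # ~1 ms latency per hop

elapsed = B / T + R * L
print(f"{elapsed * 1000:.0f} ms")   # -> 83 ms, i.e. roughly the ~80 ms cited in the paper
```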

【 System Interactions 】 2.3 Atomic Record Append
• In a traditional write
  - The client specifies the offset at which the data is to be written
  - Concurrent writes to the same region can leave the data fragmented
• In a record append
  - The client specifies only the data; GFS chooses the offset
  - The control flow is much like a regular GFS write (lease, primary, secondaries)
  - GFS appends the data to the file atomically at least once
  - If the record would exceed the chunk's maximum size, the chunk is padded and the client retries on the next chunk
  - If an error occurs at any replica, the client retries the append (which may leave duplicates on some replicas)
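
A sketch of the primary's record-append decision; the Chunk class and the return convention are illustrative assumptions.

```python
# Illustrative record-append logic at the primary replica.
CHUNK_SIZE = 64 * 1024 * 1024

class Chunk:
    def __init__(self):
        self.data = bytearray()

def record_append(chunk: Chunk, record: bytes):
    """Return ('ok', offset) or ('retry_on_next_chunk', None)."""
    if len(chunk.data) + len(record) > CHUNK_SIZE:
        # The record would cross the 64 MB boundary: pad the current chunk
        # (on all replicas) and have the client retry on the next chunk.
        chunk.data.extend(b"\0" * (CHUNK_SIZE - len(chunk.data)))
        return "retry_on_next_chunk", None
    offset = len(chunk.data)            # GFS, not the client, chooses the offset
    chunk.data.extend(record)           # applied at the primary, then at the secondaries
    return "ok", offset
```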

【 System Interactions 】 2.4 Snapshot
• Makes a copy of a file or directory tree almost instantly (copy-on-write)
• Snapshot process
  ① The master receives the snapshot request
  ② It revokes any outstanding leases on the affected chunks
  ③ It logs the operation to disk
  ④ It duplicates the metadata for the file or directory tree; the new snapshot points to the same chunks
  ⑤ When a write later arrives for one of those shared chunks, the chunkserver first duplicates the chunk locally and the write goes to the new copy
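
A sketch of the copy-on-write bookkeeping at the master, using per-chunk reference counts; the data structures are illustrative assumptions.

```python
# Illustrative copy-on-write snapshot bookkeeping.
file_chunks = {"/home/user/foo": ["c1", "c2"]}   # path -> chunk handles
refcount = {"c1": 1, "c2": 1}                    # chunk handle -> reference count

def snapshot(src: str, dst: str):
    # Duplicate only the metadata; both files now point at the same chunks.
    file_chunks[dst] = list(file_chunks[src])
    for handle in file_chunks[src]:
        refcount[handle] += 1

def before_write(path: str, index: int) -> str:
    # Copy-on-write: if the chunk is shared, duplicate it first and
    # direct the write at the new copy.
    handle = file_chunks[path][index]
    if refcount[handle] > 1:
        new_handle = handle + "'"                # a fresh handle (illustrative)
        refcount[handle] -= 1
        refcount[new_handle] = 1
        file_chunks[path][index] = new_handle
        handle = new_handle
    return handle
```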

Ⅲ. Master Operation
 3.1 Namespace Management
 3.2 Replica Placement
 3.3 Creation, Re-replication, Rebalancing
 3.4 Garbage Collection
 3.5 Stale Replica Detection

【 Master Operation 】 3.1 Namespace Management
• The master allows multiple operations to be active at the same time
• Each master operation acquires locks over a region of the namespace
• The namespace is a lookup table from full pathnames to metadata, stored compactly with prefix compression
• Locking example (see the sketch below)
  - Snapshot of /home/user: read-lock on /home, write-lock on /home/user
  - File creation of /home/user/foo: read-locks on /home and /home/user, write-lock on /home/user/foo
  - The write-lock and read-lock on /home/user conflict, so the two operations are serialized
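
A sketch of the per-operation lock sets; the real master uses read-write locks over namespace nodes, and the helper below is only illustrative.

```python
# Illustrative lock-set computation: read-lock every ancestor, leaf-lock the path itself.
from pathlib import PurePosixPath

def lock_set(path: str, leaf_mode: str):
    parts = PurePosixPath(path)
    locks = [(str(p), "read") for p in list(parts.parents)[::-1] if str(p) != "/"]
    locks.append((str(parts), leaf_mode))
    return locks

# Snapshot of /home/user write-locks the path being snapshotted:
print(lock_set("/home/user", "write"))
#   [('/home', 'read'), ('/home/user', 'write')]

# File creation of /home/user/foo only read-locks /home/user:
print(lock_set("/home/user/foo", "write"))
#   [('/home', 'read'), ('/home/user', 'read'), ('/home/user/foo', 'write')]
# The write lock on /home/user conflicts with the read lock on /home/user,
# so the snapshot and the file creation are serialized.
```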

【 Master Operation 】 3.2 Replica Placement
• Goals of the replica placement policy
  - Maximize data reliability and availability
  - Maximize network bandwidth utilization
• Spread chunk replicas across racks
  - The chunk stays available even if an entire rack fails (e.g. a shared power circuit or network switch problem)
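
An illustrative sketch of rack-aware placement: prefer chunkservers with below-average disk utilization and never place two replicas of a chunk in the same rack. The exact heuristic is an assumption, not the paper's precise algorithm.

```python
# Illustrative rack-aware replica placement.
from collections import namedtuple

Server = namedtuple("Server", "name rack disk_used")   # disk_used in [0, 1]

def place_replicas(servers, n=3):
    # Prefer below-average disk utilization, then lower utilization overall.
    avg = sum(s.disk_used for s in servers) / len(servers)
    candidates = sorted(servers, key=lambda s: (s.disk_used >= avg, s.disk_used))
    chosen, used_racks = [], set()
    for s in candidates:
        if s.rack not in used_racks:       # spread replicas across racks
            chosen.append(s)
            used_racks.add(s.rack)
        if len(chosen) == n:
            break
    return chosen

servers = [Server("cs1", "rackA", 0.20), Server("cs2", "rackA", 0.30),
           Server("cs3", "rackB", 0.50), Server("cs4", "rackC", 0.70)]
print([s.name for s in place_replicas(servers)])   # -> ['cs1', 'cs3', 'cs4']
```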

【 Master Operation 】 3.3 Creation, Re-replication, Rebalancing
• Chunk replicas are created or moved for three reasons: chunk creation, re-replication, and rebalancing
• Creation
  - Place new chunks on chunkservers with below-average disk-space utilization
  - Spread replicas across racks (see 3.2 and the placement sketch above)
• Re-replication
  - The master re-replicates a chunk as soon as its number of available replicas falls below the goal
  - Re-replication is prioritized, e.g. chunks that have lost more replicas go first (sketched below)
• Rebalancing
  - The master periodically examines the replica distribution and moves replicas for better disk-space and load balance
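
A sketch of re-replication prioritization: chunks farther below their replication goal, or blocking client progress, are cloned first. The scoring function is an illustrative assumption.

```python
# Illustrative re-replication queue ordering.
def rereplication_priority(chunks):
    """chunks: dicts like {'handle': ..., 'goal': 3, 'live': 1, 'blocking_client': False}."""
    def score(c):
        deficit = c["goal"] - c["live"]          # how far below the replication goal
        return (deficit, c["blocking_client"])   # bigger deficit / blocking clients first
    return sorted(chunks, key=score, reverse=True)

queue = rereplication_priority([
    {"handle": "c1", "goal": 3, "live": 2, "blocking_client": False},
    {"handle": "c2", "goal": 3, "live": 1, "blocking_client": False},
    {"handle": "c3", "goal": 3, "live": 2, "blocking_client": True},
])
print([c["handle"] for c in queue])              # -> ['c2', 'c3', 'c1']
```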

【 Master Operation 】 3.4 Garbage Collection
• Storage is reclaimed lazily: deletion is a rename followed by a later garbage-collection pass
• Delete process
  ① A user deletes a file
  ② The master logs the deletion and renames the file to a hidden name
  ③ During the master's regular namespace scan, hidden files older than a configured interval are removed, along with their in-memory metadata
• Regular garbage collection of chunks
  ① Each chunkserver reports the chunks it holds in its regular HeartBeat messages
  ② The master compares that report with its own metadata
  ③ Chunks not referenced by any file (orphaned chunks) are identified, and the chunkserver deletes them
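
A sketch of the orphaned-chunk check in steps ② and ③; the HeartBeat payload format here is an illustrative assumption.

```python
# Illustrative orphan detection from a chunkserver's HeartBeat report.
def orphaned_chunks(heartbeat_chunk_list, master_metadata):
    """Return the chunk handles a chunkserver holds but no file references."""
    referenced = {h for handles in master_metadata["file_chunks"].values()
                    for h in handles}
    return [h for h in heartbeat_chunk_list if h not in referenced]

master_metadata = {"file_chunks": {"/home/user/foo": ["c1", "c2"]}}
# The chunkserver reports c1, c2 and a leftover c9 from a deleted file:
print(orphaned_chunks(["c1", "c2", "c9"], master_metadata))   # -> ['c9']
```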

【 Master Operation 】 3.5 Stale Replica Detection
• A replica becomes stale when it misses mutations, e.g. because its chunkserver was down
• The master maintains a chunk version number for each chunk
  - The version is increased whenever the master grants a new lease
  - It lets the master distinguish up-to-date replicas from stale ones
• Stale replicas are removed during regular garbage collection
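
A sketch of the version comparison; the reporting format is an illustrative assumption.

```python
# Illustrative chunk-version check when a chunkserver reports its replicas.
master_versions = {"c1": 7, "c2": 3}          # chunk handle -> current version

def classify_replica(handle: str, reported_version: int) -> str:
    current = master_versions[handle]
    if reported_version < current:
        return "stale"            # missed mutations while its server was down; garbage-collect it
    if reported_version > current:
        return "master_behind"    # the master failed after granting a lease; adopt the higher version
    return "up_to_date"

print(classify_replica("c1", 6))   # -> 'stale'
print(classify_replica("c1", 7))   # -> 'up_to_date'
```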

Ⅳ. Fault Tolerance
 4.1 High Availability
 4.2 Data Integrity

【 Fault Tolerance 】 4.1 High Availability
• Fast recovery
  - The master and the chunkservers restore their state and start in seconds
• Chunk replication
  - Each chunk is replicated on multiple chunkservers; the master clones replicas as needed
• Master replication
  - The master's state (operation log and checkpoints) is replicated on multiple machines
  - Shadow masters provide read-only access when the primary master is down
  - For simplicity, a single master process handles all mutations; if it fails, a restart is fast

【 Fault Tolerance 】 4.2 Data Integrity
• Each chunkserver uses checksumming to detect corruption of stored data
  - A chunk is broken into 64 KB blocks, each with its own checksum
  - Checksums are kept in memory and stored persistently, separately from user data
• On a checksum mismatch
  - The chunkserver returns an error to the requestor and reports the mismatch to the master
  - The requestor reads from another replica
  - The master re-replicates the chunk from a good replica and then has the corrupted replica deleted
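
A sketch of per-block checksum verification on the read path; the 64 KB block size matches the paper, while the functions and layout are illustrative.

```python
# Illustrative per-block checksum verification at a chunkserver.
import zlib

BLOCK_SIZE = 64 * 1024

def block_checksums(chunk_data: bytes):
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify_read(chunk_data: bytes, checksums, offset: int, length: int) -> bool:
    # Verify every block the requested range overlaps before returning data.
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for i in range(first, last + 1):
        block = chunk_data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[i]:
            return False          # corruption: report to the master, read from another replica
    return True

data = bytes(200 * 1024)                         # a 200 KB chunk of zeros
sums = block_checksums(data)
print(verify_read(data, sums, 70 * 1024, 10))    # -> True
```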

Ⅴ. Conclusion

【 Conclusion 】
• GFS supports large-scale data processing workloads on commodity hardware
• It provides fault tolerance
  - by constant monitoring,
  - by replicating crucial data, and
  - by fast and automatic recovery
• It delivers high aggregate throughput to many concurrent clients