1 2014. 3. 24 Presenter: Seikwon Kim @ KAIST The Google File System 【 Ghemawat, Gobioff, Leung 】

2 Contents: Introduction · Ⅰ Design Overview · Ⅱ System Interactions · Ⅲ Master Operation · Ⅳ Fault Tolerance · Ⅴ Conclusion

3 【 Introduction 】  What is GFS? - A distributed file system developed at Google - Goals: performance, scalability, reliability, availability  Why a new file system? - Built to meet Google's data processing needs - It differs from earlier systems in its design assumptions: component failures are the norm rather than the exception; files are huge by traditional standards; files are mutated by appending rather than overwriting; the applications and the file system API are co-designed

4 Ⅰ Design Overview 1.1 Assumptions 1.2 Interface 1.3 Architecture 1.4 Chunk 1.5 Metadata 1.6 Consistency Model

5 1.1 Assumptions Design Overview  Basic assumptions  The system is built from many inexpensive commodity components that often fail  Optimized for large files; small files are supported but not optimized for  Workloads - Large streaming reads - Small random reads - Large sequential appends  Sustained high bandwidth matters more than low latency

6 1.2 Interface Design Overview  Features  Usual operations: create, delete, open, close, read, write  snapshot: copies a file or a directory tree at low cost  record append: lets multiple clients append data to the same file concurrently  POSIX-like API, but not POSIX-compliant  Files are organized hierarchically in directories and identified by pathnames

7 1.3 Architecture Design Overview  Architecture overview  A single master, multiple chunkservers, multiple clients  Files are divided into fixed-size 64 MB chunks  Each chunk is replicated on multiple chunkservers (3 replicas by default)  The master exchanges HeartBeat messages with the chunkservers  The master maintains all file system metadata  Neither clients nor chunkservers cache file data, but clients do cache metadata
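
The client-side read path implied by this architecture can be sketched roughly as below. This is a minimal illustration under assumed names (master.lookup, pick_closest, and the chunkserver read call are invented), not the actual GFS client library, which also batches requests and caches the master's replies.

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size

    def read(client, path, offset, length):
        # Translate the byte offset into a chunk index within the file.
        chunk_index = offset // CHUNK_SIZE
        # Ask the master for the chunk handle and the replica locations
        # (a real client caches this reply for subsequent reads).
        handle, locations = client.master.lookup(path, chunk_index)
        # Read the data directly from one of the chunkservers; file data
        # never flows through the master.
        chunkserver = client.pick_closest(locations)
        return chunkserver.read(handle, offset % CHUNK_SIZE, length)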

8 1.4 Chunk Design Overview  A chunk is the unit of data stored in GFS  The large chunk size (64 MB) is a key design parameter  Each chunk replica is stored as a plain Linux file on a chunkserver  Pros of a large chunk size - Fewer interactions between client and master - Reduced network overhead (a client can keep a persistent connection to the chunkserver) - Less metadata kept in the master  Cons of a large chunk size - A chunkserver holding a small, popular file of only one chunk can become a hot spot

9 1.5 Metadata Design Overview  Metadata types - File and chunk namespaces: persistent - Mapping from files to chunks: persistent - Locations of chunk replicas: not persistent  All metadata is kept in the master's memory  Why stored in memory? - Master operations are fast - It is easy and efficient to periodically scan the entire state in the background, for chunk garbage collection, re-replication, and chunk migration - The cost of adding extra memory is low
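
As a rough illustration, the three metadata types and their persistence could be modelled like this; the structure and field names are assumptions for the sketch, not the master's real data structures.

    from dataclasses import dataclass, field
    from typing import Dict, List, Set

    @dataclass
    class MasterMetadata:
        # File and chunk namespaces: persistent (logged to the operation log).
        namespace: Set[str] = field(default_factory=set)              # full pathnames
        # File -> ordered list of chunk handles: persistent (logged as well).
        file_chunks: Dict[str, List[int]] = field(default_factory=dict)
        # Chunk handle -> chunkserver addresses: NOT persistent; rebuilt by
        # polling chunkservers at startup and refreshed by HeartBeat messages.
        chunk_locations: Dict[int, List[str]] = field(default_factory=dict)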

10 1.5 Metadata Cont. Design Overview  Chunk locations - Not stored persistently - The master polls each chunkserver for its chunks at startup - Kept up to date through regular HeartBeat messages  Operation log - A historical record of critical metadata changes - The only persistent record of metadata - Defines a logical timeline that orders concurrent operations - If the master fails, it recovers by replaying the operation log - Checkpoints keep the operation log, and hence recovery time, small
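
Recovery from the checkpoint plus operation log can be sketched as below; load_latest_checkpoint and read_log_records_since are hypothetical helpers, not GFS interfaces.

    def recover_master_state(storage):
        # Start from the most recent checkpoint of the metadata ...
        state, checkpoint_seq = storage.load_latest_checkpoint()
        # ... then replay only the operation-log records written after it.
        for record in storage.read_log_records_since(checkpoint_seq):
            state.apply(record)   # e.g. create, delete, file-to-chunk mapping change
        return state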

11 1.6 Consistency Model Design Overview  GFS has a relaxed consistency model  File-namespace mutations (e.g. file creation) are atomic  State of a file region after a data mutation - Inconsistent: different clients may see different data - Consistent: all clients see the same data, whichever replica they read from - Defined: consistent, and clients see what the mutation wrote in its entirety  What applications have to do - Rely on appends rather than overwrites - Write self-validating, self-identifying records so readers can detect and skip padding and duplicates
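
One plausible way for an application to make its records self-validating is a length-plus-checksum framing, sketched below. The exact framing is an assumption of this example; the paper only says records should carry checksums and, where needed, unique identifiers.

    import struct, zlib

    def encode_record(payload: bytes) -> bytes:
        # length prefix + CRC32 checksum + payload
        return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

    def decode_valid_records(data: bytes):
        """Yield intact records, skipping padding, junk, and corrupt regions."""
        pos = 0
        while pos + 8 <= len(data):
            length, crc = struct.unpack_from(">II", data, pos)
            payload = data[pos + 8 : pos + 8 + length]
            if length > 0 and len(payload) == length and zlib.crc32(payload) == crc:
                yield payload
                pos += 8 + length
            else:
                pos += 1   # resynchronize byte by byte past padding or garbage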

12 Ⅱ System Interactions 2.1 Write Control 2.2 Data Flow 2.3 Atomic Record Append 2.4 Snapshot

13 2.1 Write Control System Interactions  Write process (shown as a figure on the original slide)  Lease - Granted by the master to one chunk replica, which becomes the primary and picks a serial order for all mutations on the chunk  Mutation - An operation that changes the contents or the metadata of a chunk, such as a write or a record append
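
The control flow the figure depicts, as described in the paper, can be sketched roughly as follows. The class layout and method names (find_lease_holder, push_data, apply_buffered, next_serial) are invented for illustration; the real system handles many more failure cases.

    def client_write(client, handle, offset, data):
        # 1. Ask the master which replica holds the lease (the primary)
        #    and where the secondary replicas are; cache the answer.
        primary, secondaries = client.master.find_lease_holder(handle)
        # 2. Push the data to all replicas; data flow is decoupled from
        #    control flow, and each chunkserver buffers it under a data id.
        data_id = client.new_data_id()
        for replica in [primary] + secondaries:
            replica.push_data(data_id, data)
        # 3. Send the write request to the primary and wait for its reply.
        return primary.write(handle, offset, data_id, secondaries)

    def primary_write(primary, handle, offset, data_id, secondaries):
        # 4. The primary assigns the mutation a serial number and applies it
        #    to its own local replica.
        serial = primary.next_serial(handle)
        primary.apply_buffered(handle, offset, data_id, serial)
        # 5. It forwards the request to all secondaries, which apply the
        #    mutation in the same serial order and acknowledge.
        acks = [s.apply_buffered(handle, offset, data_id, serial) for s in secondaries]
        # 6. It replies to the client; on any replica failure the client
        #    retries, which can leave the region consistent but undefined.
        return all(acks)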

14 2.2 Data Flow System Interactions  Network use in GFS - Data is pushed linearly along a chain of chunkservers, not distributed in a tree - Each machine forwards the data to the "closest" machine that has not received it yet; distances are estimated from IP addresses - The linear topology lets each machine use its full outbound bandwidth - Transfers are pipelined over TCP to minimize latency and maximize throughput  Elapsed time for transferring B bytes to R replicas ≈ B/T + R·L, where T is the network throughput and L is the latency between two machines
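
Plugging illustrative numbers into the formula shows why the chain is cheap; the 100 Mbps links and ~1 ms hop latency below are example values in the spirit of the paper, not measurements.

    B = 1 * 1024 * 1024 * 8        # 1 MB payload, in bits
    T = 100 * 1000 * 1000          # 100 Mbps network throughput, bits/second
    R = 3                          # number of replicas in the chain
    L = 0.001                      # ~1 ms latency per hop, in seconds

    elapsed = B / T + R * L        # transfer time + per-hop pipeline latency
    print(f"{elapsed * 1000:.1f} ms")   # roughly 87 ms for 1 MB to 3 replicas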

15 2.3 Atomic Record Appends System Interactions  In a traditional write - The client specifies the offset at which the data is to be written - Concurrent writes to the same region are not serializable and can leave fragments from different clients  In a record append - The client specifies only the data; GFS picks the offset - The control flow is much like that of a regular write - GFS appends the data to the file at least once atomically  Record append process - If the record would make the chunk exceed its maximum size, the chunk is padded and the client retries on the next chunk - If an error occurs at any replica, the client retries the append
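
The boundary-handling decision at the primary can be sketched as below; MAX_CHUNK_SIZE and the chunk methods are assumptions made for illustration.

    MAX_CHUNK_SIZE = 64 * 1024 * 1024

    def primary_record_append(chunk, record):
        if chunk.used + len(record) > MAX_CHUNK_SIZE:
            # Would straddle the chunk boundary: pad this chunk (on all
            # replicas) and tell the client to retry on the next chunk.
            chunk.pad_to(MAX_CHUNK_SIZE)
            return ("RETRY_NEXT_CHUNK", None)
        # Otherwise append at an offset of the primary's choosing and have
        # the secondaries write at exactly the same offset.
        offset = chunk.used
        chunk.write_at(offset, record)
        return ("OK", offset)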

16 2.4 Snapshot System Interactions  A snapshot makes an almost-instantaneous copy of a file or a directory tree  Snapshot process - The master receives the snapshot request - It revokes any outstanding leases on the affected chunks, so later writes must contact the master first - It logs the operation to disk - It duplicates the metadata for the file or directory tree; the new snapshot still points to the same chunks - When a client later writes to one of those chunks, each chunkserver first makes a local copy of the chunk and the write goes to the copy (copy-on-write)
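
A minimal copy-on-write sketch, assuming the master keeps a reference count per chunk handle; the method and field names are invented for this illustration.

    def write_after_snapshot(master, file, chunk_index, mutate):
        handle = master.file_chunks[file][chunk_index]
        if master.refcount[handle] > 1:
            # The chunk is shared with a snapshot: ask each chunkserver that
            # stores it to copy the chunk locally, then point the live file
            # at the new handle. The snapshot keeps the old chunk unchanged.
            new_handle = master.clone_chunk_on_replicas(handle)
            master.refcount[handle] -= 1
            master.refcount[new_handle] = 1
            master.file_chunks[file][chunk_index] = new_handle
            handle = new_handle
        mutate(handle)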

17 Ⅲ Master Operation 3.1 Namespace Management 3.2 Replica Placement 3.3 Creation, Re-replication, Rebalancing 3.4 Garbage Collection 3.5 Stale Replica Detection

18 3.1 Namespace Management Master Operation  The master allows many operations to be active at the same time  Each master operation acquires locks over regions of the namespace, which is a lookup table mapping full pathnames to metadata (with prefix compression)  Locking example - Snapshot of /home/user: read-lock on /home, write-lock on /home/user - File creation of /home/user/foo: read-lock on /home, read-lock on /home/user, write-lock on /home/user/foo - The write-lock and the read-lock on /home/user conflict, so the two operations are serialized
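
The prefix-locking rule can be illustrated with a small sketch; the lock-plan representation below is an assumption, not the master's actual lock table.

    from pathlib import PurePosixPath

    def locks_for(path: str, leaf_mode: str):
        """Read-locks on every proper ancestor directory, leaf_mode lock on the leaf."""
        p = PurePosixPath(path)
        plan = [(str(d), "r") for d in reversed(p.parents) if str(d) != "/"]
        plan.append((str(p), leaf_mode))
        return plan

    print(locks_for("/home/user/foo", "w"))
    # [('/home', 'r'), ('/home/user', 'r'), ('/home/user/foo', 'w')]
    print(locks_for("/home/user", "w"))
    # [('/home', 'r'), ('/home/user', 'w')]
    # The write-lock on /home/user conflicts with the read-lock on /home/user,
    # so the master serializes the snapshot and the file creation.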

19 3.2 Replica Placement Master Operation  Purpose of the replica placement policy - Maximize data reliability and availability - Maximize network bandwidth utilization  Spread chunk replicas across racks - Data stays available even when an entire rack is damaged or offline, for example when a shared power circuit or network switch fails - Read traffic can exploit the aggregate bandwidth of several racks

20 3.3 Creation, Re-replication, Rebalancing Master Operation  Chunk replicas are created or moved for three reasons: chunk creation, re-replication, and rebalancing  Creation - Place new replicas on chunkservers with below-average disk-space utilization - Spread replicas across racks  Re-replication - The master re-replicates a chunk as soon as the number of available replicas falls below the user-specified goal  Rebalancing - The master periodically examines the replica distribution and moves replicas for better disk-space and load balancing
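
A simplified placement heuristic in the spirit of the creation rules above; the two-pass selection is this sketch's own simplification, not the master's exact policy.

    def place_new_replicas(chunkservers, num_replicas=3):
        # Prefer chunkservers with below-average disk-space utilization,
        # breaking ties in favour of racks that are not used yet.
        candidates = sorted(chunkservers, key=lambda s: s.disk_utilization)
        chosen, racks_used = [], set()
        # First pass: one replica per rack, emptiest servers first.
        for server in candidates:
            if len(chosen) == num_replicas:
                break
            if server.rack not in racks_used:
                chosen.append(server)
                racks_used.add(server.rack)
        # Second pass: if there are fewer racks than replicas, fill the rest.
        for server in candidates:
            if len(chosen) == num_replicas:
                break
            if server not in chosen:
                chosen.append(server)
        return chosen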

21 3.4 Garbage Collection Master Operation  Garbage collection in GFS = a lazy delete process + regular garbage collection  Delete process ① The user deletes a file ② The master logs the deletion and renames the file to a hidden name ③ During the master's regular namespace scan, hidden files older than a few days are removed for good and their chunks become orphaned  Regular garbage collection ① In its regular HeartBeat message, each chunkserver reports a subset of the chunks it holds ② The master compares the report with its own metadata ③ The chunkserver deletes the chunks the master no longer knows about (orphaned chunks)
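
Orphaned-chunk detection from a HeartBeat report can be sketched as a simple set difference; the message format and names are assumptions for illustration.

    def handle_heartbeat(master_metadata, reported_handles):
        # Every chunk the master still maps to some file is live.
        live = {h for handles in master_metadata.file_chunks.values() for h in handles}
        # Anything the chunkserver holds but the master no longer references
        # is garbage; the chunkserver is free to delete it.
        orphaned = set(reported_handles) - live
        return orphaned   # sent back in the HeartBeat reply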

22 3.5 Stale Replica Detection Master Operation  How stale replicas arise - A chunkserver misses mutations on a chunk while it is down  The master maintains a chunk version number for each chunk - The version number is increased whenever the master grants a new lease on the chunk - It lets the master distinguish up-to-date replicas from stale ones  Stale replicas are removed during regular garbage collection
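
The version comparison can be sketched as below; the function and value names are illustrative only.

    def classify_replica(master_version, replica_version):
        if replica_version < master_version:
            # The replica missed mutations while its chunkserver was down:
            # treat it as stale and let garbage collection remove it.
            return "stale"
        if replica_version > master_version:
            # Higher than the master expects (e.g. the master failed after
            # granting a lease): the master adopts the higher version.
            return "ahead"
        return "current"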

23 Ⅳ Fault Tolerance 4.1 High Availability 4.2 Data Integrity

24 4.1 High Availability Fault Tolerance  Fast recovery - The master and the chunkservers restore their state and start in seconds  Chunk replication - The master clones replicas as needed to keep every chunk fully replicated  Master replication - The master's state (operation log and checkpoints) is replicated on multiple machines - Shadow masters provide read-only access when the primary master is down - For simplicity, only one master process is ever active, and restarting it is fast

25 4.2 Data Integrity Fault Tolerance  Each chunkserver uses checksumming to detect corruption of the data it stores  Checksums are kept in memory and stored persistently with logging, separate from user data  On a read, the chunkserver verifies the checksums of the blocks covered by the request; on a mismatch it returns an error to the requestor and reports the corruption to the master  The requestor then reads from another replica, and the master re-replicates the chunk from a good copy and has the corrupted replica deleted
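
The per-block verification on a read can be sketched as follows; the paper describes 64 KB blocks with 32-bit checksums, while the helper shape here (CRC32 over an in-memory chunk) is this sketch's assumption.

    import zlib

    BLOCK_SIZE = 64 * 1024

    def verified_read(chunk_data: bytes, checksums: list, offset: int, length: int):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            blk = chunk_data[block * BLOCK_SIZE : (block + 1) * BLOCK_SIZE]
            if zlib.crc32(blk) != checksums[block]:
                # Return an error to the requestor and report the mismatch
                # to the master so it can re-replicate from a good copy.
                raise IOError(f"checksum mismatch in block {block}")
        return chunk_data[offset : offset + length]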

26 Ⅴ Conclusion

27 Conclusion  GFS supports large-scale data processing workloads on commodity hardware  It provides fault tolerance - By constant monitoring - By replicating crucial data - By fast and automatic recovery  It delivers high aggregate throughput to many concurrent clients

