The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung SOSP 2003 (19th ACM Symposium on Operating Systems Principles) July 28, 2010 Presented by Hyojin Song

Scalability of Google. Contents: Introduction, GFS Design, Measurements, Conclusion. ※ Reference: 구글을 지탱하는 기술 (book), Cloud team study material

Introduction (1/3) Google data center (opened in 2009): shipping containers of 1,160 servers each; a bridge crane for container handling; engineers get around on scooters; the IDC is about twice the size of a soccer stadium; dramatic power efficiency.

Introduction (2/3) What is a file system? A method of storing and organizing computer files and their data, used on data storage devices such as hard disks or CD-ROMs to keep track of the physical location of the files. What is a distributed file system? One that makes it possible for multiple users on multiple machines to share files and storage resources over a computer network. Transparency in distributed systems: make the distributed system as easy to use and manage as a centralized system, i.e., give it a single-system image; in practice this is a kind of network software operating as a client-server system.

Introduction (3/3) What is the Google File System? A scalable distributed file system for large, distributed, data-intensive applications. It shares many of the same goals as previous distributed file systems: performance, scalability, reliability, and availability. Motivation: to meet the rapidly growing demands of Google's data-processing needs, shaped by its application workloads and technological environment. Google needs storage for a LOT of REALLY large files, spread across hundreds of thousands of machines, with fast access and high availability; the solution is the Google File System. It also needs the ability to run massively parallel computation jobs (SETI@home on serious steroids), where each "small" job takes thousands of CPUs at a time; the solution is MapReduce.

Contents Introduction; GFS Design (1. Design Assumptions, 2. Architecture, 3. Features, 4. System Interactions, 5. Master Operation, 6. Fault Tolerance); Measurements; Conclusion

GFS Design 1. Design Assumptions Component failures are the norm: a large number of cheap but unreliable hardware components (scale out rather than scale up). Problems: application bugs, operating system bugs, human errors, and failures of disks, memory, connectors, networking, and power supplies. Solutions: constant monitoring, error detection, fault tolerance, and automatic recovery. (Image: Google server computer)

GFS Design 1. Design Assumptions Files are HUGE: multi-GB files are the norm, so parameters for I/O operations and block sizes have to be revisited. File access model: read / append only (no overwriting); most reads are sequential, and there are no random writes, only appends to the end of a file, with data streams continuously generated by running applications. Appending therefore becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal. Co-designing the applications and the file system API benefits the overall system by increasing flexibility.
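As a rough illustration of the workload these assumptions describe, the sketch below shows how an application might use such a file system: large sequential reads plus record appends, and nothing else. The Client object and its read()/record_append() methods, and the LogProcessor wrapper, are hypothetical names, not the real GFS client API.

```python
# A rough, hypothetical sketch of the workload GFS is designed for: large
# sequential reads and record appends, never random overwrites.

class LogProcessor:
    def __init__(self, client):
        self.client = client  # assumed GFS-like client handle

    def scan(self, path, read_size=4 * 1024 * 1024):
        """Stream a huge file from the beginning in large sequential reads."""
        offset = 0
        while True:
            data = self.client.read(path, offset, read_size)
            if not data:
                break
            yield data
            offset += len(data)

    def emit(self, path, record):
        """Append a record; the system picks the offset and guarantees the
        record is written atomically at least once."""
        return self.client.record_append(path, record)
```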

GFS Design 2. Architecture GFS Cluster Components: 1. GFS Master, 2. GFS Chunkserver, 3. GFS Client

GFS Design 2. Architecture GFS Master Maintains all file system metadata: namespace, access control information, file-to-chunk mappings, chunk (and replica) locations, etc. It periodically communicates with chunkservers through HeartBeat messages to give instructions and check their state, and it makes sophisticated chunk placement and replication decisions using global knowledge. For reading and writing, a client contacts the master only to get chunk locations, then deals directly with chunkservers, so the master is not a bottleneck for reads and writes.
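A minimal sketch of this read path, assuming a hypothetical client object with a master stub, a location cache, and chunkserver stubs: only metadata comes from the master (and is cached by the client), while the data itself is read directly from a chunkserver.

```python
# Minimal sketch of the GFS read path; all object and method names here are
# assumptions for illustration, not the real client library.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def gfs_read(client, path, offset, length):
    chunk_index = offset // CHUNK_SIZE            # which chunk holds this offset
    key = (path, chunk_index)
    if key not in client.location_cache:          # metadata is cached, data is not
        handle, replicas = client.master.lookup(path, chunk_index)
        client.location_cache[key] = (handle, replicas)
    handle, replicas = client.location_cache[key]
    chunkserver = client.pick_closest(replicas)   # e.g., prefer the same rack
    # Single-chunk read for brevity; a read spanning chunks would loop here.
    return chunkserver.read_chunk(handle, offset % CHUNK_SIZE, length)
```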

GFS Design 2. Architecture GFS Chunkserver Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle, assigned by the master at chunk creation. The chunk size is a fixed 64 MB, and each chunk is replicated on 3 (by default) chunkservers.
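The sketch below illustrates the per-chunk bookkeeping implied here: an immutable handle assigned at creation time plus the chunkservers currently holding replicas. The field names and the placement callback are assumptions for illustration, not the paper's data structures.

```python
# Illustrative per-chunk record the master might keep.

import itertools
from dataclasses import dataclass, field

_next_handle = itertools.count(1)   # stand-in for the master's handle allocator

@dataclass
class Chunk:
    handle: int                                     # immutable 64-bit id
    version: int = 1                                # bumped per mutation epoch
    replicas: list = field(default_factory=list)    # chunkserver addresses

def create_chunk(pick_servers, replication=3):
    chunk = Chunk(handle=next(_next_handle))
    chunk.replicas = pick_servers(replication)      # placement policy decides where
    return chunk
```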

GFS Design 2. Architecture GFS Client Linked into applications as a file system API library. It communicates with the master and the chunkservers for reading and writing: master interactions are only for metadata, chunkserver interactions are for data. The client caches only metadata information; the data itself is too large to cache.

GFS Design 3. Features Single Master: simplifies the design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge, but its involvement in operations must be minimized so that it does not become a bottleneck. Chunk Size: the block size is 64 MB. Pros: fewer interactions between client and master, less network overhead between client and chunkserver, and less metadata stored on the master. Cons: a small file that fits in one chunk can become a hot spot.

GFS Design 3. Features Metadata Types: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. All metadata is kept in the master's memory (less than 64 bytes per 64 MB chunk). For recovery, the first two types are kept persistent by logging mutations to an operation log that is replicated on remote machines. The master also periodically scans its entire metadata state in the background, for chunk garbage collection, re-replication after failures, and chunk migration for load balancing.
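A back-of-the-envelope check of the "less than 64 bytes per 64 MB chunk" figure makes the in-memory design plausible; the numbers below are purely illustrative.

```python
# At roughly 64 bytes of master metadata per 64 MB chunk, even a petabyte of
# file data needs on the order of a gigabyte of master RAM.

CHUNK_SIZE = 64 * 1024 ** 2      # 64 MB
META_PER_CHUNK = 64              # approx. bytes of metadata per chunk

def master_metadata_bytes(total_file_bytes):
    chunks = -(-total_file_bytes // CHUNK_SIZE)   # ceiling division
    return chunks * META_PER_CHUNK

print(master_metadata_bytes(1024 ** 5) / 1024 ** 3)   # 1 PiB of data -> 1.0 GiB
```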

GFS Design 4. System Interactions ※ Write Data Client: requests a new file (1). Master: adds the file to its namespace, selects 3 chunkservers, designates one as the chunk primary and grants it a lease, then replies to the client (2). Client: sends the data to all replicas (3) and notifies the primary once it has been sent (4). Primary: writes the data in order, increments the chunk version, and sequences the secondaries' writes (5). Secondaries: write the data in the sequence order and notify the primary when they finish (6). Primary: notifies the client when the write is finished (7).
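The sketch below mirrors the numbered steps above in code form, with the data push separated from the commit. The RPC names (get_lease_info, push_data, commit, apply) and the Primary helpers are assumptions for illustration only.

```python
# Hedged sketch of the GFS write flow described on this slide.

def gfs_write(client, path, chunk_index, data):
    # (1)(2) master picks replicas, grants the primary a lease, replies to client
    handle, primary, secondaries = client.master.get_lease_info(path, chunk_index)
    for replica in [primary] + secondaries:     # (3) push data to every replica;
        replica.push_data(handle, data)         #     it is buffered, not yet applied
    return primary.commit(handle, secondaries)  # (4) tell the primary the data is out

class Primary:
    def commit(self, handle, secondaries):
        serial = self.apply_buffered_write(handle)   # apply locally in serial order
        self.chunk_version[handle] += 1              # bump the chunk version
        for s in secondaries:                        # (5) forward the same serial
            s.apply(handle, serial)                  #     order to every secondary
        self.wait_for_acks(secondaries)              # (6) secondaries ack the primary
        return "ok"                                  # (7) primary replies to the client
```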

GFS Design 5. Master Operation Replica Placement The placement policy maximizes data reliability and network bandwidth utilization: replicas are spread not only across machines but also across racks, guarding against machine failures and against racks being damaged or going offline. Reads of a chunk can then exploit the aggregate bandwidth of multiple racks, while writes have to flow through multiple racks, a tradeoff made willingly. Chunk creation: chunks are created and placed by the master on chunkservers with below-average disk utilization, and the number of recent creations on any one chunkserver is limited, because creations are followed by lots of writes.
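A hedged sketch of those creation-time heuristics, assuming chunkserver objects that expose disk_utilization, recent_creations, and rack attributes (all illustrative names):

```python
# Prefer chunkservers with below-average disk utilization, cap recent creations
# per server (new chunks attract heavy writes), and spread replicas across racks.

def place_new_chunk(servers, replication=3, max_recent_creations=10):
    avg_util = sum(s.disk_utilization for s in servers) / len(servers)
    candidates = [s for s in servers
                  if s.disk_utilization <= avg_util
                  and s.recent_creations < max_recent_creations]
    candidates.sort(key=lambda s: s.disk_utilization)

    chosen, racks_used = [], set()
    for s in candidates:                      # first pass: one replica per rack
        if s.rack not in racks_used:
            chosen.append(s)
            racks_used.add(s.rack)
        if len(chosen) == replication:
            break
    for s in candidates:                      # fall back if there are too few racks
        if len(chosen) == replication:
            break
        if s not in chosen:
            chosen.append(s)
    for s in chosen:
        s.recent_creations += 1
    return chosen
```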

GFS Design 5. Master Operation Garbage collection When a client deletes a file, the master logs the deletion like any other change and renames the file to a hidden name. When scanning the file system namespace, the master removes files that have been hidden for longer than 3 days, at which point their metadata is also erased. In HeartBeat messages, each chunkserver reports a subset of the chunks it holds, and the master replies with the chunks that no longer have metadata; the chunkserver then deletes those chunks on its own.
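A minimal sketch of this lazy scheme, assuming an in-memory namespace of file entries and the three-day grace period from the slide; the hidden-name prefix and helper names are made up.

```python
# Lazy deletion: rename to a hidden name, reclaim after a grace period, and let
# chunkservers drop chunks the master no longer knows about.

import time

HIDE_PREFIX = ".deleted."
GRACE_SECONDS = 3 * 24 * 3600

def delete_file(namespace, path):
    entry = namespace.pop(path)              # logged like any other mutation
    entry.hidden_at = time.time()
    namespace[HIDE_PREFIX + path] = entry    # rename to a hidden name; data stays

def gc_scan(namespace):
    now = time.time()
    for name in list(namespace):
        if name.startswith(HIDE_PREFIX) and now - namespace[name].hidden_at > GRACE_SECONDS:
            del namespace[name]              # metadata erased; chunks become orphans

def heartbeat_reply(master, reported_handles):
    # Chunks the master no longer knows about may be deleted by the chunkserver.
    return [h for h in reported_handles if not master.knows_chunk(h)]
```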

GFS Design 6. Fault Tolerance High Availability Fast recovery: the master and chunkservers can restart in seconds. Chunk replication. Master replication: "shadow" masters provide read-only access when the primary master is down, and mutations are not considered done until they are recorded on all master replicas. Data Integrity Chunkservers use checksums to detect corrupt data; since replicas are not bitwise identical, each chunkserver maintains its own checksums. For reads, the chunkserver verifies the checksum before sending the chunk, and checksums are updated during writes.

GFS Design 6. Fault Tolerance Master Failure Operation log: a persistent record of changes to master metadata, used to replay events on failure, replicated to multiple machines for recovery, and flushed to disk before responding to the client. The master state is checkpointed at intervals to keep the operation log file small, so master recovery requires only the latest checkpoint file plus the subsequent operation log. Master recovery was initially a manual operation, was then automated (outside of GFS) to within 2 minutes, and is now down to tens of seconds.
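A sketch of that recovery procedure, assuming a line-oriented JSON log with monotonically increasing sequence numbers and a caller-supplied load_checkpoint helper; the real checkpoint and log formats are not specified on this slide.

```python
# Recover master state by loading the latest checkpoint and replaying only the
# operation-log records written after it.

import json

def append_mutation(log_file, record):
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()                     # made durable before replying to the client

def recover_master(load_checkpoint, checkpoint_path, log_path):
    state = load_checkpoint(checkpoint_path)           # latest checkpoint first
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            if record["seq"] > state.last_applied_seq:
                state.apply(record)                    # replay only newer mutations
    return state
```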

GFS Design 6. Fault Tolerance Chunk Server Failure Heartbeats are sent from each chunkserver to the master, so the master detects chunkserver failures. If a chunkserver goes down, the replica count of its chunks is decremented on the master, and the master re-replicates missing chunks as needed (3 chunk replicas is the default, but it may vary). Re-replication gives priority to chunks with lower replica counts and to chunks that are blocking clients, and is throttled per cluster and per chunkserver. There is no difference between normal and abnormal termination; chunkservers are routinely killed for maintenance.
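A hedged sketch of that prioritization, assuming chunk objects with live_replicas, blocking_clients, and handle attributes and caller-supplied source/destination pickers (all illustrative):

```python
# Chunks further below their replication goal go first, chunks blocking live
# clients get a boost, and a crude throttle limits concurrent copies.

import heapq

def rereplication_queue(chunks, goal=3):
    heap = []
    for c in chunks:
        missing = goal - len(c.live_replicas)
        if missing <= 0:
            continue
        priority = -missing - (10 if c.blocking_clients else 0)  # lower = sooner
        heapq.heappush(heap, (priority, c.handle, c))
    return heap

def rereplicate(heap, pick_source, pick_destination, max_inflight=2):
    inflight = 0
    while heap and inflight < max_inflight:           # per-cluster throttle
        _, _, chunk = heapq.heappop(heap)
        src = pick_source(chunk.live_replicas)        # assumed helper
        dst = pick_destination(exclude=chunk.live_replicas)
        dst.copy_chunk_from(src, chunk.handle)
        inflight += 1
```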

GFS Design 6. Fault Tolerance Chunk Corruption 32-bit checksums: each 64 MB chunk is split into 64 KB blocks, and each 64 KB block has a 32-bit checksum maintained by the chunkserver. Checksum handling is optimized for recordAppend(): checksums are verified for all reads and overwrites, but not during recordAppend() itself, only on the next read, and chunkservers also verify checksums when idle. If a corrupt chunk is detected, the chunkserver returns an error to the client and notifies the master, which decrements the replica count, initiates creation of a new replica, and then tells the chunkserver to delete the corrupted chunk.
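A sketch of per-block checksumming on the read path; CRC-32 via zlib is an assumption, since the slide only says 32-bit checksums.

```python
# Each 64 KB block of a chunk carries its own 32-bit checksum, verified before
# any covered byte is returned to a reader.

import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB

def checksum_blocks(chunk_bytes):
    return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

def verified_read(chunk_bytes, checksums, offset, length):
    first, last = offset // BLOCK_SIZE, (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):                  # verify every covered block
        block = chunk_bytes[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"corrupt block {b}: report to master, re-replicate")
    return chunk_bytes[offset:offset + length]
```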

Contents Introduction GFS Design Measurements Conclusion

Measurements Micro-benchmarks The GFS cluster consists of 1 master with 2 master replicas, 16 chunkservers, and 16 clients. The machines are configured with dual 1.4 GHz PIII processors, 2 GB of RAM, two 80 GB 5400 rpm disks, and a 100 Mbps full-duplex Ethernet connection to an HP 2524 switch; the two switches are connected by a 1 Gbps link.

Measurements Real-World Clusters Cluster A: used for research and development by over a hundred engineers. A typical task is initiated by a user and runs for a few hours; it reads MBs to TBs of data, transforms or analyzes the data, and writes the results back. Cluster B: used for production data processing. A typical task runs much longer than a Cluster A task, continuously generating and processing multi-TB data sets with human users rarely involved. Both clusters had been running for about a week when the measurements were taken.

Measurements Real-World Clusters Each cluster has many chunkservers (227 and 342!), and on average a Cluster B file is about three times the size of a Cluster A file. Metadata at the chunkservers consists of chunk checksums and chunk version numbers. Metadata at the master is small (48 MB and 60 MB), which is why the master can recover from a crash within seconds.

Measurements Real-World Clusters There are many more reads than writes. Both clusters were in the middle of heavy read activity, and Cluster B was also in the middle of a burst of write activity. In both clusters the master was receiving only 200-500 operations per second, so the master is not a bottleneck.

Measurements Real-World Clusters Chunkserver workload: a bimodal distribution of small and large files; the ratio of write to append operations is 3:1 to 8:1; virtually no overwrites. Master workload: most requests are for chunk locations and file opens. Reads achieve 75% of the network limit, while writes achieve 50% of the network limit.

Contents Introduction GFS Design Measurements Conclusion

Conclusion GFS demonstrates how to support large-scale processing workloads on commodity hardware: design the system to tolerate frequent component failures, optimize for huge files that are mostly appended to and then read, feel free to relax and extend the file system interface as required, and go for simple solutions (e.g., a single master). GFS2, part of the new 2010 "Caffeine" infrastructure, targets a 1 MB average file size, uses a distributed multi-master model, and is designed to take full advantage of BigTable. Google's role as a front-runner setting new paradigms is notable.