1 The Google File System

2 Why?
Google has lots of data
–Cannot fit in traditional file system
–Spans hundreds (thousands) of servers connected to (tens of) thousands of disk drives
Hardware fails
–Need to recover rapidly
–Downtime is not acceptable
Most files are read or append only
–Do not need to optimize for random access writes
Need a distributed file system capable of storing lots of huge files that keeps working despite commonplace hardware failures

3 Assumptions
Design assumptions
–System built from many inexpensive parts
–Stores a “modest” number of files (a few million)
–Each file is typically 100MB or larger
–Expect large streaming reads, small random reads, and large sequential writes
–Need to support multiple machines concurrently appending to a single file
–High bandwidth more important than low latency

4 Interface
Extended typical file system interface
–Normal operations: create, delete, open, close, read, write
–Extensions
  Snapshot: efficiently make a copy of a file or directory tree
  Record append: allows multiple clients to append to the same file concurrently
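The record append extension can be illustrated with a small in-memory sketch. Everything here is hypothetical (`ToyAppendFile`, `record_append`, `run_concurrent_appends` are illustrative names, not GFS APIs). The point it shows is that the file, not the caller, chooses the offset, so concurrent appenders never overwrite each other. Real GFS record append has further behavior (for example, retries that can leave duplicate records) that this sketch does not model.

```python
import threading


class ToyAppendFile:
    """In-memory stand-in for a file that supports record append (hypothetical)."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        # The file picks the offset; the whole record is written as one unit.
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_concurrent_appends(num_clients: int = 4):
    """Start several 'clients' that append to the same file at once."""
    f = ToyAppendFile()
    offsets = {}

    def worker(name: str) -> None:
        offsets[name] = f.record_append(f"<{name}>".encode())

    threads = [
        threading.Thread(target=worker, args=(f"client{i}",))
        for i in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return f, offsets
```

Each client gets back the offset at which its record landed and can read the record back intact, regardless of the order in which the threads ran.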

5 Architecture
GFS consists of a number of File System Clusters
–Cluster includes one master server and several chunk servers
–Files divided into fixed-size chunks
  64-bit globally unique chunk handle
  Default of 3 replicas of each chunk
–Master server holds metadata (namespace, access control, mapping of files to chunks, location of chunks)
–Master performs garbage collection of chunks and chunk migration between servers
–Master periodically exchanges heartbeat messages with each chunk server
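The master's metadata can be pictured as two maps: file name to chunk handles, and chunk handle to replica locations. The sketch below is a toy under stated assumptions: `ToyMaster`, `ChunkInfo`, and the "least loaded server first" placement rule are all hypothetical and only illustrate the shape of the data, not how GFS actually assigns handles or places replicas.

```python
import itertools
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # default number of replicas per chunk (from the slide)


@dataclass
class ChunkInfo:
    handle: int                                  # 64-bit globally unique handle
    locations: set = field(default_factory=set)  # chunk servers holding a replica


class ToyMaster:
    """Hypothetical in-memory master: file -> chunk handles -> replica locations."""

    def __init__(self, chunk_servers):
        self.chunk_servers = list(chunk_servers)
        self.files = {}    # path -> ordered list of chunk handles
        self.chunks = {}   # handle -> ChunkInfo
        self._next_handle = itertools.count(1)

    def create_file(self, path):
        self.files[path] = []

    def add_chunk(self, path):
        # Assumption: handles are a simple counter kept within 64 bits.
        handle = next(self._next_handle) & (2**64 - 1)
        # Assumption: place replicas on the servers that hold the fewest chunks.
        load = {s: 0 for s in self.chunk_servers}
        for info in self.chunks.values():
            for s in info.locations:
                load[s] += 1
        chosen = sorted(self.chunk_servers, key=lambda s: load[s])
        self.chunks[handle] = ChunkInfo(handle, set(chosen[:REPLICATION_FACTOR]))
        self.files[path].append(handle)
        return handle

    def lookup(self, path, chunk_index):
        """Answer a client query: (file name, chunk index) -> handle and replicas."""
        handle = self.files[path][chunk_index]
        return handle, sorted(self.chunks[handle].locations)
```

For example, `ToyMaster(["cs1", "cs2", "cs3", "cs4"])` followed by `create_file("/logs/a")` and `add_chunk("/logs/a")` lets `lookup("/logs/a", 0)` return the handle and three server names.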

6 How it works
Client asks master for chunk servers to contact
–Cache this information for a limited time
–Directly interact with chunk servers during that time
–Example: a read operation
  Client sends master the file name and chunk index
  Master responds with chunk handle and location of replicas
  Client caches this information and contacts one of the replicas (likely the closest)
–Further reads in a chunk require no interaction with master unless cached information times out
–Clients can ask for multiple chunks at once, thus limiting communication with master
Chunk size
–64MB chunks
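The read path above can be sketched as follows. With fixed 64MB chunks, the client derives the chunk index from the byte offset, asks the master only on a cache miss or after the cached entry expires, and otherwise goes straight to a replica. `ToyClient`, `ask_master`, and the 60-second default timeout are assumptions made for illustration; the slides do not state the actual cache lifetime.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks (from the slide)


def chunk_index(offset: int) -> int:
    """Translate a byte offset within a file into a chunk index."""
    return offset // CHUNK_SIZE


class ToyClient:
    """Hypothetical client that caches chunk locations for a limited time."""

    def __init__(self, ask_master, ttl_seconds=60.0, clock=time.monotonic):
        self._ask_master = ask_master  # callable: (path, index) -> (handle, replicas)
        self._ttl = ttl_seconds        # assumed cache lifetime
        self._clock = clock
        self._cache = {}               # (path, index) -> (expires_at, handle, replicas)
        self.master_calls = 0          # counts round trips to the master

    def locate(self, path: str, offset: int):
        key = (path, chunk_index(offset))
        now = self._clock()
        entry = self._cache.get(key)
        if entry is None or entry[0] <= now:
            # Cache miss or timed-out entry: one request to the master.
            handle, replicas = self._ask_master(*key)
            self.master_calls += 1
            entry = (now + self._ttl, handle, replicas)
            self._cache[key] = entry
        _, handle, replicas = entry
        # The client would now read from one replica at this offset in the chunk.
        return handle, replicas, offset % CHUNK_SIZE
```

Repeated reads within the same chunk leave `master_calls` unchanged until the entry times out, which is what keeps the single master from becoming a bottleneck.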

7 Fault Tolerance
Fast Recovery
–Master and chunk servers do not distinguish between normal and abnormal shutdown
  Assumes “kill -9” is a common operation
  Servers can restart in seconds
–Chunks replicated on different chunk servers on different physical racks
–Master state is replicated on other machines
–If master fails
  It restarts immediately
  If it cannot restart (hardware error), then another master takes over
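Placing replicas on different racks means a single rack failure cannot take out every copy of a chunk. The sketch below shows one simple way to do that: round-robin over racks, taking one server per rack per pass. The function name `place_replicas` and this policy are assumptions for illustration, not the placement algorithm GFS uses.

```python
def place_replicas(servers_by_rack, n=3):
    """Pick n (rack, server) pairs, spreading across racks before reusing one.

    servers_by_rack: dict mapping rack name -> list of chunk server names.
    Raises ValueError if fewer than n servers exist in total.
    """
    # Copy the lists so the caller's data is not modified.
    racks = {rack: list(servers) for rack, servers in sorted(servers_by_rack.items())}
    chosen = []
    while len(chosen) < n:
        progressed = False
        for rack in racks:
            if racks[rack] and len(chosen) < n:
                chosen.append((rack, racks[rack].pop(0)))
                progressed = True
        if not progressed:
            raise ValueError("not enough chunk servers for the requested replicas")
    return chosen
```

With three or more racks available, each of the three replicas lands on a different rack; with only two racks, the third replica falls back to a second server on the first rack.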

8 Fault Tolerance
Experience
–When a chunk server dies, its chunks are underreplicated
–Killed a chunk server with 15,000 chunks containing 600GB of data
  All chunks were restored in 23.2 minutes
–Killed two chunk servers, each with 16,000 chunks and 660GB of data
  Since some chunks were down to one copy, replication was high priority
  All chunks had at least 2 copies in 2 minutes
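The single-server experiment implies an aggregate re-replication rate that is easy to work out from the numbers on the slide. The helper below is a back-of-the-envelope sketch; the function name is hypothetical, and it assumes 1GB = 1000MB (using 1024 instead gives a figure about 2% higher).

```python
def replication_rate_mb_per_s(gigabytes, minutes, mb_per_gb=1000):
    """Average cluster-wide copy rate needed to restore the given data in time."""
    return gigabytes * mb_per_gb / (minutes * 60)


# 600GB restored in 23.2 minutes (figures from the slide).
rate = replication_rate_mb_per_s(600, 23.2)

# Average chunk size on the killed server: 600GB spread over 15,000 chunks.
avg_chunk_mb = 600 * 1000 / 15_000
```

Here `rate` comes out at roughly 430 MB/s across the cluster, and `avg_chunk_mb` is 40.0, so the chunks on that server were on average well below the 64MB maximum.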

9 Summary
Context
–Google has lots of data
–Hardware fails
–Most files are read or append only
Google File System
–Each file is typically 100MB or larger
–GFS consists of a number of File System Clusters
  Cluster includes one master server and several chunk servers
Designed for hardware and software errors
–File system process expects to be killed
–Replication built into file system

