CREATED BY: JEAN LOIZIN CLASS: CS 345 DATE: 12/05/2016


1 THE GOOGLE FILE SYSTEM

2 Brief history about Google
Google is a well-known search engine. It was originally called BackRub when first created and was later renamed Google. The search engine started as a research project at Stanford University.

3 The project was designed to find files on the internet.
Its development was started in 1996 by Sergey Brin and Larry Page.

4 About a year later, on September 15, 1997, they registered the domain.
The company itself was created later, on September 4, 1998.

5 Google was built to find files on the internet.
The company therefore had to find a way to organize those files and make them as easy as possible to search and store. The result was the Google File System.

6 THE GOOGLE FILE SYSTEM
The Google File System (GFS) was designed to handle the large volume of requests that Google needed to process. It is a storage platform that allows Google to manage and store data more efficiently. The file system was designed in early 2002 to support searching as well as web crawling.

7 Architecture
The Google File System is made up of clusters. Each cluster typically consists of a single master, multiple chunkservers, and multiple clients. Each file within a cluster is divided into fixed-size chunks of 64 MB, and the chunks are stored on the chunkservers. Each chunk is identified by a unique 64-bit chunk handle. Each chunk is replicated on multiple chunkservers, three by default, to increase the reliability of the system.
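
Because chunks have a fixed size, the chunk that holds a given byte of a file can be found with simple arithmetic. The following is a minimal sketch of that calculation, not GFS code; the function names `chunk_index` and `chunks_for_range` are made up for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size, as stated on the slide


def chunk_index(byte_offset):
    """Return the index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """Split a byte range into (chunk_index, offset_within_chunk, nbytes) pieces.

    A range that crosses a chunk boundary yields one piece per chunk touched.
    """
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE
        within = offset % CHUNK_SIZE
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces
```

For example, a 30-byte range that starts 10 bytes before the end of the first chunk is split into one piece in chunk 0 and one piece in chunk 1.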

8 GFS Architecture diagram

9 System Interaction
Let's see how the file system handles the control flow of writing a file.
1) The application sends the file name and data to the GFS client.

10 2) The GFS client sends the file name and chunk index to the master.
3) The master replies with the identity of the primary replica and the locations of the other replicas; the client stores this information in its cache.

11 4) With the replica locations cached, the client pushes the data to all the replicas. GFS separates this data flow from the control flow, which improves performance.
5) The client sends a write request to the primary; the primary decides the mutation order and applies the mutation to its local copy.
6) The primary forwards the write request to all the secondaries.
7) After completing the operation, the secondaries acknowledge the primary.
8) The primary replies to the client, reporting any errors that occurred.
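
The write steps above can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions (no network, no failures, no leases), and the names `Replica`, `Primary`, and `client_write` are hypothetical, not part of GFS.

```python
class Replica:
    """Hypothetical stand-in for a chunkserver holding one replica of a chunk."""

    def __init__(self, name):
        self.name = name
        self.staged = {}   # data_id -> bytes pushed by the client, not yet applied
        self.applied = []  # (serial, bytes) mutations in the order they were applied

    def push_data(self, data_id, data):
        # Data flow: the client pushes the bytes to every replica first.
        self.staged[data_id] = data

    def apply(self, serial, data_id):
        # Control flow: apply previously staged data under the given serial number.
        self.applied.append((serial, self.staged.pop(data_id)))
        return True  # acknowledgement


class Primary(Replica):
    """The replica chosen as primary; it decides the mutation order."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # Assign the next serial number and apply the mutation locally.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        # Forward the request to every secondary and collect acknowledgements.
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        # Reply to the client: True only if every secondary acknowledged.
        return all(acks)


def client_write(primary, data_id, data):
    """Push the data to all replicas, then send the write request to the primary."""
    for replica in [primary, *primary.secondaries]:
        replica.push_data(data_id, data)
    return primary.write(data_id)
```

Because only the primary assigns serial numbers, every replica applies the mutations in the same order.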

12 Now, let's see how the file system handles file reading.
1) The application gives the file name to the GFS client.
2) The client passes the file name and chunk index to the master.
3) The master sends the chunk handle and the replica locations to the client.
4) The client reads the data from one of the replicas.
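
The read steps can be sketched in the same spirit. This is a toy model, assuming the read stays inside a single chunk and that chunkservers are plain dictionaries; the classes `Master` and `Client` and their methods are invented for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size


class Master:
    """Hypothetical master: maps (file name, chunk index) to handle and locations."""

    def __init__(self, table):
        self.table = table  # (file_name, chunk_index) -> (chunk_handle, [locations])
        self.lookups = 0    # counts how often clients had to contact the master

    def lookup(self, file_name, index):
        self.lookups += 1
        return self.table[(file_name, index)]


class Client:
    """Hypothetical GFS client that caches what the master tells it."""

    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # location -> {chunk_handle: bytes}
        self.cache = {}

    def read(self, file_name, offset, length):
        """Read bytes that lie within a single chunk."""
        index = offset // CHUNK_SIZE          # translate byte offset to chunk index
        key = (file_name, index)
        if key not in self.cache:             # ask the master only on a cache miss
            self.cache[key] = self.master.lookup(file_name, index)
        handle, locations = self.cache[key]
        data = self.chunkservers[locations[0]][handle]  # read from one replica
        start = offset % CHUNK_SIZE
        return data[start:start + length]
```

In this sketch a second read of the same chunk is served from the client's cache, so the master is contacted only once.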

13 Reliability of the System
The system runs on hundreds of servers, and some of them are bound to be unavailable at any given time. To keep the system available whether or not a particular server is up, the file system uses two strategies: fast recovery and replication.

14 Fast Recovery
Both the master and the chunkservers are designed to restore their state and start in seconds, no matter how they terminated. The servers do not distinguish between normal and abnormal termination; they are routinely shut down just by killing the process. When that happens, the master or chunkserver simply restarts and restores its state, which keeps the system reliable.

15 Chunk/Master Replication
As we mentioned earlier in the slides, each chunk is replicated on multiple chunkservers on different racks, three by default. Users can specify different replication levels for different parts of the file namespace. The master state is also replicated for reliability. A mutation to the state is considered committed only after its log record has been flushed to disk locally and on all master replicas.
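
The commit rule for the master state can be illustrated with a short sketch. It assumes each log is an in-memory list and models a failed flush with a `healthy` flag; `LogStore` and `commit` are hypothetical names, not GFS interfaces.

```python
class LogStore:
    """Hypothetical operation log, either the local one or one on a master replica."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.records = []

    def flush(self, record):
        # Stand-in for writing the record to disk; fails if the store is unhealthy.
        if not self.healthy:
            return False
        self.records.append(record)
        return True


def commit(record, local, replicas):
    """Return True only if the record was flushed locally and on every replica."""
    if not local.flush(record):
        return False
    results = [replica.flush(record) for replica in replicas]
    return all(results)
```

If any replica fails to flush the record, `commit` returns False and the mutation is not considered committed.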

16 Conclusion(s)
GFS demonstrates the qualities needed to support large-scale data processing. The system delivers high aggregate throughput to many concurrent readers and writers performing a variety of tasks. The file system was successfully designed to meet Google's storage needs and is widely used within Google as the storage platform for research and development as well as production data processing.

17 Cited
Computer Hope, ogle.htm, copyright 2016.
Google-File-System, system.wikispaces.asu.edu/.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System. PDF, available on the course website.

