Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As.


1

2

3 Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As of January 2011, Facebook has more than 600 million active users. Users may create a personal profile, add other users as friends, and exchange messages, including automatic notifications when they update their profile...

4

5 Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers. Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.
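
The MapReduce model that inspired Hadoop can be illustrated with a small in-memory sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration, and everything runs in a single process, whereas Hadoop distributes the same three phases across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

Calling `word_count(["Hadoop stores data", "Hive queries data"])` counts `data` twice and every other word once. In a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data over the network.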

6

7 Hadoop was created by Doug Cutting, who named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.

8

9 Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for easy data ETL (extract, transform, load), a mechanism to impose structure on the data, and the capability to query and analyze large data sets stored in Hadoop files. Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. The language also lets programmers who are familiar with the MapReduce framework plug in custom mappers and reducers to perform more sophisticated analysis that the built-in capabilities of the language may not support.
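
A custom mapper of the kind mentioned above is typically a script that reads tab-separated rows on standard input and writes tab-separated rows on standard output, which Hive can invoke through its `TRANSFORM ... USING` clause. The sketch below is a minimal example under stated assumptions: the table `page_views`, its columns `user_id` and `page_url`, and the script name `extract_domain.py` are all hypothetical.

```python
import sys


def transform_rows(lines):
    """Map each tab-separated (user_id, page_url) row to (user_id, domain).

    Rows that do not have exactly two fields are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue
        user_id, page_url = fields
        # Drop the scheme (if any), then keep everything before the first slash.
        domain = page_url.split("//")[-1].split("/")[0]
        yield f"{user_id}\t{domain}"


def run(stream_in, stream_out):
    """Stream rows from stream_in to stream_out, one output line per valid row."""
    for row in transform_rows(stream_in):
        stream_out.write(row + "\n")


# Assumed usage from a query (hypothetical table and column names):
#   SELECT TRANSFORM (user_id, page_url)
#   USING 'python extract_domain.py'
#   AS (user_id, domain)
#   FROM page_views;
# The script itself would then end with: run(sys.stdin, sys.stdout)
```

Keeping the row logic in `transform_rows` separate from the I/O in `run` makes the mapper easy to test locally before it is attached to a query.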

10

11

12 The Federated storage engine for the MySQL relational database management system allows a user to create a table that is a local representation of a foreign (remote) table. It uses the MySQL client library API as a data transport, treating the remote data source the same way other storage engines treat local data sources, whether they are MYD files (MyISAM), memory (Cluster, Heap), or tablespaces (InnoDB). For each Federated table that is defined, there is one .frm file (a data definition file containing information such as the URL of the data source). The actual data can exist on a local or remote MySQL instance.
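
The data source URL stored with a Federated table follows the shape `mysql://user@host:port/database/table`. As a rough illustration of what that URL encodes, the sketch below splits one into its parts. The helper name `parse_federated_url` and the sample host, user, and table names are hypothetical, and falling back to port 3306 (the MySQL default) when no port is given is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def parse_federated_url(url):
    """Split a mysql://user@host:port/database/table URL into its components."""
    parts = urlsplit(url)
    if parts.scheme != "mysql":
        raise ValueError("expected a mysql:// URL")

    # The path must name exactly a database and a table.
    path = parts.path.strip("/").split("/")
    if len(path) != 2 or not all(path):
        raise ValueError("expected /database/table in the URL path")
    database, table = path

    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 3306,  # assumption: default MySQL port
        "database": database,
        "table": table,
    }
```

For example, `parse_federated_url("mysql://fed_user@remote-host:9306/federated/test_table")` yields the user, host, port, database, and table that the local Federated table would forward its queries to.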

13 Oracle Real Application Clusters (RAC) is an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i that provides software for clustering and high availability in Oracle database environments. Oracle RAC allows multiple computers to run Oracle RDBMS software simultaneously while accessing a single database, thus providing a clustered database.

14 An example of the advantages of Hadoop/Hive, as described by a Facebook stakeholder: When we started at Facebook in 2007 all of the data processing infrastructure was built around a data warehouse built using a commercial RDBMS. The data that we were generating was growing very fast – as an example we grew from a 15TB data set in 2007 to a 2PB data set today. The infrastructure at that time was so inadequate that some daily data processing jobs were taking more than a day to process and the situation was just getting worse with every passing day!

15 We had an urgent need for infrastructure that could scale along with our data and it was at that time we then started exploring Hadoop as a way to address our scaling needs. [The] Hive/Hadoop cluster at Facebook stores more than 2PB of uncompressed data and routinely loads 15 TB of data daily

16
1 - Dhruba Borthakur, Zheng Shao. Presented at Hadoop World, New York, October 2, 2009
2 - http://www.infoq.com/news/2010/07/facebook-hadoop-summit
3 - http://hadoop.apache.org/
4 - download.oracle.com/docs/cd/E15523_01/web.1111/e13737/oracle_rac.htm
5 - http://structureddata.org/2009/06/10/facebook-hive-a-petabyte-scale-data-warehouse-using-hadoop/

17

