Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As.

Slides:



Advertisements
Similar presentations
Dan Bassett, Jonathan Canfield December 13, 2011.
Advertisements

 Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware  Created by Doug Cutting and.
By: Chris Hayes. Facebook Today, Facebook is the most commonly used social networking site for people to connect with one another online. People of all.
Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.
 Need for a new processing platform (BigData)  Origin of Hadoop  What is Hadoop & what it is not ?  Hadoop architecture  Hadoop components (Common/HDFS/MapReduce)
ETM Hadoop. ETM IDC estimate put the size of the “digital universe” at zettabytes in forecasting a tenfold growth by 2011 to.
Apache Hadoop and Hive Dhruba Borthakur Apache Hadoop Developer
Hadoop Ecosystem Overview
Introduction to Apache Hadoop CSCI 572: Information Retrieval and Search Engines Summer 2010.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Hive: A data warehouse on Hadoop Based on Facebook Team’s paperon Facebook Team’s paper 8/18/20151.
HADOOP ADMIN: Session -2
Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.
A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.
What makes Facebook do what it does? By Gavin Mais.
SOFTWARE SYSTEMS DEVELOPMENT MAP-REDUCE, Hadoop, HBase.
Cloud Distributed Computing Environment Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Information Systems Today (©2006 Prentice Hall) MySQL 1CS3754 Class Note #8, Is an open-source relational database management system 2.Is fast and.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
Penwell Debug Intel Confidential BRIEF OVERVIEW OF HIVE Jonathan Brauer ESE 380L Feb
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Hadoop Basics -Venkat Cherukupalli. What is Hadoop? Open Source Distributed processing Large data sets across clusters Commodity, shared-nothing servers.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
SEMINAR ON Guided by: Prof. D.V.Chaudhari Seminar by: Namrata Sakhare Roll No: 65 B.E.Comp.
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
Amazon Web Services BY, RAJESH KANDEPU. Introduction  Amazon Web Services is a collection of remote computing services that together make up a cloud.
O’Reilly – Hadoop: The Definitive Guide Ch.1 Meet Hadoop May 28 th, 2010 Taewhi Lee.
Communicate with All Workers Involved in the Process of Delivering High-Quality Health Care by Choosing Dossier365 on the Azure Platform MICROSOFT AZURE.
Performance Evaluation on Hadoop Hbase By Abhinav Gopisetty Manish Kantamneni.
Spatial Tajo Supporting Spatial Queries on Apache Tajo Slideshare Shorten URL : goo.gl/j0VLXpgoo.gl/j0VLXp.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
 2009 Calpont Corporation 1 Calpont Open Source Columnar Storage Engine for Scalable MySQL Data Warehousing April 22, 2009 MySQL User Conference Santa.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
Presented by: Katie Woods and Jordan Howell. * Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of.
Programming in Hadoop Guangda HU Huayang GUO
Windows Azure. Azure Application platform for the public cloud. Windows Azure is an operating system You can: – build a web application that runs.
Hadoop implementation of MapReduce computational model Ján Vaňo.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
MySQL An Introduction Databases 101.
HADOOP Carson Gallimore, Chris Zingraf, Jonathan Light.
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Cloud Distributed Computing Environment Hadoop. Hadoop is an open-source software system that provides a distributed computing environment on cloud (data.
This is a free Course Available on Hadoop-Skills.com.
BIG DATA/ Hadoop Interview Questions.
CPSC8985 FA 2015 Team C3 DATA MIGRATION FROM RDBMS TO HADOOP By Naga Sruthi Tiyyagura Monika RallabandiRadhakrishna Nalluri.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Hadoop Javad Azimi May What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data. It includes:
Big Data is a Big Deal!.
Hadoop Aakash Kag What Why How 1.
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Dhruba Borthakur Apache Hadoop Developer Facebook Data Infrastructure
Spark Presentation.
Hadoopla: Microsoft and the Hadoop Ecosystem
The Improvement of PaaS Platform ZENG Shu-Qing, Xu Jie-Bin 2010 First International Conference on Networking and Distributed Computing SQUARE.
PHP / MySQL Introduction
Hadoop Clusters Tess Fulkerson.
Central Florida Business Intelligence User Group
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
Azure's Performance, Scalability, SQL Servers Automate Real Time Data Transfer at Low Cost MINI-CASE STUDY “Azure offers high performance, scalable, and.
Introduction to Apache
Appcelerator Arrow: Build APIs in Minutes. Connect to Any Data Source
Overview of big data tools
TIM TAYLOR AND JOSH NEEDHAM
Zoie Barrett and Brian Lam
Charles Tappert Seidenberg School of CSIS, Pace University
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Big DATA.
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Presentation transcript:

Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As of January 2011, Facebook has more than 600 million active users. Users may create a personal profile, add other users as friends, and exchange messages, including automatic notifications when they update their profile...

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers. Hadoop is a top-level Apache project being built and used by a global community of contributors using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

Hadoop was created by Doug Cutting, who named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.

Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data “ETL”, a mechanism to put structures on the data, and the capability to querying and analysis of large data sets stored in Hadoop files. Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language.

The MySQL Federated storage engine for the MySQL relational database management system is a storage engine which allows a user to create a table that is a local representation of a foreign (remote) table. It utilizes the MySQL client library API as a data transport, treating the remote data source the same way other storage engines treat local data sources whether they be MYD files (MyISAM), memory (Cluster, Heap), or tablespace (InnoDB). Each Federated table that is defined there is one.frm (data definition file containing information such as the URL of the data source). The actual data can exist on a local or remote MySQL instance.

Oracle Real Application Clusters (RAC) is an option for the Oracle Database software produced by Oracle Corporation and introduced in 2001 with Oracle9i that provides software for clustering and high availability in Oracle database environments. Oracle RAC allows multiple computers to run Oracle RDBMS software simultaneously while accessing a single database, thus providing a clustered database.

An example of Hadoop/Hive Advantages, that has been told by a facebook stakeholder : When we started at Facebook in 2007 all of the data processing infrastructure was built around a data warehouse built using a commercial RDBMS. The data that we were generating was growing very fast – as an example we grew from a 15TB data set in 2007 to a 2PB data set today. The infrastructure at that time was so inadequate that some daily data processing jobs were taking more than a day to process and the situation was just getting worse with every passing day!

We had an urgent need for infrastructure that could scale along with our data and it was at that time we then started exploring Hadoop as a way to address our scaling needs. [The] Hive/Hadoop cluster at Facebook stores more than 2PB of uncompressed data and routinely loads 15 TB of data daily

1 - Dhruba Borthakur Zheng ShaoDhruba Borthakur Zheng Shao Presented at Hadoop World, New York -October 2, download.oracle.com/docs/cd/E15523_01/web.1111/e13737/oracle_rac.htm download.oracle.com/docs/cd/E15523_01/web.1111/e13737/oracle_rac.htm scale-data-warehouse-using-hadoop/ scale-data-warehouse-using-hadoop/