B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.

Slides:



Advertisements
Similar presentations
MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
Advertisements

Big Data Training Course for IT Professionals Name of course : Big Data Developer Course Duration : 3 days full time including practical sessions Dates.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
MapReduce Online Created by: Rajesh Gadipuuri Modified by: Ying Lu.
Big Data: Analytics Platforms Donald Kossmann Systems Group, ETH Zurich 1.
 Need for a new processing platform (BigData)  Origin of Hadoop  What is Hadoop & what it is not ?  Hadoop architecture  Hadoop components (Common/HDFS/MapReduce)
ETM Hadoop. ETM IDC estimate put the size of the “digital universe” at zettabytes in forecasting a tenfold growth by 2011 to.
PARALLEL DBMS VS MAP REDUCE “MapReduce and parallel DBMSs: friends or foes?” Stonebraker, Daniel Abadi, David J Dewitt et al.
Big Data A big step towards innovation, competition and productivity.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
Ch 4. The Evolution of Analytic Scalability
A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.
MapReduce April 2012 Extract from various presentations: Sudarshan, Chungnam, Teradata Aster, …
H ADOOP DB: A N A RCHITECTURAL H YBRID OF M AP R EDUCE AND DBMS T ECHNOLOGIES FOR A NALYTICAL W ORKLOADS By: Muhammad Mudassar MS-IT-8 1.
Zois Vasileios Α. Μ :4183 University of Patras Department of Computer Engineering & Informatics Diploma Thesis.
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
Penwell Debug Intel Confidential BRIEF OVERVIEW OF HIVE Jonathan Brauer ESE 380L Feb
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Hadoop Basics -Venkat Cherukupalli. What is Hadoop? Open Source Distributed processing Large data sets across clusters Commodity, shared-nothing servers.
Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Introduction to Hadoop and HDFS
Cloud Distributed Computing Platform 2 Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
HadoopDB Presenters: Serva rashidyan Somaie shahrokhi Aida parbale Spring 2012 azad university of sanandaj 1.
CSE 548 Advanced Computer Network Security Document Search in MobiCloud using Hadoop Framework Sayan Cole Jaya Chakladar Group No: 1.
Hadoop Ali Sharza Khan High Performance Computing 1.
Bleeding edge technology to transform Data into Knowledge HADOOP In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log,
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters Hung-chih Yang(Yahoo!), Ali Dasdan(Yahoo!), Ruey-Lung Hsiao(UCLA), D. Stott Parker(UCLA)
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
Hidemoto Nakada, Hirotaka Ogawa and Tomohiro Kudoh National Institute of Advanced Industrial Science and Technology, Umezono, Tsukuba, Ibaraki ,
13 1 Chapter 13 The Data Warehouse Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
Database Applications (15-415) Part II- Hadoop Lecture 26, April 21, 2015 Mohammad Hammoud.
Hadoop IT Services Hadoop Users Forum CERN October 7 th,2015 CERN IT-D*
MapReduce and NoSQL CMSC 461 Michael Wilson. Big data  The term big data has become fairly popular as of late  There is a need to store vast quantities.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
IBM Research ® © 2007 IBM Corporation Introduction to Map-Reduce and Join Processing.
Web Log Data Analytics with Hadoop
CPS 216: Advanced Database Systems Shivnath Babu.
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
HADOOP Carson Gallimore, Chris Zingraf, Jonathan Light.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
What we know or see What’s actually there Wikipedia : In information technology, big data is a collection of data sets so large and complex that it.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Distributed File System. Outline Basic Concepts Current project Hadoop Distributed File System Future work Reference.
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
This is a free Course Available on Hadoop-Skills.com.
An Introduction To Big Data For The SQL Server DBA.
BIG DATA. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database.
What is it and why it matters? Hadoop. What Is Hadoop? Hadoop is an open-source software framework for storing data and running applications on clusters.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Bleeding edge technology to transform Data into Knowledge HADOOP In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log,
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
Big Data & Test Automation
”Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters” Published In SIGMOD '07 By Yahoo! Senthil Nathan N IIT Bombay.
INTRODUCTION TO BIGDATA & HADOOP
Ministry of Higher Education
Database Applications (15-415) Hadoop Lecture 26, April 19, 2016
Tools for Processing Big Data Jinan Al Aridhee and Christian Bach
Ch 4. The Evolution of Analytic Scalability
CS639: Data Management for Data Science
Big DATA.
Presentation transcript:

B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.

Overview Introduction What is Big Data! Why Big Data? 4 V’s Of Big Data Big Data Analytics Technologies Map/Reduce Applications Case Study Conclusion

Introduction Data have outgrown the storage and processing capabilities of a single host. Two fundamental challenges: –how to store and –how to work with voluminous data sizes, and, –how to understand data and turn it into a competitive advantage.

What is Big Data! ‘Big-data’ is similar to ‘Small-data’, but bigger But having data bigger requires different approaches: techniques, tools & architectures To solve: New problems and old problems in a better way.

The Blind men and the Elephant

Why Big Data? Key enablers for the growth of “Big Data” are: Increase of Processing Power Increase of Storage Capacities Availability of Data

4 V’s of Big Data

Big Data Analytics Technologies Hadoop PLATFORA WibiData PIG Hive MapReduce NoSQL databases Column-oriented databases

Hadoop Hadoop is a distributed file system and data processing engine Hadoop has two components: –The Hadoop distributed file system (HDFS) –The MapReduce programing.

Map / Reduce A High level abstracted framework for distributed processing of large datasets Fault Tolerant, Parallelization Computation consists of two phases Map Reduce A Master-Slave architecture Computations occurs in multiple slave nodes And it tries to provide data locality as much as possible.

MR model Map –Process a key/value pair to generate intermediate key/value pairs Reduce –Merge all intermediate values associated with the same key Users implement interface of two primary methods: 1. Map: (key1, val1) → (key2, val2) 2. Reduce: (key2, [val2]) → [val3]

Applications

Homeland Security FinanceSmarter Healthcare Multi-channel sales Telecom Manufacturing Traffic Control Trading AnalyticsFraud and Risk Log Analysis Search Quality Retails

Case Study

Conclusion Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse, It’s about the ability to make better decisions and take meaningful actions at the right time.

Queries ??