Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.

Slides:



Advertisements
Similar presentations
 Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware  Created by Doug Cutting and.
Advertisements

Based on the text by Jimmy Lin and Chris Dryer; and on the yahoo tutorial on mapreduce at index.html
Developing a MapReduce Application – packet dissection.
Map/Reduce in Practice Hadoop, Hbase, MongoDB, Accumulo, and related Map/Reduce- enabled data stores.
CMU SCS : Multimedia Databases and Data Mining Extra: intro to hadoop C. Faloutsos.
 Need for a new processing platform (BigData)  Origin of Hadoop  What is Hadoop & what it is not ?  Hadoop architecture  Hadoop components (Common/HDFS/MapReduce)
CPS216: Advanced Database Systems (Data-intensive Computing Systems) How MapReduce Works (in Hadoop) Shivnath Babu.
GROUP 7 TOOLS FOR BIG DATA Sandeep Prasad Dipojjwal Ray.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
Hadoop, Hadoop, Hadoop!!! Jerome Mitchell Indiana University.
Big Data Analytics with R and Hadoop
HADOOP ADMIN: Session -2
Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.
By: Jeffrey Dean & Sanjay Ghemawat Presented by: Warunika Ranaweera Supervised by: Dr. Nalin Ranasinghe.
SOFTWARE SYSTEMS DEVELOPMENT MAP-REDUCE, Hadoop, HBase.
Cloud Distributed Computing Environment Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
MapReduce April 2012 Extract from various presentations: Sudarshan, Chungnam, Teradata Aster, …
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
Whirlwind tour of Hadoop Inspired by Google's GFS Clusters from systems Batch Processing High Throughput Partition-able problems Fault Tolerance.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.
W HAT IS H ADOOP ? Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Introduction to Hadoop and HDFS
f ACT s  Data intensive applications with Petabytes of data  Web pages billion web pages x 20KB = 400+ terabytes  One computer can read
SEMINAR ON Guided by: Prof. D.V.Chaudhari Seminar by: Namrata Sakhare Roll No: 65 B.E.Comp.
Cloud Distributed Computing Platform 2 Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
HAMS Technologies 1
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
Whirlwind Tour of Hadoop Edward Capriolo Rev 2. Whirlwind tour of Hadoop Inspired by Google's GFS Clusters from systems Batch Processing High.
Hadoop Ali Sharza Khan High Performance Computing 1.
CPS216: Advanced Database Systems (Data-intensive Computing Systems) Introduction to MapReduce and Hadoop Shivnath Babu.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Introduction to Search Engines Technology CS Technion, Winter 2013 Amit Gross Some slides are courtesy of: Edward Bortnikov & Ronny Lempel, Yahoo!
Database Applications (15-415) Part II- Hadoop Lecture 26, April 21, 2015 Mohammad Hammoud.
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Apache Hadoop Daniel Lust, Anthony Taliercio. What is Apache Hadoop? Allows applications to utilize thousands of nodes while exchanging thousands of terabytes.
Presented by: Katie Woods and Jordan Howell. * Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of.
Hadoop implementation of MapReduce computational model Ján Vaňo.
Map-Reduce Big Data, Map-Reduce, Apache Hadoop SoftUni Team Technical Trainers Software University
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
MapReduce Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
IBM Research ® © 2007 IBM Corporation Introduction to Map-Reduce and Join Processing.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Cloud Distributed Computing Environment Hadoop. Hadoop is an open-source software system that provides a distributed computing environment on cloud (data.
Next Generation of Apache Hadoop MapReduce Owen
Beyond Hadoop The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise. Ibrahim.
INTRODUCTION TO HADOOP. OUTLINE  What is Hadoop  The core of Hadoop  Structure of Hadoop Distributed File System  Structure of MapReduce Framework.
By: Joel Dominic and Carroll Wongchote 4/18/2012.
1 Student Date Time Wei Li Nov 30, 2015 Monday 9:00-9:25am Shubbhi Taneja Nov 30, 2015 Monday9:25-9:50am Rodrigo Sanandan Dec 2, 2015 Wednesday9:00-9:25am.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Image taken from: slideshare
Hadoop Aakash Kag What Why How 1.
Apache hadoop & Mapreduce
Software Systems Development
An Open Source Project Commonly Used for Processing Big Data Sets
Hadoop MapReduce Framework
15-826: Multimedia Databases and Data Mining
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Ministry of Higher Education
Big Data Programming: an Introduction
Database Applications (15-415) Hadoop Lecture 26, April 19, 2016
Distributed Systems CS
Cloud Distributed Computing Environment Hadoop
Hadoop Basics.
Lecture 16 (Intro to MapReduce and Hadoop)
Distributed Systems CS
Presentation transcript:

Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users

MapReduce – What is it ? Processing engine of Hadoop Developers create Map and Reduce jobs Used for big data batch processing Parallel processing of huge data volumes Fault tolerant Scalable

MapReduce – Why use it ? Your data in Terabyte / Petabyte range You have huge I/O Hadoop framework takes care of  Job and task management  Failures  Storage  Replication You just write Map and Reduce jobs

MapReduce – How does it work ? Take word counting as an example, something that G oogle does all of the time.

MapReduce – How does it work ? Input data split into shards Split data mapped to key,value pairs i.e. Bear,1 Mapped data shuffled/sorted by key i.e. Bear Sorted data reduced i.e. Bear, 2 Final data stored on HDFS There might be extra map layer before shuffle JobTracker controls all tasks in job TaskTracker controls map and reduce

MapReduce - Some examples A visual example with colours to show you the cycle Split -> Map -> Shuffle -> Reduce

MapReduce - Some examples A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.

Hadoop MapReduce – Big users Users  Facebook  Yahoo  Amazon  Ebay