© 2012 Unisys Corporation. All rights reserved. 1 Unisys Corporation. Proprietary and Confidential.

Presentation transcript:


2. Data Technology Landscape Is Rapidly Evolving

Relational hegemony is over
- Disruptive data technologies abound
- Open source, new data models, NoSQL systems
- One size no longer fits all
Focus expanded from write- to read-intensive applications
Old constraints are falling away
- Big memory, big storage, big CPU farms, big interconnect
- Virtual machines everywhere
- New applications with massive data volumes (social networking, BI)
- Less restrictive transaction models promote scalability

Mike Stonebraker: “It’s time for a complete rewrite”
- UC Berkeley, MIT
- Ingres, Postgres, Illustra, Streambase, Vertica, VoltDB and more

[Diagram labels: OLTP, Analytics, 40-odd years]

3. Hadoop Mimics Google as Big Data Store

Layer                   | Google                       | Hadoop (Apache Software Foundation)
Distributed File System | Google File System           | Hadoop Distributed File System
Data Access Technique   | Map/Reduce                   | Map/Reduce
Table-like Data Model   | BigTable                     | HBase
Applications            | Megastore, Google App Engine | Pig Latin, Hive, Zookeeper, Vendor Analytics

Your Data Everywhere

4. How HDFS and GFS Work

- Data ‘sharded’ across nodes
- “Shared Nothing” data nodes
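The idea of sharding data across shared-nothing nodes can be illustrated with a minimal Python sketch. It is not how HDFS or GFS actually place data. The function names `shard` and `read_back`, the round-robin placement, and the tiny block size are all assumptions made for illustration. Each node is modeled as a private dictionary that no other node touches, and a separate index records which node holds which block.

```python
def shard(data: bytes, block_size: int, node_names: list[str]):
    """Split data into fixed-size blocks and spread them over nodes.

    Round-robin placement is an illustrative assumption, not the
    placement policy of HDFS or GFS.
    """
    # Each node owns its own private storage ("shared nothing").
    nodes = {name: {} for name in node_names}
    # index[block_no] is the node holding that block (the kind of
    # metadata a coordinating master would keep).
    index = []
    for block_no, start in enumerate(range(0, len(data), block_size)):
        name = node_names[block_no % len(node_names)]
        nodes[name][block_no] = data[start:start + block_size]
        index.append(name)
    return nodes, index


def read_back(nodes, index) -> bytes:
    """Reassemble the original data by fetching each block from its node."""
    return b"".join(nodes[name][block_no] for block_no, name in enumerate(index))


if __name__ == "__main__":
    nodes, index = shard(b"abcdefghij", 4, ["n1", "n2"])
    print(index)
    print(read_back(nodes, index))
```

Because no block lives on more than one node in this sketch, losing a node loses data; real systems add replication on top of this scheme.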

5. Map/Reduce Algorithm

A programming pattern
- Inspired by functional programming languages
- For large scale parallel applications
Parallel algorithm
- Map preps input data into key/value pairs, here (word, "1")
- Merge (or Combine) phase gathers relevant pairs, arranging them by word
- Reduce sums counts for each word, constructs final result
Optimized for unstructured data
- Minimum metadata stored in the distributed file system
- Data knowledge resides in map and reduce programs
Parts of the algorithm are patented by Google
- US Patent #7,650,331
- Filed June 18, 2004, granted January 19, 2010
- Licensed to Hadoop in April 2010
Standard example is word counting:

    void map(String name, String document):
        // name: document name
        // document: document contents
        for each word w in document:
            EmitIntermediate(w, "1");

    void reduce(String word, Iterator wordCounts):
        // word: a word
        // wordCounts: list of aggregated counts
        int sum = 0;
        for each pc in wordCounts:
            sum += ParseInt(pc);
        Emit(word, AsString(sum));
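The word-counting pseudocode on this slide can be approximated by a small single-process Python sketch. It only mirrors the three phases (map, merge, reduce) in memory; it does not use Hadoop or any distributed runtime. The function names `map_phase`, `merge_phase`, `reduce_phase`, and `word_count` are hypothetical, and splitting on whitespace is an assumed definition of a "word".

```python
from collections import defaultdict


def map_phase(name: str, document: str):
    """Emit an intermediate (word, "1") pair for every word in the document."""
    for word in document.split():  # whitespace tokenization is an assumption
        yield word, "1"


def merge_phase(pairs):
    """Group intermediate pairs by word, arranging the keys in sorted order."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(word: str, word_counts):
    """Sum the string counts for one word and emit the total as a string."""
    total = sum(int(pc) for pc in word_counts)
    return word, str(total)


def word_count(documents: dict[str, str]) -> dict[str, str]:
    """Run map, merge, and reduce in sequence over a name -> contents mapping."""
    pairs = [
        pair
        for name, contents in documents.items()
        for pair in map_phase(name, contents)
    ]
    grouped = merge_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count({"d1": "a b a", "d2": "b c"}))
```

Counts are kept as strings to match the pseudocode's `EmitIntermediate(w, "1")` and `AsString(sum)`; in a real job the map and reduce calls would run in parallel on different nodes.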
