Hadoopla: Microsoft and the Hadoop Ecosystem

1 Hadoopla: Microsoft and the Hadoop Ecosystem
Presented at SQL Saturday Waltham, May 19th, 2012
Jim O’Neil, Developer Evangelist, Microsoft

2 Big Data Starts with a V
Volume: there’s a lot of it; we’re hoarders
Variety: schema-schmema, it’s coming from the ‘internet of things’
Velocity: he who hesitates doesn’t get the worm

3 There’s a Tech for That
Volume: data warehouses; distributed file systems + Map-Reduce
Variety: NoSQL databases
Velocity: complex event processing

4 Two Dimensions of Scale
Up: bigger, more powerful machines
Out: more machines working together

5 Scaling Out is Hard
[Chart: programming complexity vs. number of machines, from 1 to n]

6 Distributed File Systems
[Diagram: a name node and four data nodes]
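To make the name node / data node split concrete, here is a toy model in JavaScript, the language of the slide 8 example. It is a sketch under stated assumptions, not how HDFS is implemented: the file is an in-memory string, the block size (8 bytes) and replication factor (3) are illustrative values, and replicas are placed round-robin rather than with HDFS's rack-aware policy. The name node keeps only metadata (which blocks make up a file and which data nodes hold each copy); the data nodes hold the bytes, so a read survives the loss of a node as long as one replica of every block is still up.

```javascript
// Toy model of a distributed file system such as HDFS: the name node keeps only
// metadata, the data nodes keep block contents. BLOCK_SIZE, REPLICATION, and the
// round-robin placement are illustrative assumptions, not HDFS's real defaults
// or its rack-aware placement policy.
const BLOCK_SIZE = 8; // bytes per block, tiny so a short string spans several blocks
const REPLICATION = 3; // copies kept of each block

function createCluster(dataNodeCount) {
  const dataNodes = [];
  for (let i = 0; i < dataNodeCount; i++) {
    dataNodes.push({ id: "dn" + i, blocks: new Map() });
  }
  // nameNode: file name -> [{ blockId, replicas: [data node ids] }, ...]
  return { nameNode: new Map(), dataNodes: dataNodes, nextBlockId: 0 };
}

function writeFile(cluster, fileName, contents) {
  const n = cluster.dataNodes.length;
  const blockList = [];
  for (let offset = 0; offset < contents.length; offset += BLOCK_SIZE) {
    const blockId = cluster.nextBlockId++;
    const data = contents.slice(offset, offset + BLOCK_SIZE);
    const replicas = [];
    // Put each copy on a different data node, rotating the starting node per block.
    for (let r = 0; r < Math.min(REPLICATION, n); r++) {
      const node = cluster.dataNodes[(blockId + r) % n];
      node.blocks.set(blockId, data);
      replicas.push(node.id);
    }
    blockList.push({ blockId: blockId, replicas: replicas });
  }
  cluster.nameNode.set(fileName, blockList);
}

function readFile(cluster, fileName, failedNodes = new Set()) {
  const blockList = cluster.nameNode.get(fileName); // ask the name node where blocks live
  if (!blockList) throw new Error("no such file: " + fileName);
  let result = "";
  for (const { blockId, replicas } of blockList) {
    // Read each block from the first replica whose data node is still up.
    const liveId = replicas.find((id) => !failedNodes.has(id));
    if (liveId === undefined) throw new Error("all replicas lost for block " + blockId);
    const node = cluster.dataNodes.find((dn) => dn.id === liveId);
    result += node.blocks.get(blockId);
  }
  return result;
}

// Usage: write a file across four data nodes, then read it back with dn0 down.
const demo = createCluster(4);
writeFile(demo, "/poem.txt", "I am what I am");
console.log(readFile(demo, "/poem.txt", new Set(["dn0"]))); // prints: I am what I am
```

Keeping the metadata in one place is what lets clients find any block with a single lookup; it is also why the name node is the piece a real cluster has to protect most carefully.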

7 Map Reduce
[Diagram: job tracker and name node coordinating data nodes, with a task tracker]
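The point of putting task trackers next to the data nodes is that the job tracker can ship the computation to the data instead of the other way round. The sketch below is a deliberately simplified, hypothetical scheduler (not Hadoop's actual job tracker logic): each map task lists the data nodes holding a replica of its input block, and the scheduler first assigns tasks to a tracker on one of those nodes, then places any leftovers wherever a slot is free.

```javascript
// Simplified sketch of locality-aware scheduling: the job tracker assigns each
// map task to a task tracker, preferring one running on a data node that
// already holds a replica of the task's input block. Slot counts and the
// greedy order are illustrative assumptions, not Hadoop's actual scheduler.
function scheduleMapTasks(tasks, trackers) {
  // trackers: [{ node: "dn0", freeSlots: 1 }, ...]
  // tasks:    [{ id: "m0", replicas: ["dn0", "dn1"] }, ...]
  const free = new Map(trackers.map((t) => [t.node, t.freeSlots]));
  const assignments = [];
  const pending = [];

  // First pass: data-local placements only.
  for (const task of tasks) {
    const local = task.replicas.find((node) => (free.get(node) || 0) > 0);
    if (local !== undefined) {
      free.set(local, free.get(local) - 1);
      assignments.push({ task: task.id, node: local, local: true });
    } else {
      pending.push(task);
    }
  }

  // Second pass: leftovers run wherever a slot is free, reading input over the network.
  for (const task of pending) {
    const remote = [...free.keys()].find((node) => free.get(node) > 0);
    if (remote === undefined) {
      assignments.push({ task: task.id, node: null, local: false }); // waits for a slot
      continue;
    }
    free.set(remote, free.get(remote) - 1);
    assignments.push({ task: task.id, node: remote, local: false });
  }
  return assignments;
}

const plan = scheduleMapTasks(
  [
    { id: "m0", replicas: ["dn0", "dn1"] },
    { id: "m1", replicas: ["dn0"] },
    { id: "m2", replicas: ["dn0"] },
  ],
  [
    { node: "dn0", freeSlots: 1 },
    { node: "dn1", freeSlots: 1 },
    { node: "dn2", freeSlots: 1 },
  ]
);
console.log(plan.map((a) => a.task + "->" + a.node + (a.local ? " (local)" : " (remote)")).join(", "));
// prints: m0->dn0 (local), m1->dn1 (remote), m2->dn2 (remote)
```

Because it is greedy, the sketch can miss a better plan: here m0 could have run on dn1, leaving dn0 free so that m1 also ran data-local. Real schedulers spend more effort on exactly this trade-off.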

8 Map Reduce
Word count example, input "I am what I am": map emits i: 1, am: 1, what: 1, i: 1, am: 1; shuffle and sort groups the pairs by key; reduce emits i: 2, am: 2, what: 1

    // map: split the input line into words and emit (lower-cased word, 1) for each
    var map = function (key, value, context) {
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
            if (words[i] !== "") {
                context.write(words[i].toLowerCase(), 1);
            }
        }
    };

    // reduce: add up the 1s collected for each word and emit the total
    var reduce = function (key, values, context) {
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };
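The slide's map and reduce functions only run inside a Hadoop job. To follow the map, shuffle-and-sort, reduce flow end to end on one machine, the sketch below drives the same two functions (restated so the block is self-contained) with a hypothetical in-memory `runJob` harness. The `context.write` and `values.hasNext()` / `values.next()` objects it passes are stand-ins that mimic the interface the slide code expects; they are not the Hadoop on Azure JavaScript runtime.

```javascript
// The slide's word-count functions.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") context.write(words[i].toLowerCase(), 1);
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) sum += parseInt(values.next(), 10);
  context.write(key, sum);
};

// Hypothetical single-process harness: runs map over each input line,
// groups intermediate pairs by key (shuffle and sort), then runs reduce once per key.
function runJob(lines, mapFn, reduceFn) {
  // Map phase: collect every (key, value) pair the mapper emits.
  const intermediate = [];
  const mapContext = { write: (k, v) => intermediate.push([k, v]) };
  lines.forEach((line, offset) => mapFn(offset, line, mapContext));

  // Shuffle and sort: group values by key, then order the keys.
  const groups = new Map();
  for (const [k, v] of intermediate) {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  }
  const sortedKeys = [...groups.keys()].sort();

  // Reduce phase: hand each key an iterator over its grouped values.
  const output = [];
  const reduceContext = { write: (k, v) => output.push([k, v]) };
  for (const k of sortedKeys) {
    const vals = groups.get(k);
    let i = 0;
    const iterator = { hasNext: () => i < vals.length, next: () => vals[i++] };
    reduceFn(k, iterator, reduceContext);
  }
  return output;
}

console.log(JSON.stringify(runJob(["I am what I am"], map, reduce)));
// prints: [["am",2],["i",2],["what",1]]
```

Because `map` lower-cases every word, the key is `i` rather than `I`; without that call, "I" and "i" would be counted separately.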

9 Enter Hadoop
Apache project (http://hadoop.apache.org)
Open-source implementation of the Google File System and MapReduce designs
Components: Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hadoop Common

10 Hadoop History
2002: Doug Cutting develops Nutch, a web crawler
2004: Google publishes the MapReduce paper, following its 2003 GFS paper
2006: Cutting joins Yahoo!; Hadoop becomes an Apache Lucene subproject
Hadoop becomes a top-level Apache project
Cutting joins Cloudera
2011: Hortonworks formed by Yahoo! and Benchmark Capital
2011: Hadoop reaches version (Dec. 27)

11 Adopters
Yahoo! runs Hadoop on 40,000 nodes
Facebook has over 30 PB of data in Hadoop
Oracle’s Big Data Appliance includes a Hadoop distribution
JP Morgan Chase uses it for fraud detection
eBay is replacing its core search technology with it
Microsoft is working with Hortonworks to distribute Hadoop on Windows, both in the cloud and on-premises

12 Hadoop on Azure
http://hadooponazure.com
Limited customer preview
Windows Server on-premises distribution to follow

13 Sign up

14 Cluster Provisioning

15 Demo

16 The Menagerie Begins
Pig: query infrastructure for Hadoop; scripts written in Pig Latin, a SQL-like language, launch MapReduce jobs
Hive: data warehouse system for Hadoop; queries written in HiveQL (SQL-like) launch MapReduce jobs
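Both tools hide the map and reduce phases behind a query language. As a conceptual sketch (the table, columns, and query are hypothetical, and the plan Hive or Pig actually generates may differ, for example by pre-aggregating on the map side), a filtered GROUP BY fits in a single MapReduce job: the WHERE clause runs in the map phase, the GROUP BY column becomes the shuffle key, and the aggregate runs in the reduce phase.

```javascript
// Conceptual sketch of how a declarative query becomes one MapReduce job.
// Hypothetical HiveQL:
//   SELECT url, SUM(bytes) FROM logs WHERE status = 200 GROUP BY url;
// A roughly equivalent Pig Latin script:
//   good   = FILTER logs BY status == 200;
//   byUrl  = GROUP good BY url;
//   totals = FOREACH byUrl GENERATE group, SUM(good.bytes);

// Map phase: WHERE filter plus projection to (groupKey, value).
function mapPhase(row, emit) {
  if (row.status === 200) emit(row.url, row.bytes);
}

// Reduce phase: the aggregate, applied to all values for one key.
function reducePhase(url, bytesList, emit) {
  emit(url, bytesList.reduce((a, b) => a + b, 0)); // SUM(bytes)
}

function runQuery(rows) {
  const groups = new Map(); // shuffle: group emitted values by key
  for (const row of rows) {
    mapPhase(row, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  }
  const result = [];
  for (const url of [...groups.keys()].sort()) {
    reducePhase(url, groups.get(url), (k, v) => result.push({ url: k, total: v }));
  }
  return result;
}

const logs = [
  { url: "/home", status: 200, bytes: 512 },
  { url: "/about", status: 404, bytes: 128 },
  { url: "/home", status: 200, bytes: 256 },
  { url: "/about", status: 200, bytes: 64 },
];
console.log(JSON.stringify(runQuery(logs)));
// prints: [{"url":"/about","total":64},{"url":"/home","total":768}]
```

Queries with joins or several grouping steps typically compile to a chain of such jobs, which is where a query language saves the most hand-written code.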

17 More Demo

18 More Ecosystem
HBase: NoSQL database built on HDFS
Cassandra: wide-column NoSQL store
Sqoop: bridge between relational databases and HDFS
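HBase's "NoSQL" shape is easiest to see in its logical data model: a table is a map from row key to columns (grouped into column families fixed when the table is created), and each cell keeps timestamped versions. The following is a toy, in-memory model of that idea only; `ToyTable` and its methods are hypothetical names, not the HBase client API.

```javascript
// Toy, in-memory model of HBase's logical data model (not the HBase client API):
// rowKey -> "family:qualifier" -> list of { ts, value }, newest first.
class ToyTable {
  constructor(families) {
    this.families = new Set(families); // column families are fixed per table
    this.rows = new Map();
  }

  put(rowKey, column, value, ts) {
    const family = column.split(":")[0];
    if (!this.families.has(family)) throw new Error("unknown column family: " + family);
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    const row = this.rows.get(rowKey);
    if (!row.has(column)) row.set(column, []);
    const versions = row.get(column);
    versions.push({ ts: ts, value: value });
    versions.sort((a, b) => b.ts - a.ts); // newest version first
  }

  // Returns the newest value of a cell, or undefined if the cell is absent.
  get(rowKey, column) {
    const row = this.rows.get(rowKey);
    const versions = row && row.get(column);
    return versions ? versions[0].value : undefined;
  }

  // HBase keeps rows ordered by key; this toy sorts on demand, which matches
  // byte order for ASCII keys. Returns row keys in [startRow, stopRow).
  scan(startRow, stopRow) {
    return [...this.rows.keys()].filter((k) => k >= startRow && k < stopRow).sort();
  }
}

const users = new ToyTable(["info"]);
users.put("user#001", "info:email", "old@example.com", 1);
users.put("user#001", "info:email", "new@example.com", 2);
users.put("user#002", "info:email", "b@example.com", 1);
users.put("user#100", "info:email", "c@example.com", 1);
console.log(users.get("user#001", "info:email")); // prints: new@example.com
console.log(users.scan("user#000", "user#100").join(",")); // prints: user#001,user#002
```

Sorted row keys are why row-key design matters so much in HBase: a well-chosen key turns common queries into short range scans.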

19 And More
Flume: distributed log collection, aggregating logs into HDFS
Scribe: another log aggregator
Chukwa: log processing platform

20 And Some More
ZooKeeper: distributed system coordinator
Oozie: workflow engine
Avro: data serialization system
Ganglia: distributed monitoring system
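A workflow engine such as Oozie treats a job as a directed acyclic graph of actions: an action starts only after every action it depends on has finished. The sketch below shows just that ordering idea. The action names are hypothetical, and Oozie itself describes workflows in XML rather than code like this.

```javascript
// Sketch of the idea behind a workflow engine: run actions in dependency order.
// deps maps each action to the actions it depends on; names are hypothetical.
function executionOrder(deps) {
  const remaining = new Map(
    Object.entries(deps).map(([name, pre]) => [name, new Set(pre)])
  );
  const order = [];
  while (remaining.size > 0) {
    // Every action whose prerequisites are all done could run now (in parallel).
    const ready = [...remaining.keys()].filter((n) => remaining.get(n).size === 0).sort();
    if (ready.length === 0) throw new Error("cycle or unknown dependency: not a valid DAG");
    for (const name of ready) {
      remaining.delete(name);
      order.push(name);
    }
    // Mark the finished actions as satisfied for everything still waiting.
    for (const pre of remaining.values()) {
      for (const name of ready) pre.delete(name);
    }
  }
  return order;
}

const workflow = {
  "import-with-sqoop": [],
  "clean-with-pig": ["import-with-sqoop"],
  "load-into-hive": ["clean-with-pig"],
  "count-words": ["import-with-sqoop"],
  "publish-report": ["load-into-hive", "count-words"],
};
console.log(executionOrder(workflow).join(" -> "));
// prints: import-with-sqoop -> clean-with-pig -> count-words -> load-into-hive -> publish-report
```

Processing the graph in "waves" of ready actions also shows which steps could run concurrently; here clean-with-pig and count-words become ready at the same time.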

21 We’re Not Done Yet!
Mahout: machine learning library
Pegasus: graph mining system
CloudBurst: genome sequence mapping

22 And It’s Just One Piece of the Big Data Pie
[Diagram: Microsoft’s big data solution]
Familiar end-user tools: Power View, Excel with PowerPivot, predictive analytics, embedded BI
BI platform: SSAS, SSRS
Microsoft SQL Server / PDW, connectors, Hadoop on Windows Azure, Hadoop on Windows Server
Unstructured and structured data: sensors, devices, bots, crawlers, ERP, CRM, LOB

23 I meant what I said, and I said what I meant
An elephant’s faithful, one hundred percent.
Jim O’Neil, Developer Evangelist, Microsoft

