Dan Bassett, Jonathan Canfield December 13, 2011.

2 What is Hadoop?
- Allows for the distributed processing of large data sets across clusters of computers
- Open-source Apache project written in Java
- Actively supported
- Inspired by Google's published work on MapReduce and the Google File System (GFS)

3 What’s the big deal?
- Changes the economics and dynamics of large-scale computing: storage and processing scale out across inexpensive commodity servers
- Scalable
- Cost-effective
- Flexible
- Fault-tolerant

4 Commercially supported
- IBM InfoSphere BigInsights
- Silicon Graphics CloudRack
- EMC Greenplum
- Google App Engine
- Oracle Big Data Appliance
- Cloudera: CDH, Professional Services
- Microsoft: Windows Server, SQL Server

5 Who Uses Hadoop?

6 Prominent Users
- Facebook: claims to have the largest Hadoop cluster in the world, at 30 PB
- Yahoo!: claims to have the world’s largest Hadoop production application
- eBay: 532-node cluster, 5.3 PB
- New York Times: processed 4 TB of image data into 11 million PDFs at a cost of about $240

7 How Does It Work?

8 Architecture
- Hadoop Common (shared libraries and utilities used by the other modules)
- Hadoop Distributed File System (HDFS)
- MapReduce engine

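9 File System (HDFS)
- Combines the disks of many nodes into one large file system
- Fault-tolerant: each block of a file is replicated on several nodes (three copies by default)
- Runs on low-cost commodity hardware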
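To make "one big file system from many nodes" concrete, here is a minimal Python sketch (not HDFS code) of how a file might be cut into fixed-size blocks, with each block copied to several nodes. The 64 MB block size and replication factor of 3 are assumptions chosen to mirror HDFS defaults of this era. The round-robin placement and the helper names (`split_into_blocks`, `place_replicas`, `readable_after_failure`) are hypothetical simplifications; real HDFS uses a rack-aware placement policy managed by the NameNode.

```python
from typing import Dict, List, Set

BLOCK_SIZE_MB = 64   # assumption: HDFS default block size in Hadoop 1.x
REPLICATION = 3      # assumption: HDFS default replication factor


def split_into_blocks(size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size (in MB) of each block a file is cut into; the last may be smaller."""
    full, rest = divmod(size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct nodes using simple round-robin.

    Hypothetical simplification: real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """A file survives if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


blocks = split_into_blocks(200)                 # a 200 MB file
nodes = [f"node{i}" for i in range(1, 6)]       # a hypothetical 5-node cluster
placement = place_replicas(len(blocks), nodes)

print(blocks)                                                          # [64, 64, 64, 8]
print(readable_after_failure(placement, {"node3"}))                    # True
print(readable_after_failure(placement, {"node3", "node4", "node5"}))  # False
```

With three replicas on distinct nodes, losing any one or two nodes leaves every block readable; in this sketch the file only becomes unreadable when all three holders of some block fail.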

10 MapReduce Engine
- Splits the input data into independent pieces
- Assigns the work to nodes in the cluster
- Processes the pieces in parallel
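The three steps on this slide can be sketched on a single machine, assuming Python threads stand in for cluster nodes: the input is cut into fixed-size splits, each split is handed to a worker, and the workers run concurrently. `make_splits`, `run_parallel`, `count_errors`, the split size, and the sample log lines are all hypothetical illustrations, not Hadoop's API. Hadoop itself typically sizes input splits to match HDFS blocks and schedules tasks on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_splits(records: List[T], split_size: int) -> List[List[T]]:
    """Step 1: divide the input into fixed-size splits (the last may be shorter)."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_parallel(splits: List[List[T]], task: Callable[[List[T]], R],
                 workers: int) -> List[R]:
    """Steps 2 and 3: hand each split to a worker thread and process them concurrently.

    pool.map returns results in input order, so result i belongs to split i.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, splits))


def count_errors(split: List[str]) -> int:
    """Example per-split task: count log lines that start with ERROR."""
    return sum(1 for line in split if line.startswith("ERROR"))


log = ["INFO start", "ERROR disk", "INFO ok", "ERROR net", "INFO done"]
splits = make_splits(log, split_size=2)
per_split = run_parallel(splits, count_errors, workers=3)
print(per_split)       # [1, 1, 0]
print(sum(per_split))  # 2
```

Because each split is processed independently, adding workers lets more splits run at once. The final `sum` stands in for combining partial results, which Hadoop handles in its reduce phase.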

11 MapReduce Illustration

12 MapReduce Step 1

13 MapReduce Step 2

14 MapReduce Step 3

15 MapReduce Step 4

16 MapReduce Step 4

17 MapReduce Step 5

18 MapReduce Step 5

19 MapReduce Step 6

20 MapReduce Illustration
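The map, shuffle, and reduce flow that illustrations like these typically walk through can be sketched with the classic word-count example. This is a minimal single-process Python sketch, and the function names are hypothetical. A real Hadoop job would instead implement the `Mapper` and `Reducer` classes from the Java `org.apache.hadoop.mapreduce` package and let the framework perform the shuffle across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map over every record, group by key, then reduce each group."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In Hadoop, the map calls for different input splits run on different nodes, and the shuffle sends all pairs with the same key to the same reducer; here all three phases run in one process purely to show the data flow.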

21 Resources
- Project Home: http://hadoop.apache.org/
- Wikipedia: http://en.wikipedia.org/wiki/Apache_Hadoop
- IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/

