Moscow, November 16th, 2011 The Hadoop Ecosystem Kai Voigt, Cloudera Inc.


1 Moscow, November 16th, 2011 The Hadoop Ecosystem Kai Voigt, Cloudera Inc.

2 Cloudera
                         Hadoop                                          Linux
Licence                  Apache                                          GPL and others
Distribution Vendor      Cloudera                                        Red Hat
Free Distribution        Cloudera's Distribution Including Hadoop (CDH)  Fedora Core
Commercial Distribution  Cloudera Enterprise                             Red Hat Enterprise Linux (RHEL)
©2011 Cloudera, Inc. All Rights Reserved.

3 Hadoop Core
HDFS
MapReduce

4 HDFS
Hadoop Distributed File System
Redundancy
Fault Tolerant
Self Healing
Write Once, Read Many Times
Java API
Command Line Tool
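The redundancy and self-healing bullets can be pictured with a toy model. The sketch below is not how HDFS is implemented; the round-robin placement, the function names, and the node names are all hypothetical. It only shows the idea that each block is held by several nodes and that lost replicas are copied again from the surviving ones.

```python
# Toy model of block replication and re-replication ("self healing").
# Not the HDFS implementation: placement policy and names are hypothetical.

REPLICATION = 3  # 3 is the usual HDFS default replication factor


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement


def heal(placement, live_nodes, replication=REPLICATION):
    """Drop replicas on dead nodes, then add live nodes until every block
    is back at the target replication factor."""
    live = list(live_nodes)
    healed = {}
    for block, holders in placement.items():
        survivors = holders & set(live)
        if not survivors:
            # No copy left to replicate from: the block is gone.
            raise RuntimeError(f"all replicas of {block} are lost")
        spare = [n for n in live if n not in survivors]
        missing = max(0, replication - len(survivors))
        healed[block] = survivors | set(spare[:missing])
    return healed


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_a", "blk_b", "blk_c"], nodes)
# node2 fails; the remaining four nodes take over its replicas.
after_failure = heal(placement, [n for n in nodes if n != "node2"])
```

After the simulated failure every block is again held by three nodes, none of which is the failed one.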

5 MapReduce
Two Phases of Functional Programming
Redundancy
Fault Tolerant
Self Healing
Java API
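The "two phases" can be sketched in plain Python, without Hadoop. This is an in-memory illustration of the map, group-by-key, and reduce steps, using word count as the example; the function names are made up for this sketch and are not the Hadoop Java API.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every record; each call yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, the step that sits between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's list of values into a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


lines = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), lambda a, b: a + b)
print(counts)
```

In a real cluster the mapper and reducer run on many machines and the grouping step moves data between them; here everything runs in one process.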

6 Hadoop Core
HDFS
MapReduce

7 HDFS-FUSE
[Diagram: /mnt/hdfs/, HDFS-FUSE, HDFS]

8 HDFS-FUSE Examples
    $ mount
    ...
    fuse on /mnt/hdfs type fuse (rw,nosuid,nodev,user_id=0,group_id=0,default_permissions,allow_other)
    $ cp /boot/vmlinuz-* /mnt/hdfs/user/cloudera/
    $ hadoop fs -ls vmlinuz-*
    -rw-r--r-- 3 cloudera supergroup 2107004 2011-11-08 16:14 /user/cloudera/vmlinuz-2.6.18-274.7.1.el5

9 Sqoop
[Diagram: RDBMS, Sqoop, HDFS]

10 Sqoop
Import & Export
ODBC, JDBC Data Sources
CSV Files in HDFS

11 Sqoop Examples
    $ sqoop import --connect jdbc:mysql://localhost/world --username root --table City
    ...
    $ hadoop fs -cat City/part-m-00000
    1,Kabul,AFG,Kabol,1780000
    2,Qandahar,AFG,Qandahar,237500
    3,Herat,AFG,Herat,186800
    4,Mazar-e-Sharif,AFG,Balkh,127800
    5,Amsterdam,NLD,Noord-Holland,731200
    ...
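Once imported, the records are ordinary comma-separated text, so any CSV reader can consume them. The sketch below parses a few records copied from the example output. The field names (id, name, country_code, district, population) are assumptions, since the slide shows only values, and the sample is embedded in the code rather than read from HDFS.

```python
import csv
import io
from typing import NamedTuple


class City(NamedTuple):
    # Field names are assumptions; the slide shows values only.
    id: int
    name: str
    country_code: str
    district: str
    population: int


def parse_cities(text):
    """Parse comma-separated records, one city per line."""
    reader = csv.reader(io.StringIO(text))
    return [City(int(r[0]), r[1], r[2], r[3], int(r[4])) for r in reader if r]


# First records of the example output, one per line.
sample = """1,Kabul,AFG,Kabol,1780000
2,Qandahar,AFG,Qandahar,237500
3,Herat,AFG,Herat,186800
"""

cities = parse_cities(sample)
largest = max(cities, key=lambda c: c.population)
```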

12 Hive
[Diagram: SQL, Hive, MapReduce]

13 Hive
Data Warehouse System for Hadoop
Data Aggregation
Ad-Hoc Queries
SQL-like Language (HiveQL)
Developed at Facebook

14 Hive Examples
    CREATE TABLE newmovie (id INT, name STRING, year INT, numratings INT, avgrating FLOAT);
    INSERT OVERWRITE TABLE newmovie
    SELECT id, name, year, COUNT(1), AVG(rating)
    FROM movie JOIN movierating ON movie.id = movierating.movieid
    GROUP BY id, name, year;
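To make the semantics of the query concrete, the sketch below computes the same join and aggregation over small in-memory lists. The sample rows are invented for illustration, the (userid, movieid, rating) layout of movierating is taken from the Pig example in this deck, and movie ids are assumed to be unique. Hive would compile the query into MapReduce jobs instead of running anything like this.

```python
from collections import defaultdict

# Hypothetical sample rows; the slides do not show the table contents.
movie = [(1, "Movie A", 1999), (2, "Movie B", 2005)]   # (id, name, year)
movierating = [(10, 1, 4), (11, 1, 5), (10, 2, 3)]     # (userid, movieid, rating)


def rate_movies(movie, movierating):
    """JOIN movie with movierating on id = movieid, GROUP BY (id, name, year),
    and compute COUNT(1) and AVG(rating) for each group."""
    by_id = {m[0]: m for m in movie}          # assumes movie ids are unique
    groups = defaultdict(list)
    for _userid, movieid, rating in movierating:
        if movieid in by_id:                  # inner join: unmatched ratings are dropped
            groups[by_id[movieid]].append(rating)
    return [
        (mid, name, year, len(ratings), sum(ratings) / len(ratings))
        for (mid, name, year), ratings in sorted(groups.items())
    ]


newmovie = rate_movies(movie, movierating)
```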

15 Pig
[Diagram: Script, Pig, MapReduce]

16 Pig
Data Warehouse System for Hadoop
Data Aggregation
Ad-Hoc Queries
High-Level Scripting Language (Pig Latin)
Developed at Yahoo

17 Pig Examples
    movierating = LOAD 'movierating' AS (userid, movieid, rating:INT);
    groupmr = GROUP movierating BY movieid;
    ratings = FOREACH groupmr GENERATE group AS movieid, COUNT(movierating.rating) AS numratings, AVG(movierating.rating) AS avgrating;
    movie = LOAD 'movie' AS (id, name, year);
    mr = JOIN movie BY id, ratings BY movieid;
    result = FOREACH mr GENERATE id, name, year, numratings, avgrating;
    STORE result INTO 'ratedmovie';

18 The Story So Far
[Diagram: RDBMS, Hive, Pig, Sqoop, MapReduce, HDFS]

19 HBase
Low Latency
Random Reads And Writes
Distributed Key/Value Store
Simple API
  – PUT
  – GET
  – DELETE
  – SCAN

20 HBase Data Model
Key: Row ID, Column Name, Timestamp
Row ID            Column Name  Timestamp  Value
com.apple.www     Size         yesterday  1234
com.apple.www     Content      yesterday  ...
com.cloudera.www  Size         yesterday  2345
com.cloudera.www  Content      yesterday  ...
com.cloudera.www  Size         today      3456
com.cloudera.www  Content      today      ...
com.facebook.www  Size         yesterday  4567
com.facebook.www  Content      yesterday  ...
com.yahoo.www     Size         today      5678
com.yahoo.www     Content      today      ...
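The data model can be read as a sorted map from (row ID, column name, timestamp) to value. The class below is a small in-memory sketch of that reading with the four operations PUT, GET, DELETE and SCAN. It is an illustration only, not the HBase client API: `ToyTable` and its method signatures are hypothetical, timestamps are plain integers (1 for yesterday, 2 for today), and versions are kept in ascending order for simplicity.

```python
import bisect


class ToyTable:
    """Sorted map from (row, column, timestamp) to value. Illustration only."""

    def __init__(self):
        self._keys = []      # sorted list of (row, column, timestamp)
        self._values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the value with the highest timestamp for this cell, or None."""
        versions = [k for k in self._keys if k[:2] == (row, column)]
        return self._values[versions[-1]] if versions else None

    def delete(self, row, column):
        """Remove every version of the cell."""
        for key in [k for k in self._keys if k[:2] == (row, column)]:
            self._keys.remove(key)
            del self._values[key]

    def scan(self, start_row, stop_row):
        """Yield cells with start_row <= row < stop_row in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        for key in self._keys[i:]:
            if key[0] >= stop_row:
                break
            yield key, self._values[key]


table = ToyTable()
table.put("com.cloudera.www", "Size", 1, 2345)   # yesterday
table.put("com.cloudera.www", "Size", 2, 3456)   # today
table.put("com.apple.www", "Size", 1, 1234)
table.put("com.yahoo.www", "Size", 2, 5678)

latest = table.get("com.cloudera.www", "Size")
scanned_rows = [key[0] for key, _ in table.scan("com.a", "com.d")]
```

Because keys are kept sorted, a scan over a row range is a contiguous slice. Reversed domain names such as com.cloudera.www keep pages of one site next to each other.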

21 HBase Flow
[Diagram: GET/PUT/DELETE, Memory, HDFS, Logfile]
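One common reading of this flow is: record the change in a logfile, apply it in memory, and write the in-memory data out to files on HDFS once it grows large enough. The sketch below follows that reading. Everything in it is hypothetical (the class name, the `flush_threshold` parameter, plain Python lists and dicts standing in for the logfile and for HDFS files); it is not HBase code.

```python
class ToyRegion:
    """Write path sketch: log first, then memory, flush to a file when full."""

    def __init__(self, flush_threshold=3):
        self.log = []        # stands in for the logfile
        self.memory = {}     # stands in for the in-memory store
        self.files = []      # stands in for immutable files on HDFS
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def put(self, key, value):
        self.log.append(("PUT", key, value))   # logged before the write is applied
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.log.append(("DELETE", key, None))
        self.memory[key] = None                # marker that hides older values in files

    def flush(self):
        """Write the sorted in-memory contents out as a new file."""
        self.files.append(dict(sorted(self.memory.items())))
        self.memory = {}

    def get(self, key):
        """Look in memory first, then in files from newest to oldest."""
        if key in self.memory:
            return self.memory[key]
        for stored in reversed(self.files):
            if key in stored:
                return stored[key]
        return None


region = ToyRegion(flush_threshold=2)
region.put("a", 1)
region.put("b", 2)       # reaches the threshold and triggers a flush
value_after_flush = region.get("a")
region.delete("a")
value_after_delete = region.get("a")
```

The log exists so that writes still held only in memory can be replayed after a crash; replay is left out of this sketch.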

22 Flume
Many Servers with many Log Files
  – Webserver
  – Mailserver
  – Syslog
Store all Logs in One Place
  – Manageable
  – Extensible
  – Reliable

23 Flume Architecture
[Diagram: Log, Flume Node, Log, Flume Node, ..., HDFS]

24 Flume Sources and Sinks
Local Files
HDFS
Stdin, Stdout
Twitter
IRC
IMAP

25 Whirr
Automatic Cluster Setup in the Cloud
  – Amazon
  – Rackspace

26 Whirr Example
    $ cat hadoop.properties
    whirr.cluster-name=myhadoopcluster
    whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,7 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
    $ bin/whirr launch-cluster --config hadoop.properties
    $ . ~/.whirr/myhadoopcluster/hadoop-proxy.sh
    $ export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster
    $ bin/whirr destroy-cluster --config hadoop.properties
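The `whirr.instance-templates` value packs the cluster layout into one string: a count, a space, and a `+`-joined list of roles, with groups separated by commas. The helper below is a reading of that format as it appears on the slide, not Whirr's own parser, and `parse_instance_templates` is a made-up name.

```python
def parse_instance_templates(value):
    """Split 'N role+role,M role+role' into a list of (count, [roles])."""
    templates = []
    for group in value.split(","):
        count, roles = group.strip().split(None, 1)
        templates.append((int(count), roles.split("+")))
    return templates


templates = parse_instance_templates(
    "1 hadoop-jobtracker+hadoop-namenode,7 hadoop-datanode+hadoop-tasktracker"
)
total_instances = sum(count for count, _ in templates)
```

For the example file this gives one master instance running the jobtracker and namenode plus seven workers, eight instances in total.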

27 Oozie Concept
crond for Hadoop
Job Flow Control
  – Branching
  – Serial
  – Loops
Triggered
  – Time
  – Data
[Diagram: Job 1, Job 3, Job 2, Job 4, Job 5]
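Job flow control amounts to running jobs in an order that respects their dependencies. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later). The dependency edges are hypothetical, because the arrows of the slide's diagram did not survive in the transcript, and real Oozie workflows are defined in XML rather than in Python.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies (job -> set of prerequisite jobs).
workflow = {
    "job2": {"job1"},
    "job3": {"job1"},
    "job4": {"job2", "job3"},
    "job5": {"job4"},
}


def run_workflow(dependencies, run):
    """Run every job only after all of its prerequisites have finished."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # jobs in one batch could run in parallel
        for job in ready:
            run(job)
            order.append(job)
        sorter.done(*ready)
    return order


order = run_workflow(workflow, run=lambda job: None)
```

With these edges, job2 and job3 form a branch after job1 and join again before job4, which is the serial and branching behaviour listed on the slide. Loops and time or data triggers are outside this sketch.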

28 Oozie Features
Component Independent
  – MapReduce
  – Hive
  – Pig
  – Sqoop
  – Streaming

29 Mahout
Machine Learning Library for Hadoop
  – Regression
  – Classification
  – Recommendations
  – Pattern Mining

30 Mahout Use Cases
Yahoo: Spam Detection
Foursquare: Recommendations
SpeedDate.com: Recommendations
Adobe: User Targeting
Amazon: Personalization Platform

31 CDH3u2
Cloudera's Distribution Including Hadoop
http://www.cloudera.com/download/
Linux Packages
  – Red Hat
  – Debian
  – Tar Archive
Virtual Machines
Cloud Installation with Whirr

32 CDH Components
Hadoop     Hive
Pig        HBase
ZooKeeper  Flume
Sqoop      Whirr
Hue        Oozie
FUSE-DFS   Mahout

33 Thank you!
Kai Voigt
kai@cloudera.com
http://www.cloudera.com/

