Apache Mahout Feb 13, 2012 Shannon Quinn Cloud Computing CS 15-319.


1 Apache Mahout. Feb 13, 2012. Shannon Quinn. Cloud Computing, CS 15-319

2 MapReduce Review: Scalable programming model (Map phase, Shuffle, Reduce phase). MapReduce implementations: Google, Hadoop. [Figure from lecture 6: MapReduce. Chunks C0-C3, mappers M0-M3, IO0-IO3, reducers R0-R1, FO0-FO1; Map Phase, Shuffling Data, Reduce Phase.]

3 MapReduce Review: Scalable programming model (Map phase, Shuffle, Reduce phase). MapReduce implementations: Google, Hadoop. [Figure from lecture 6: MapReduce. Chunks C0-C3, mappers M0-M3, IO0-IO3, reducers R0-R1, FO0-FO1; Map Phase, Shuffling Data, Reduce Phase.] This is our focus!
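The map, shuffle, and reduce phases reviewed above can be imitated in a single Python process. This is only a teaching sketch, not Hadoop code: the word-count task, the function names, and the two input chunks are all assumptions chosen for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Shuffle: group emitted values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a final output."""
    return key, sum(values)

def run_job(chunks):
    """Run map over every chunk, shuffle by key, then reduce each group."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input chunks (one string per chunk).
chunks = ["mahout runs on hadoop", "hadoop runs mapreduce"]
word_counts = run_job(chunks)
```

In a real cluster each chunk would go to a different mapper and each key group to a different reducer; here the same data flow runs sequentially.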

4 Apache Mahout: A scalable machine learning library.

5 Apache Mahout: A scalable machine learning library. Built on Hadoop.

6 Apache Mahout: A scalable machine learning library. Built on Hadoop. Philosophy of Mahout (and Hadoop by proxy).

7 What does Mahout do?

8 Recommendation

9 Classification

10 Clustering

11 Other Mahout Algorithms: Dimensionality Reduction, Regression, Evolutionary Algorithms

12 Mahout: 1. Recommendation 2. Classification 3. Clustering

13 Recommendation Overview: Help users find items they might like, based on historical preferences.

14 Recommendation Overview: Mathematically

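15 Recommendation Overview: Ratings, one row per user and one column per item (*based on example by Sebastian Schelter)
Alice: 5 1 4
Bob: ? 2 5
Peter: 4 3 2

16 Recommendation Overview: The same ratings, with the missing entry shown as "-" (*based on example by Sebastian Schelter)
5 1 4
- 2 5
4 3 2

17 Recommendation Overview: The unknown rating "?" belongs to Bob (*based on example by Sebastian Schelter)
5 1 4
? 2 5
4 3 2

18 Recommendation Overview: Bob's missing rating filled in as 1.5 (*based on example by Sebastian Schelter)
5 1 4
1.5 2 5
4 3 2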

19 Recommendation in Mahout: 1st Map phase: process input (*based on example by Sebastian Schelter)

20 Recommendation in Mahout: 1st Map phase: process input. 1st Reduce phase: list by user (*based on example by Sebastian Schelter)

21 Recommendation in Mahout: 2nd Map phase: emit co-occurred ratings (*based on example by Sebastian Schelter)

22 Recommendation in Mahout: 2nd Map phase: emit co-occurred ratings. 2nd Reduce phase: compute similarities (*based on example by Sebastian Schelter)
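The two MapReduce passes listed above can be sketched in plain Python. The slides do not say which similarity measure is used, so cosine similarity over co-rated users is an assumption here, as are the item names and the function names. The rating values mirror the example matrix, with Bob's unknown rating left out. This is not Mahout's implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical (user, item, rating) triples; item names are placeholders.
ratings = [
    ("alice", "item1", 5), ("alice", "item2", 1), ("alice", "item3", 4),
    ("bob", "item2", 2), ("bob", "item3", 5),
    ("peter", "item1", 4), ("peter", "item2", 3), ("peter", "item3", 2),
]

def pass_one(triples):
    """1st map + reduce: key each rating by user, then list ratings per user."""
    by_user = defaultdict(dict)
    for user, item, rating in triples:   # map: emit (user, (item, rating))
        by_user[user][item] = rating     # reduce: collect the list per user
    return by_user

def pass_two(by_user):
    """2nd map + reduce: emit co-occurred ratings, then compute similarities."""
    co_rated = defaultdict(list)
    for user_ratings in by_user.values():
        # map: for every item pair this user rated, emit both ratings
        for a, b in combinations(sorted(user_ratings), 2):
            co_rated[(a, b)].append((user_ratings[a], user_ratings[b]))
    sims = {}
    for pair, pairs in co_rated.items():
        # reduce: cosine similarity over the users who rated both items
        dot = sum(x * y for x, y in pairs)
        norm_a = sqrt(sum(x * x for x, _ in pairs))
        norm_b = sqrt(sum(y * y for _, y in pairs))
        sims[pair] = dot / (norm_a * norm_b)
    return sims

similarities = pass_two(pass_one(ratings))
```

A prediction for a missing rating would then typically be a similarity-weighted average of the user's known ratings. The exact weighting behind the 1.5 in the example is not given in the transcript, so it is not reproduced here.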

23 Mahout: 1. Recommendation 2. Classification 3. Clustering

24 Classification Overview: Assigning data to discrete categories.

25 Classification Overview: Assigning data to discrete categories. Train a model on labeled data (Spam / Not spam).

26 Classification Overview: Assigning data to discrete categories. Train a model on labeled data. Run the model on new, unlabeled data (Spam / Not spam / ?).

27 Naïve Bayes Example

28 Prob (token | label) =
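The right-hand side of this slide did not survive in the transcript. For orientation only, a common textbook estimate is the Laplace-smoothed relative frequency below. This is an assumption, not necessarily the formula the slide showed. Here count(t, c) is the number of times token t occurs in training documents labeled c, and V is the vocabulary.

```latex
P(t \mid c) = \frac{\operatorname{count}(t, c) + 1}{\sum_{t' \in V} \operatorname{count}(t', c) + |V|}
```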

29 Naïve Bayes Example: Not spam

30 Naïve Bayes Example: Not spam (President Obama’s Nobel Prize Speech)

31 Naïve Bayes Example: Spam

32 Naïve Bayes Example: Spam (spam email content)

33 Naïve Bayes Example

34 “Order a trial Adobe chicken daily EAB-List new summer savings, welcome!”

35 Naïve Bayes in Mahout: Complex!

36 Naïve Bayes in Mahout: Complex! Training: 1. Read the features

37 Naïve Bayes in Mahout: Complex! Training: 1. Read the features 2. Calculate per-document statistics

38 Naïve Bayes in Mahout: Complex! Training: 1. Read the features 2. Calculate per-document statistics 3. Normalize across categories

39 Naïve Bayes in Mahout: Complex! Training: 1. Read the features 2. Calculate per-document statistics 3. Normalize across categories 4. Calculate normalizing factor of each label

40 Naïve Bayes in Mahout: Complex! Training: 1. Read the features 2. Calculate per-document statistics 3. Normalize across categories 4. Calculate normalizing factor of each label. Testing: Classification
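To make the train-then-classify flow concrete, here is a minimal single-machine sketch of a plain multinomial Naïve Bayes classifier with Laplace smoothing. It is deliberately simpler than the multi-step Mahout training job outlined above and is not Mahout's implementation. The function names and the tiny training set are assumptions for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train(labeled_docs):
    """Count tokens per label and documents per label."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        token_counts[label].update(text.lower().split())
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, doc_counts, vocab

def classify(text, token_counts, doc_counts, vocab):
    """Return the label with the highest log posterior score."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_tokens = sum(token_counts[label].values())
        score = log(doc_counts[label] / total_docs)  # log prior
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace-smoothed estimate of Prob(token | label)
            score += log((token_counts[label][tok] + 1)
                         / (total_tokens + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set.
training = [
    ("spam", "order a trial now new summer savings"),
    ("spam", "welcome savings order daily"),
    ("not spam", "the president gave a speech about peace"),
    ("not spam", "the prize speech was about peace"),
]
model = train(training)
predicted = classify("summer savings order", *model)
```

Summing log probabilities rather than multiplying raw probabilities avoids numeric underflow on long documents.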

41 Other Classification Algorithms: Stochastic Gradient Descent

42 Other Classification Algorithms: Stochastic Gradient Descent, Support Vector Machines

43 Other Classification Algorithms: Stochastic Gradient Descent, Support Vector Machines, Random Forests

44 Mahout: 1. Recommendation 2. Classification 3. Clustering

45 Clustering Overview: Grouping unstructured data.

46 Clustering Overview: Grouping unstructured data. Small intra-cluster distance.

47 Clustering Overview: Grouping unstructured data. Small intra-cluster distance. Large inter-cluster distance.

48 K-Means Clustering Example

57 Cats Dogs

58 K-Means Clustering in Mahout + [Figure from lecture 6: MapReduce. Chunks C0-C3, mappers M0-M3, IO0-IO3, reducers R0-R1, FO0-FO1; Map Phase, Shuffling Data, Reduce Phase.]

59 K-Means Clustering in Mahout: Assume # clusters <<< # points

60 K-Means Clustering in Mahout: Assume # clusters <<< # points. Mappers: M0, M1, M2, M3

61 K-Means Clustering in Mahout: Assume # clusters <<< # points. Mappers: M0, M1, M2, M3. Reducers: R0, R1

62 K-Means Clustering in Mahout: Map phase: assign cluster IDs

63 K-Means Clustering in Mahout: Map phase: assign cluster IDs. Reduce phase: reset centroids

64 K-Means Clustering in Mahout: Important notes: --maxIter, --convergenceDelta, method
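The map and reduce steps above, together with the two stopping controls, can be sketched as a single-process loop. The parameters max_iter and convergence_delta are named after the --maxIter and --convergenceDelta options on the slide. Their exact semantics in Mahout are assumed here: stop once no centroid moves farther than the delta. The sample points and starting centroids are hypothetical.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[i])))

def map_phase(points, centroids):
    """Mapper: emit (cluster_id, point) for every point."""
    return [(nearest(p, centroids), p) for p in points]

def reduce_phase(assignments, centroids):
    """Reducer: reset each centroid to the mean of its assigned points."""
    groups = defaultdict(list)
    for cluster_id, point in assignments:
        groups[cluster_id].append(point)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for cluster_id, members in groups.items():
        new_centroids[cluster_id] = tuple(
            sum(dim) / len(members) for dim in zip(*members))
    return new_centroids

def kmeans(points, centroids, max_iter=10, convergence_delta=0.001):
    """Repeat map + reduce until max_iter passes or centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
                    for old, new in zip(centroids, updated))
        centroids = updated
        if shift <= convergence_delta:
            break
    return centroids

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because the number of clusters is much smaller than the number of points, each mapper can hold every centroid in memory while it streams through its share of the points. Only the per-cluster groups need to reach the reducers.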

65 Other Clustering Algorithms: Latent Dirichlet Allocation (topic models)

66 Other Clustering Algorithms: Latent Dirichlet Allocation (topic models); Fuzzy K-Means (points are assigned multiple clusters)

67 Other Clustering Algorithms: Latent Dirichlet Allocation (topic models); Fuzzy K-Means (points are assigned multiple clusters); Canopy clustering (fast approximations of clusters)

68 Other Clustering Algorithms: Latent Dirichlet Allocation (topic models); Fuzzy K-Means (points are assigned multiple clusters); Canopy clustering (fast approximations of clusters); Spectral clustering (treat points as a graph)

69 Other Clustering Algorithms: Latent Dirichlet Allocation (topic models); Fuzzy K-Means (points are assigned multiple clusters); Canopy clustering (fast approximations of clusters); Spectral clustering (treat points as a graph; K-Means & Eigencuts)
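The note that Fuzzy K-Means assigns points to multiple clusters can be illustrated with the standard fuzzy c-means membership formula. This is a sketch under assumptions: the fuzziness parameter m, the function name, and the sample values are not from the slides, and the code is not taken from Mahout.

```python
def fuzzy_memberships(point, centroids, m=2.0):
    """Degree of membership of point in each cluster; the values sum to 1."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid)) ** 0.5
             for centroid in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # Closer centroids receive larger membership values.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical centroids and query point.
centroids = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centroids)
```

With these inputs the point receives a large membership in the nearer cluster and a small one in the farther cluster, rather than a single hard assignment as in plain K-Means.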

70 Mahout in Summary

71 Mahout in Summary: Scalable library

72 Mahout in Summary: Scalable library. Three primary areas of focus

73 Mahout in Summary: Scalable library. Three primary areas of focus. Other algorithms

74 Mahout in Summary: Scalable library. Three primary areas of focus. Other algorithms. All in your friendly neighborhood MapReduce

75 Mahout in Summary: Scalable library. Three primary areas of focus. Other algorithms. All in your friendly neighborhood MapReduce. http://mahout.apache.org/

76 Thank you!

