Published by Herbert Snow. Modified over 9 years ago.
1
Apache Mahout
Feb 13, 2012
Shannon Quinn
Cloud Computing CS 15-319
3
MapReduce Review
- Scalable programming model: Map phase, Shuffle, Reduce phase
- MapReduce implementations: Google, Hadoop
- This is our focus!
[Figure from lecture 6: MapReduce. Labels: chunks C0-C3, mappers M0-M3, IO0-IO3, reducers R0-R1, FO0-FO1; Map Phase, Shuffling Data, Reduce Phase]
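The three phases in this review can be sketched in a few lines of plain Python. This is only an in-memory illustration of the programming model, not Hadoop code; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a final output."""
    return key, sum(values)

def map_reduce(chunks):
    # Each chunk is mapped independently, as separate mappers would do.
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

chunks = ["map reduce map", "shuffle reduce", "map"]
counts = map_reduce(chunks)
print(counts)
```

Because each mapper sees only its own chunk and each reducer sees only one key's values, both phases can be spread across many machines.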
6
Apache Mahout
- A scalable machine learning library
- Built on Hadoop
- Philosophy of Mahout (and Hadoop by proxy)
7
What does Mahout do?
8
Recommendation
9
Classification
10
Clustering
11
Other Mahout Algorithms
- Dimensionality Reduction
- Regression
- Evolutionary Algorithms
12
Mahout
1. Recommendation
2. Classification
3. Clustering
13
Recommendation Overview
- Help users find items they might like based on historical preferences
14
Recommendation Overview Mathematically
15
Recommendation Overview
Alice   5    1    4
Bob     ?    2    5
Peter   4    3    2
*based on example by Sebastian Schelter
16
Recommendation Overview
5    1    4
-    2    5
4    3    2
*based on example by Sebastian Schelter
17
Recommendation Overview
        5    1    4
Bob     ?    2    5
        4    3    2
*based on example by Sebastian Schelter
18
Recommendation Overview
        5    1    4
Bob     1.5  2    5
        4    3    2
*based on example by Sebastian Schelter
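One common way to fill in a missing rating is a similarity-weighted average of the ratings the user has already given. The sketch below assumes that approach. The item names and the similarity values are hypothetical placeholders, not numbers taken from the slides. Any pair of similarities with equal magnitude and opposite sign yields the 1.5 shown for Bob.

```python
def predict_rating(known_ratings, similarities):
    """Similarity-weighted average of a user's known ratings.

    known_ratings: item -> rating given by the user
    similarities:  item -> similarity between that item and the target item
    """
    numerator = sum(similarities[item] * rating
                    for item, rating in known_ratings.items())
    denominator = sum(abs(similarities[item]) for item in known_ratings)
    return numerator / denominator

# Bob's two known ratings from the example matrix (placeholder item names).
bob_ratings = {"item2": 2.0, "item3": 5.0}

# Hypothetical similarities between each rated item and the unrated one.
similarity_to_item1 = {"item2": -0.47, "item3": 0.47}

prediction = predict_rating(bob_ratings, similarity_to_item1)
print(round(prediction, 2))  # 1.5
```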
20
Recommendation in Mahout
- 1st Map phase: process input
- 1st Reduce phase: list by user
*based on example by Sebastian Schelter
22
Recommendation in Mahout
- 2nd Map phase: emit co-occurred ratings
- 2nd Reduce phase: compute similarities
*based on example by Sebastian Schelter
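The two MapReduce passes named on these slides can be sketched in memory as follows. This is an illustration under stated assumptions, not Mahout's code: the item names, the comma-separated input format, the `run_phase` helper, and the choice of cosine similarity over co-occurring ratings are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Ratings from the example matrix; item names are placeholders.
RAW = [
    "Alice,item1,5", "Alice,item2,1", "Alice,item3,4",
    "Bob,item2,2", "Bob,item3,5",
    "Peter,item1,4", "Peter,item2,3", "Peter,item3,2",
]

def map_parse(line):
    """1st map: process one input line into (user, (item, rating))."""
    user, item, rating = line.split(",")
    yield user, (item, float(rating))

def reduce_by_user(user, item_ratings):
    """1st reduce: list all ratings by user."""
    yield user, sorted(item_ratings)

def map_cooccurrence(user, item_ratings):
    """2nd map: emit every pair of ratings that co-occur for one user."""
    for (a, rating_a), (b, rating_b) in combinations(item_ratings, 2):
        yield (a, b), (rating_a, rating_b)

def reduce_similarity(pair, rating_pairs):
    """2nd reduce: cosine similarity of one item pair (an assumed measure)."""
    dot = sum(x * y for x, y in rating_pairs)
    norm_a = sqrt(sum(x * x for x, _ in rating_pairs))
    norm_b = sqrt(sum(y * y for _, y in rating_pairs))
    yield pair, dot / (norm_a * norm_b)

def run_phase(records, mapper, reducer):
    """Run one map, shuffle, reduce pass in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(*record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

by_user = run_phase([(line,) for line in RAW], map_parse, reduce_by_user)
similarities = dict(run_phase(by_user, map_cooccurrence, reduce_similarity))
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

Splitting the work this way means the second reducer only ever needs the ratings for a single item pair, which keeps each task small even when the full ratings matrix is large.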
23
Mahout
1. Recommendation
2. Classification
3. Clustering
26
Classification Overview
- Assigning data to discrete categories
- Train a model on labeled data
- Run the model on new, unlabeled data
[Figure labels: Spam, Not spam, ?]
27
Naïve Bayes Example
28
Prob (token | label) =
30
Naïve Bayes Example
Not spam: President Obama’s Nobel Prize Speech
32
Naïve Bayes Example
Spam: Spam email content
33
Naïve Bayes Example
34
“Order a trial Adobe chicken daily EAB-List new summer savings, welcome!”
40
Naïve Bayes in Mahout
Complex!
Training
1. Read the features
2. Calculate per-document statistics
3. Normalize across categories
4. Calculate normalizing factor of each label
Testing
Classification
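To make the train-then-classify flow concrete, here is a small sketch of a plain multinomial Naïve Bayes classifier with add-one smoothing. It is a deliberately simpler stand-in and not Mahout's implementation; the training documents, function names, and smoothing choice are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train(docs):
    """Count tokens per label and documents per label."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return token_counts, label_counts, vocab

def log_prob_token(token, label, model):
    """log Prob(token | label) with add-one smoothing (an assumed choice)."""
    token_counts, _, vocab = model
    total = sum(token_counts[label].values())
    return log((token_counts[label][token] + 1) / (total + len(vocab)))

def classify(text, model):
    """Pick the label with the highest log prior plus token log-likelihoods."""
    _, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = log(label_counts[label] / n_docs)
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are skipped
                score += log_prob_token(token, label, model)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training data.
training_docs = [
    ("spam", "order a trial now new summer savings"),
    ("spam", "trial savings order today"),
    ("not spam", "meeting notes for the cloud computing lecture"),
    ("not spam", "lecture slides and homework notes"),
]
model = train(training_docs)
print(classify("new summer savings welcome", model))
print(classify("homework lecture notes", model))
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied together.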
43
Other Classification Algorithms
- Stochastic Gradient Descent
- Support Vector Machines
- Random Forests
44
Mahout
1. Recommendation
2. Classification
3. Clustering
47
Clustering Overview
- Grouping unstructured data
- Small intra-cluster distance
- Large inter-cluster distance
48
K-Means Clustering Example
57
Cats Dogs
58
K-Means Clustering in Mahout
+
[Figure from lecture 6: MapReduce. Labels: chunks C0-C3, mappers M0-M3, IO0-IO3, reducers R0-R1, FO0-FO1; Map Phase, Shuffling Data, Reduce Phase]
61
K-Means Clustering in Mahout
- Assume: # clusters <<< # points
[Figure labels: mappers M0-M3, reducers R0-R1]
63
K-Means Clustering in Mahout
- Map phase: assign cluster IDs
- Reduce phase: reset centroids
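One K-means iteration maps naturally onto the two phases named here. The sketch below runs in memory and is not Mahout code; the function names, the squared Euclidean distance, and the sample points are assumptions for illustration.

```python
from collections import defaultdict

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_assign(point, centroids):
    """Map phase: emit (cluster ID of the nearest centroid, point)."""
    cluster_id = min(range(len(centroids)),
                     key=lambda i: squared_distance(point, centroids[i]))
    return cluster_id, point

def reduce_centroid(points):
    """Reduce phase: reset a centroid to the mean of its assigned points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)
    for point in points:
        cluster_id, assigned = map_assign(point, centroids)
        groups[cluster_id].append(assigned)
    # A centroid with no assigned points is left where it was.
    return [reduce_centroid(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```

Each mapper needs the full list of centroids but only its own share of the points, which is practical when the number of clusters is much smaller than the number of points.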
64
K-Means Clustering in Mahout
Important notes
- --maxIter
- --convergenceDelta
- method
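The two options act as stopping rules for the iteration loop. The sketch below assumes that iteration stops once no centroid moves by more than the convergence delta, or once the maximum number of iterations is reached. It uses 1-D points for brevity, and the function and parameter names are illustrative rather than Mahout's API.

```python
def kmeans_step(points, centroids):
    """One pass over 1-D points: assign each to its nearest centroid, then average."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        i = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        sums[i] += x
        counts[i] += 1
    # A centroid with no points keeps its previous position.
    return [sums[i] / counts[i] if counts[i] else centroids[i]
            for i in range(len(centroids))]

def run_kmeans(points, centroids, max_iter=10, convergence_delta=0.01):
    """Iterate until centroids move at most convergence_delta or max_iter is hit."""
    for iteration in range(1, max_iter + 1):
        updated = kmeans_step(points, centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= convergence_delta:
            break
    return centroids, iteration

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, iterations_used = run_kmeans(points, [0.0, 5.0])
print(final_centroids, iterations_used)  # [2.0, 11.0] 3
```

A small delta gives tighter centroids at the cost of more passes over the data, and each pass is a full MapReduce job, so the iteration cap bounds the total running time.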
69
Other Clustering Algorithms
- Latent Dirichlet Allocation: topic models
- Fuzzy K-Means: points are assigned multiple clusters
- Canopy clustering: fast approximations of clusters
- Spectral clustering: treat points as a graph; K-Means & Eigencuts
75
Mahout in Summary
- Scalable library
- Three primary areas of focus
- Other algorithms
- All in your friendly neighborhood MapReduce
- http://mahout.apache.org/
76
Thank you!