Introducing Apache Mahout


1 Introducing Apache Mahout
Scalable Machine Learning for All! Grant Ingersoll

2 Agenda
What is Machine Learning? Definitions, Types, Applications
Mahout: Why? How? Who?

3 NOT! What is Machine Learning? Or?

4 How about? Google News

5 Or? Amazon.com

6 Definition
“Machine Learning is programming computers to optimize a performance criterion using example data or past experience” (Intro. to Machine Learning by E. Alpaydin)
Subset of Artificial Intelligence
Draws on many other fields: computer science, biology, math, psychology, etc.

7 Characterizations
Lots of Data
Identifiable Features in that Data
Too big/costly for people to handle
People still can help

8 Types
Supervised: Using labeled training data, create a function that predicts the output of unseen inputs
Unsupervised: Using unlabeled data, create a function that predicts output
Semi-Supervised: Uses labeled and unlabeled data
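The supervised case can be illustrated with a minimal sketch. It is written in Python for brevity and is not Mahout code. The 1-nearest-neighbor rule and the toy data are assumptions chosen only to show "labeled examples in, prediction for an unseen input out".

```python
import math

def nearest_neighbor_predict(training, query):
    """Return the label of the labeled example closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled training data: (features, label) pairs.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

# Predict labels for inputs that were not in the training data.
print(nearest_neighbor_predict(labeled, (2.0, 1.5)))
print(nearest_neighbor_predict(labeled, (7.0, 9.0)))
```

An unsupervised method would receive only the feature tuples, without the labels, and would have to find structure on its own.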

9 Classification/Categorization
Spam Filtering
Named Entity Recognition
Phrase Identification
Sentiment Analysis
Classification into a Taxonomy
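As an illustration of the spam filtering use case, here is a small multinomial Naive Bayes sketch with add-one smoothing. It is plain Python and not Mahout's implementation. The function names and the four training messages are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
training_docs = [
    ("win cash prize now", "spam"),
    ("cheap prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(training_docs)
print(classify(model, "claim your cash prize"))
print(classify(model, "agenda for lunch meeting"))
```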

10 Clustering
Find Natural Groupings:
  Documents
  Search Results
  People
  Genetic traits in groups
  Many, many more uses
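To make "finding natural groupings" concrete, the following is a minimal single-machine k-means sketch in Python. It is not Mahout's distributed implementation. The points and the initial centers are assumptions picked so that the result is easy to check by eye.

```python
def assign(points, centers):
    """Group each point with its nearest center (squared Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
        )
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centers, iterations=10):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(points, centers)
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, clusters = kmeans(points, [(1.0, 1.0), (9.0, 9.0)])
print(centers)
print(clusters)
```

No labels are supplied: the grouping comes only from distances between points.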

11 Collaborative Filtering
Recommend people and products
User-User: User likes X, you might too
Item-Item: People who bought X also bought Y
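The item-item idea ("people who bought X also bought Y") can be sketched by counting how often two items appear in the same user's purchases. This is a Python illustration and not the Taste API. The function names and the purchase data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(purchases):
    """For each item, count how often every other item was bought by the same user."""
    counts = {}
    for items in purchases.values():
        for a, b in permutations(set(items), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts

def also_bought(counts, item, top_n=2):
    """Return the items most often bought together with `item`."""
    ranked = sorted(
        counts.get(item, Counter()).items(),
        key=lambda kv: (-kv[1], kv[0]),  # highest count first, ties by name
    )
    return [other for other, _ in ranked[:top_n]]

# Hypothetical purchase histories keyed by user.
purchases = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["book", "lamp"],
    "dave": ["pen", "ink"],
}
counts = cooccurrence(purchases)
print(also_bought(counts, "pen", top_n=1))
print(also_bought(counts, "book"))
```

A user-user recommender would instead compare users to each other and suggest what similar users liked.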

12 Info. Retrieval
Learning Ranking Functions
Learning Spelling Corrections
User Click Analysis and Tracking

13 Other
Image Analysis
Robotics
Games
Higher level natural language processing
Many, many others

14 What is Apache Mahout? A Mahout is an elephant trainer/driver/keeper, hence… + Machine Learning = (and other distributed techniques)

15 What?
Hadoop brings:
  Map/Reduce API
  HDFS
  In other words, scalability and fault-tolerance
Thus, Mahout’s Goal is: Scalable Machine Learning with Apache License
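For readers new to the Map/Reduce model, here is an in-memory Python sketch of its three steps (map, group by key, reduce) using word counting. It does not use the Hadoop API, and all function names are made up. In Hadoop the same steps would run distributed across machines over data stored in HDFS.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input lines.
lines = ["mahout rides hadoop", "hadoop stores data", "mahout learns from data"]
counts = run_job(lines)
print(counts)
```

Because each map call sees one line and each reduce call sees one key, the work can be split across many machines, which is where the scalability comes from.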

16 Why Mahout?
Many Open Source ML libraries either:
  Lack Community
  Lack Documentation and Examples
  Lack Scalability
  Lack the Apache License ;-)
  Or are research-oriented
Personal: Learn more ML
Intelligent Apps are the Present and Future
See the Hadoop talks tomorrow and Friday!
Goal: Overcome gaps the Apache Way!

17 Current Status
Close to initial release
  Focused on examples, docs, bug fixes
What’s in it:
  Simple Matrix/Vector library
  Taste Collaborative Filtering
  Clustering: Canopy, K-Means, Fuzzy K-Means, Mean-shift
  Classifiers: Naïve Bayes, Complementary NB
  Evolutionary: Integration with Watchmaker for fitness function

18 How? Examples
Taste
Clustering
Classification
Evolutionary

19 Taste: Movie Recommendations
Given ratings by users of movies, recommend other movies

20 Clustering: Synthetic Control Data
Each clustering implementation has an example Job for running in <MAHOUT_HOME>/examples
org.apache.mahout.clustering.syntheticcontrol.*
Outputs clusters…
See output.txt, synthetic_control data

21 Classification: NB and CNB Examples
20 Newsgroups
Wikipedia

22 Evolutionary
Traveling Salesman
Class Discovery

23 What’s Next?
Release 0.1!
Shared Amazon Images (others?)
More Examples
Winnow/Perceptron (MAHOUT-85)
HBase and HAMA support
Normalize I/O format for data
Solr Integration (SOLR-769)
Other Algorithms: SVM, Linear Regression, etc.

24 When, Where, Who
When? Now! Mahout is growing
Who? You! We want Java programmers who:
  Are comfortable with math
  Like to work on large, hard problems
Where?

25 Resources
“Programming Collective Intelligence” by Toby Segaran
“Data Mining - Practical Machine Learning Tools and Techniques” by Ian H. Witten and Eibe Frank
Hadoop -

