Introducing Apache Mahout Scalable Machine Learning for All! Grant Ingersoll Lucid Imagination.


1 Introducing Apache Mahout Scalable Machine Learning for All! Grant Ingersoll Lucid Imagination

2 Overview
What is Machine Learning?
Mahout

3 Definition
“Machine Learning is programming computers to optimize a performance criterion using example data or past experience”
–Intro. to Machine Learning by E. Alpaydin
Subset of Artificial Intelligence
–Many other fields: comp. sci., biology, math, psychology, etc.

4 Types
Supervised
–Using labeled training data, create a function that predicts the output of unseen inputs
Unsupervised
–Using unlabeled data, create a function that predicts output (e.g., which natural grouping an input belongs to)
Semi-Supervised
–Uses labeled and unlabeled data

5 Characterizations
Lots of Data
Identifiable Features in that Data
Too big/costly for people to handle
–People still can help

6 Clustering
Unsupervised
Find Natural Groupings
–Documents
–Search Results
–People
–Genetic traits in groups
–Many, many more uses
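To make "find natural groupings" concrete, here is a minimal single-machine sketch of k-means, one of the clustering algorithms named later in the deck. It is plain Python and not Mahout's API; the function name `kmeans`, the sample points, and the hand-picked starting centroids are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's k-means on 2-D points; returns (centroids, assignments)."""
    centroids = list(initial_centroids)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            assignments[i] = min(
                range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Hypothetical data: two visually obvious groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignments = kmeans(points, initial_centroids=[points[0], points[3]])
print(assignments)
```

No labels are supplied anywhere; the grouping comes only from distances between points, which is what makes this unsupervised.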

7 Example: Clustering Google News

8 Collaborative Filtering
Unsupervised
Recommend people and products
–User-User: User likes X, you might too
–Item-Item: People who bought X also bought Y
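The item-item idea ("people who bought X also bought Y") can be sketched with simple co-occurrence counting. This is an illustrative plain-Python sketch, not the Taste API; the function names `cooccurrence` and `also_bought` and the purchase data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(baskets):
    """Count how often each pair of items appears in the same user's basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Items most often bought together with `item`, ties broken by name."""
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase histories keyed by user.
purchases = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["book", "lamp"],
    "dave": ["pen", "mug"],
}
counts = cooccurrence(purchases)
recommendations = also_bought(counts, "book")
print(recommendations)
```

A production recommender would normally weight by ratings and normalize for item popularity; raw counts are used here only to keep the idea visible.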

9 Example: Collab Filtering Amazon.com

10 Classification/Categorization
Many, many types
Spam Filtering
Named Entity Recognition
Phrase Identification
Sentiment Analysis
Classification into a Taxonomy
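Spam filtering is the classic case of supervised classification. The sketch below is a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. It is not Mahout's Naive Bayes implementation; the function names and the four training messages are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (label, text). Returns (label_counts, word_counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            numerator = word_counts[label][word] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data.
model = train([
    ("spam", "win cash now"),
    ("spam", "cheap cash prize"),
    ("ham", "meeting at noon"),
    ("ham", "lunch at noon tomorrow"),
])
print(classify(model, "win a cash prize"))
print(classify(model, "noon meeting tomorrow"))
```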

11 Example: NER NER? Excerpt from Yahoo News

12 Example: Categorization

13 Info. Retrieval
Learning Ranking Functions
Learning Spelling Corrections
User Click Analysis and Tracking

14 Other
Image Analysis
Robotics
Games
Higher level natural language processing
Many, many others

15 What is Apache Mahout? A Mahout is an elephant trainer/driver/keeper, hence… + Machine Learning = (and other distributed techniques)

16 What?
Hadoop brings:
–Map/Reduce API
–HDFS
–In other words, scalability and fault-tolerance
Mahout brings:
–Library of machine learning algorithms
–Examples
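For readers new to the Map/Reduce model, the sketch below imitates its three stages (map, shuffle, reduce) in a single Python process using word counting. It does not use the Hadoop API and provides none of Hadoop's distribution or fault tolerance; every function name here is an assumption chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


word_counts = run_job(["mahout rides hadoop", "hadoop scales", "mahout learns"])
print(word_counts)
```

Because each map call sees only one document and each reduce call sees only one key, both stages can in principle be spread across many machines, which is where the scalability comes from.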

17 Why Mahout?
Many Open Source ML libraries either:
–Lack Community
–Lack Documentation and Examples
–Lack Scalability
–Lack the Apache License ;-)
–Or are research-oriented

18 Why Mahout?
Intelligent Apps are the Present and Future
Thus, Mahout’s Goal is:
–Scalable Machine Learning with Apache License

19 Current Status
What’s in it:
–Simple Matrix/Vector library
–Taste Collaborative Filtering
–Clustering: Canopy, K-Means, Fuzzy K-Means, Mean-shift, Dirichlet
–Classifiers: Naïve Bayes, Complementary NB
–Evolutionary: Integration with Watchmaker for fitness function

20 How?
Examples
–Taste
–Clustering
–Classification
–Evolutionary

21 Taste: Movie Recommendations
Given ratings by users of movies, recommend other movies
http://lucene.apache.org/mahout/taste.html#demo

22 Taste Demo
http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=12&debug=true
http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=43&debug=true

23 Clustering: Synthetic Control Data
http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
Each clustering impl. has an example Job for running in /examples
–o.a.mahout.clustering.syntheticcontrol.*
Outputs clusters…

24 Classification: NB and CNB Examples
20 Newsgroups
–http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups
Wikipedia
–http://cwiki.apache.org/confluence/display/MAHOUT/WikipediaBayesExample

25 Evolutionary
Traveling Salesman
–http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman
Class Discovery
–http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery
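As a rough picture of what an evolutionary approach to the Traveling Salesman problem looks like, here is a small plain-Python sketch that keeps the shorter half of a population of tours each generation and refills it with mutated copies. It is not the Mahout or Watchmaker code behind the linked example; the fitness function `tour_length`, the swap mutation, and the four-city layout are assumptions made for illustration.

```python
import math
import random


def tour_length(tour, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def mutate(tour, rng):
    """Swap two randomly chosen cities to produce a new candidate tour."""
    i, j = rng.sample(range(len(tour)), 2)
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(cities, population_size=20, generations=50, seed=0):
    """Evolve tours by truncation selection plus swap mutation."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the shorter half of the population unchanged.
        population.sort(key=lambda t: tour_length(t, cities))
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=lambda t: tour_length(t, cities))


# Hypothetical cities at the corners of a unit square.
cities = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best = evolve(cities)
print(best, tour_length(best, cities))
```

Only the fitness function is specific to the problem; selection and mutation are generic, which is why a framework can supply them while the fitness evaluation is what gets distributed.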

26 What’s Next?
More Examples
Winnow/Perceptron (MAHOUT-85)
Text Clustering
Association Rules (MAHOUT-108)
Logistic Regression
Solr Integration (SOLR-769)
GSOC

27 When, Who
When? Now!
–Mahout is growing
Who? You!
–We want programmers who are comfortable with math and like to work on hard problems
–We want others to kick the tires

28 Where?
http://lucene.apache.org/mahout
–Hadoop - http://hadoop.apache.org
http://cwiki.apache.org/MAHOUT
mahout-{user|dev}@lucene.apache.org
–http://www.lucidimagination.com/search/p:mahout

29 Resources
“Programming Collective Intelligence” by Segaran
“Data Mining - Practical Machine Learning Tools and Techniques” by Witten and Frank
“Taming Text” by Ingersoll and Morton

