Guided By Ms. Shikha Pachouly Assistant Professor Computer Engineering Department 2/29/2016.


1 Guided By Ms. Shikha Pachouly Assistant Professor Computer Engineering Department 2/29/2016

2 Machine Learning
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
Machine Learning Strategies
1) Supervised
2) Unsupervised

3 Common Use Cases
- Recommend friends/dates/products
- Classify content into predefined groups
- Find similar content based on object properties
- Find associations/patterns in actions/behaviors
- Identify key topics in a large collection of text
- Detect anomalies in output
- Rank search results

4 Apache Mahout Introduction
- Machine learning library for scalable applications.
- Includes core algorithms for recommendation, clustering and classification, implemented on top of the Hadoop MapReduce model.
- Also includes core libraries that are highly optimized to give good performance for non-distributed algorithms as well.

5

6 Mahout is distributed under the commercially friendly Apache Software License. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Currently Mahout mainly supports three use cases:
1) Recommendation mining
2) Clustering
3) Classification

7 Why Mahout
Many open source ML libraries (PyBrain, Shark, etc.) either
1) lack community
2) lack scalability
3) lack documentation and examples
Most Mahout implementations are MapReduce enabled.

8 The main goal of Apache Mahout is to be useful to practitioners.
- This means implementations should be easy to use from within Java applications.
- It should be close to trivial to deploy the trained models.
- Scaling to include more and more diverse data should be simple.

9 Recommendations
Extensive framework for collaborative filtering
Recommenders
1) user based
2) item based
Many different similarity measures, e.g. Cosine, LLR, Tanimoto, Pearson
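To make the similarity measures named on this slide concrete, here is a minimal plain-Python sketch of three of them (cosine, Tanimoto and Pearson). It is not Mahout's API; the function names and input shapes are assumptions chosen for illustration, and LLR is left out to keep the example short.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def tanimoto_similarity(items_a, items_b):
    """Tanimoto (Jaccard) coefficient: shared items / all items."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def pearson_correlation(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for constant vectors; 0.0 is an assumption
    return cov / math.sqrt(var_a * var_b)
```

Tanimoto only looks at which items were touched, so it suits data without rating values, while cosine and Pearson use the rating values themselves.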

10 Algorithms for Recommendation
- User-Based Collaborative Filtering: single machine
- Item-Based Collaborative Filtering: single machine / MapReduce
- Matrix Factorization with Alternating Least Squares: single machine / MapReduce
- Matrix Factorization with Alternating Least Squares on Implicit Feedback: single machine / MapReduce
- Weighted Matrix Factorization, SVD++, Parallel SGD: single machine

11 User-Based Recommender
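The idea behind a user-based recommender can be sketched in a few lines of plain Python: find the users most similar to the target user, then score the items they rated that the target has not seen. This is an illustrative sketch, not Mahout code; the data layout (a dict of user to item ratings), the function names, and the choice of cosine similarity over co-rated items are all assumptions.

```python
import math

def user_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user, k=2, top_n=3):
    """Recommend unseen items using the k most similar users."""
    # Rank every other user by similarity and keep the k nearest.
    neighbours = sorted(
        ((user_similarity(ratings[user], ratings[other]), other)
         for other in ratings if other != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Predicted rating = similarity-weighted average of neighbour ratings.
    predictions = {item: scores[item] / weights[item] for item in scores}
    return sorted(predictions.items(), key=lambda p: p[1], reverse=True)[:top_n]

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}
print(recommend(ratings, "alice"))
```

The neighbourhood size k trades coverage against noise: a larger k can recommend more items but lets less similar users influence the scores.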

12

13 Clustering

14 Algorithms for Clustering
- K-Means Clustering
- Fuzzy K-Means
- Mean Shift Clustering
- Dirichlet Process Clustering (for topic modelling)
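As an illustration of the first algorithm in this list, the following is a minimal single-machine sketch of k-means (Lloyd's iteration) in plain Python. It is not Mahout's implementation; the function names are hypothetical, and the initial centroids are passed in explicitly (an assumption made so the result is deterministic).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def kmeans(points, initial_centroids, max_iterations=10):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
```

Fuzzy k-means differs in that each point belongs to every cluster with a degree of membership rather than to exactly one cluster.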

15 Clustering algorithms that run on Hadoop infrastructure can also be launched with commands, e.g.
Canopy Clustering: bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
k-Means Clustering: bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Fuzzy k-Means Clustering: bin/mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job
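Canopy clustering, the first job named on this slide, is a cheap one-pass grouping that is often used to seed k-means. A minimal plain-Python sketch of the idea follows; it is not the Mahout job itself, and the function names and the thresholds t1 (loose) and t2 (tight, with t1 > t2) are assumptions for illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def canopy_clustering(points, t1, t2):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (membership), t2 the tight threshold
    (points this close to a centre cannot start a new canopy).
    """
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unprocessed point becomes a centre
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)          # loosely close: joins the canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

for center, members in canopy_clustering(
        [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)], t1=4, t2=2):
    print(center, members)
```

Because a point between t2 and t1 stays in the candidate list, it can appear in one canopy and later become the centre of another, which is why canopies may overlap.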

16 Classification
Algorithms implemented in Mahout for classification
- Logistic Regression, trained via SGD: single machine
- Naive Bayes / Complementary Naive Bayes: MapReduce
- Random Forest: MapReduce
- Hidden Markov Models: single machine
- Multilayer Perceptron: single machine

17 Running Naïve Bayes from the Command Line
Three commands
1) mahout seq2sparse: performs the TF/IDF transformations
2) mahout trainnb: trains the Naïve Bayes model
3) mahout testnb: performs classification and testing
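What the three commands do conceptually can be sketched on a single machine in plain Python: turn documents into TF-IDF vectors, train a multinomial Naive Bayes model on them, then classify new text. This is a simplified illustration and not Mahout's implementation; the function names, the idf formula log(N/df), and add-one smoothing are assumptions, and Mahout applies its own weight normalisation.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(token_lists):
    """Step 1 (like seq2sparse): map each document to {term: tf * idf}."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: count * math.log(n_docs / doc_freq[t])
         for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]

def train_naive_bayes(labels, vectors):
    """Step 2 (like trainnb): sum term weights per label, record priors."""
    term_weights = defaultdict(Counter)
    vocabulary = set()
    for label, vector in zip(labels, vectors):
        for term, weight in vector.items():
            term_weights[label][term] += weight
            vocabulary.add(term)
    label_counts = Counter(labels)
    priors = {label: n / len(labels) for label, n in label_counts.items()}
    return {"priors": priors, "weights": term_weights, "vocab": vocabulary}

def classify(model, tokens):
    """Step 3 (like testnb): pick the label with the highest log score."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        weights = model["weights"][label]
        total = sum(weights.values())
        score = math.log(prior)
        for term in tokens:
            if term in model["vocab"]:  # ignore terms never seen in training
                # Add-one smoothing keeps unseen (label, term) pairs non-zero.
                score += math.log((weights[term] + 1.0) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus
labels = ["spam", "spam", "ham", "ham"]
docs = [
    "buy cheap pills now".split(),
    "cheap pills cheap offer".split(),
    "meeting schedule for monday".split(),
    "project meeting notes".split(),
]
model = train_naive_bayes(labels, tfidf_vectors(docs))
print(classify(model, "cheap pills offer".split()))
```

In the Mahout workflow each of these steps reads and writes files on HDFS, so the stages can run as separate MapReduce jobs.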

18 Installation of Mahout
- Download the tar files of both the apache-mahout and apache-maven projects
- Extract the tar files into a directory
- Set the path variables for Maven
- Set the present working directory to Mahout's core folder
- Compile the project with 'mvn compile'
- Build the project with 'mvn install'

19 Mahout Vs Weka
Base \ Technologies   Mahout   WEKA
Scalability           More     Less
Algorithms            Less     More
GUI                   No       Yes
License               Apache   GPL

20 MAHOUT COMMERCIAL USERS
- Adobe: Uses clustering algorithms to increase video consumption by better user targeting.
- Amazon: For its personalization platform.
- AOL: For shopping recommendations.
- Twitter: Uses Mahout's LDA implementation for user interest modeling.
- Yahoo! Mail: Uses Mahout's Frequent Pattern Set Mining.
- Drupal: Uses Mahout to provide open source content recommendation solutions.
- Evolv: Uses Mahout for its Workforce Predictive Analytics platform.
- Foursquare: Uses Mahout for its recommendation engine.
- Idealo: Uses Mahout's recommendation engine.


22 THANK YOU

