Presentation is loading. Please wait.

Presentation is loading. Please wait.

Collaborative Filtering - Rajashree. Apache Mahout In 2008 as a subproject of Apache’s Lucene project Mahout absorbed the Taste open source collaborative.

Similar presentations


Presentation on theme: "Collaborative Filtering - Rajashree. Apache Mahout In 2008 as a subproject of Apache’s Lucene project Mahout absorbed the Taste open source collaborative."— Presentation transcript:

1 Collaborative Filtering - Rajashree

2 Apache Mahout In 2008 as a subproject of Apache’s Lucene project Mahout absorbed the Taste open source collaborative filtering project

3 Apache Mahout A Mahout is an elephant trainer/driver/keeper, hence… + Machine Learning =

4 What is Apache Mahout? Machine learning For large data Based on Hadoop But can work on a non Hadoop cluster Scalable Licensed by Apache Mahout is a Java written open source scalable machine learning library from Apache

5 Mahout in Apache Software Foundation Lucene: information retrieval software library

6 Mahout in Apache Software Foundation Hadoop: framework for distributed storage and programming based on MapReduce

7 Mahout in Apache Software Foundation Taste: collaborative filtering framework

8 Why Mahout? Mahout provides a rich set of components from which you can construct a customized recommender system from a selection of algorithms. Mahout is designed for performance, scalability and flexibility.

9 Why do we need a scalable framework?

10 General Architecture Three-tiers architecture (Application, Algorithms and Shared Libraries)

11 General Architecture Data Storage and Shared Libraries

12 General Architecture Business Logic

13 General Architecture External Applications invoking Mahout APIs

14 Use cases Currently Mahout supports mainly four use cases: –Recommendation - takes users' behavior and from that tries to find items users might like. –Clustering - takes e.g. text documents and groups them into groups of topically related documents. –Classification - learns from exisiting categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. –Frequent itemset mining - takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.

15 In this presentation we will focus on Recommendation

16 Mahout implements a Collaborative Filtering framework –Popularized by Amazon and others –Uses hystorical data (ratings, clicks, and purchases) to provide recommendations User-based: Recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users. Item-based: Calculate similarity between items and make recommendations. Items usually don't change much, so this often can be computed offline. Slope-One: A very fast and simple item-based recommendation approach applicable when users have given ratings (and not just boolean preferences).

17 Collaborative Filtering Recommend people and products – User-User User likes X, you might too – Item-Item People who bought X also bought Y

18 Example Amazon.com

19 Recommendation - Architecture Inceptive Idea: A Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of User Preferences that are built on the ground of a physical DataStore

20 Recommendation - Architecture Physical Data Data model Recommender External Application

21 Recommendation in Mahout Input: raw data (user preferences) Output: preferences estimation Step 1 –Mapping raw data into a DataModel Mahout- compliant Step 2 –Tuning recommender components Similarity measure, neighborhood, etc.

22 Recommendation Components Mahout implements interfaces to these key abstractions: –DataModel Methods for mapping raw data to a Mahout-compliant form –UserSimilarity Methods to calculate the degree of correlation between two users –ItemSimilarity Methods to calculate the degree of correlation between two items –UserNeighborhood Methods to define the concept of ‘neighborhood’ –Recommender Methods to implement the recommendation step itself

23 Components: DataModel A DataModel is the interface to draw information about user preferences. Which sources is it possible to draw? –Database MySQLJDBCDataModel –External Files FileDataModel –Generic (preferences directly feed through Java code) GenericDataModel

24 Components: DataModel Basic object: Preference –Preference is a triple (user,item,score) Two implementations –GenericUserPreferenceArray It stores numerical preference, as well. –BooleanUserPreferenceArray It skips numerical preference values.

25 Components: UserSimilarity UserSimilarity defines a notion of similarity between two Users. –(respectively) ItemSimilarity defines a notion of similarity between two Items. Which definition of similarity are available? –Pearson Correlation –Spearman Correlation –Euclidean Distance –Tanimoto Coefficient –LogLikelihood Similarity

26 Example: TanimotoDistance

27 Components: UserNeighborhood UserSimilarity defines a notion of similarity between two Users. –(respectively) ItemSimilarity defines a notion of similarity between two Items. Which definition of neighborhood are available? –Nearest N users The first N users with the highest similarity are labeled as ‘neighbors’ –Tresholds Users whose similarity is above a threshold are labeled as ‘neighbors’

28 Components: Recommender Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item Which recommendation algorithms are implemented? –User-based CF –Item-based CF –SVD-based CF

29 Download Mahout Download –The latest Mahout release is 0.9 –Available at: http://apache.fastbull.org/mahout/0.9/mahout- distribution-0.9.zip -Extract all the libraries and include them in a new NetBeans (Eclipse) project Requirement: Java 1.6.x or greater. Hadoop is not mandatory!

30 Exercise 1: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*; import org.apache.mahout.cf.taste.impl.neighborhood.*; import org.apache.mahout.cf.taste.impl.recommender.*; import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.model.*; import org.apache.mahout.cf.taste.neighborhood.*; import org.apache.mahout.cf.taste.recommender.*; import org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("data/mydata.dat ")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); }

31 Exercise 1: First Recommender package dnslab.recommender; import org.apache.mahout.cf.taste.impl.model.file.*; import org.apache.mahout.cf.taste.impl.neighborhood.*; import org.apache.mahout.cf.taste.impl.recommender.*; import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.model.*; import org.apache.mahout.cf.taste.neighborhood.*; import org.apache.mahout.cf.taste.recommender.*; import org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("data/mydata.dat ")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } FileData model

32 Exercise 1: First Recommender package dnslab.recommender; import org.apache.mahout.cf.taste.impl.model.file.*; import org.apache.mahout.cf.taste.impl.neighborhood.*; import org.apache.mahout.cf.taste.impl.recommender.*; import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.model.*; import org.apache.mahout.cf.taste.neighborhood.*; import org.apache.mahout.cf.taste.recommender.*; import org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("data/mydata.dat ")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } Neighbors

33 Exercise 1: First Recommender package dnslab.recommender; import org.apache.mahout.cf.taste.impl.model.file.*; import org.apache.mahout.cf.taste.impl.neighborhood.*; import org.apache.mahout.cf.taste.impl.recommender.*; import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.model.*; import org.apache.mahout.cf.taste.neighborhood.*; import org.apache.mahout.cf.taste.recommender.*; import org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("data/mydata.dat ")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } Top-1 Recommendation for User 1

34 Exercise 1: First Recommender import org.apache.mahout.cf.taste.impl.model.file.*; import org.apache.mahout.cf.taste.impl.neighborhood.*; import org.apache.mahout.cf.taste.impl.recommender.*; import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.model.*; import org.apache.mahout.cf.taste.neighborhood.*; import org.apache.mahout.cf.taste.recommender.*; import org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("data/mydata.dat ")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); }

35 Output

36 Exercise 2: MovieLens Recommender Download the MovieLens dataset (100k) –http://files.grouplens.org/datasets/movielens/ml- 100k.zip similarity calculations with a bigger dataset Next: now we can run the recommendation framework against a state-of-the-art dataset

37 MovieLens Dataset u.data The full u data set, 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1 The data is randomly ordered. This is a tab separated list of user id | item id | rating | timestamp

38 Data Processing Its format is not Mahout compliant cat u.data | cut –f1,2,3 | tr “\\t” “,” >>u.data1

39 package dnslab.recommender; import java.io.File; import java.io.IOException; import java.util.List; import org.apache.mahout.cf.taste.common.TasteException; import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator; import org.apache.mahout.cf.taste.impl.model.file.FileDataModel; import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender; import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity; import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity; import org.apache.mahout.cf.taste.model.DataModel; import org.apache.mahout.cf.taste.recommender.RecommendedItem; import org.apache.mahout.cf.taste.similarity.ItemSimilarity; public class ItemRecommend { public static void main(String[] args) { try { DataModel dm = new FileDataModel(new File("data/u.data1")); TanimotoCoefficientSimilarity sim = new TanimotoCoefficientSimilarity(dm); GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(dm, sim); int x=1; for(LongPrimitiveIterator items = dm.getItemIDs(); items.hasNext();) { long itemId = items.nextLong(); List recommendations = recommender.mostSimilarItems(itemId, 5); for(RecommendedItem recommendation : recommendations) { System.out.println(itemId + "," + recommendation.getItemID() + "," + recommendation.getValue()); } x++; if(x>10) System.exit(1); } } catch (IOException e) { System.out.println("There was an error."); e.printStackTrace(); } catch (TasteException e) { System.out.println("There was a Taste Exception"); e.printStackTrace(); }

40 Output Similar items Preferences similarity items

41 Thank you


Download ppt "Collaborative Filtering - Rajashree. Apache Mahout In 2008 as a subproject of Apache’s Lucene project Mahout absorbed the Taste open source collaborative."

Similar presentations


Ads by Google