1 Collaborative Filtering
Zeinab Noorian, PhD

2 Agenda
- Collaborative Filtering (CF)
- Pure CF approaches
- User-based nearest-neighbor
- The Pearson correlation similarity measure
- Memory-based and model-based approaches
- Item-based nearest-neighbor
- The cosine similarity measure
- Data sparsity problems
- Recent methods (association rule mining, probabilistic methods, Slope One, …)
- The Google News personalization engine
- Literature

3 Collaborative Filtering (CF)
The most prominent approach to generate recommendations:
- used by large, commercial e-commerce sites
- various algorithms and variations exist
- applicable in many domains (books, movies, DVDs, …)

Approach: use the "wisdom of the crowd" to recommend items.

Basic assumption and idea:
- Users give ratings to catalog items (implicitly or explicitly).
- Customers who had similar tastes in the past will have similar tastes in the future.

The main idea of collaborative filtering is to use the opinions of the rest of the user community to predict which items the current user will like or might be interested in.

4 Pure CF Approaches
Input:
- Only a matrix of given user–item ratings

Output types:
- A (numerical) prediction indicating to what degree the current user will like or dislike a certain item
- A top-N list of recommended items

5 User-based nearest-neighbor collaborative filtering (1)
The basic technique: given an "active user" (Alice) and an item i not yet seen by Alice,
- find a set of users (peers / nearest neighbors) who liked the same items as Alice in the past and who have rated item i
- use, e.g., the average of their ratings to predict whether Alice will like item i
- do this for all items Alice has not seen and recommend the best-rated ones

Basic assumption and idea:
- If users had similar tastes in the past, they will have similar tastes in the future.
- User preferences remain stable and consistent over time.

6 User-based nearest-neighbor collaborative filtering (2)
Example: a database of ratings of the current user, Alice, and some other users is given. The task of the recommender system is to determine whether Alice will like or dislike Item5, which Alice has not yet rated or seen.

|       | Item1 | Item2 | Item3 | Item4 | Item5 |
|-------|-------|-------|-------|-------|-------|
| Alice |   5   |   3   |   4   |   4   |   ?   |
| User1 |   3   |   1   |   2   |   3   |   3   |
| User2 |   4   |   3   |   4   |   3   |   5   |
| User3 |   3   |   3   |   1   |   5   |   4   |
| User4 |   1   |   5   |   5   |   2   |   1   |

7 User-based nearest-neighbor collaborative filtering (3)
Some first questions:
- How do we measure similarity between other users and Alice?
- How many neighbors should we consider?
- How do we generate a prediction from the neighbors' ratings?

(rating matrix as in slide 6)

8 Measuring user similarity (1)
A popular similarity measure in user-based CF is Pearson's correlation coefficient:
- a, b: users
- P = {p1, …, pm}: the set of products/items rated by both users
- r_{a,p}: the rating of user a for product p
- r̄_a: the average rating of user a
Pearson correlation takes a value in [−1, +1].
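In the notation above, the measure reads:

$$\operatorname{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$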

9 Measuring user similarity (2)
With r̄_Alice = r̄_a = 4 and r̄_User1 = r̄_b = 2.4, the similarity of Alice and User1 comes out as sim(Alice, User1) = 0.85. Applying the same measure to the other users of the rating matrix above yields:
- sim(Alice, User1) = 0.85
- sim(Alice, User2) = 0.70
- sim(Alice, User3) = 0.00
- sim(Alice, User4) = −0.79
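A minimal sketch that reproduces these similarities with NumPy, assuming the rating matrix reconstructed above. Note that `np.corrcoef` centers each user on the mean over the four co-rated items, which matches the slide's values up to rounding:

```python
import numpy as np

# Ratings over the co-rated items Item1..Item4 (Item5 is unknown for Alice).
ratings = {
    "Alice": [5, 3, 4, 4],
    "User1": [3, 1, 2, 3],
    "User2": [4, 3, 4, 3],
    "User3": [3, 3, 1, 5],
    "User4": [1, 5, 5, 2],
}

alice = np.array(ratings["Alice"], dtype=float)
for user in ("User1", "User2", "User3", "User4"):
    sim = np.corrcoef(alice, np.array(ratings[user], dtype=float))[0, 1]
    print(f"sim(Alice, {user}) = {sim:.2f}")
# Prints 0.85, 0.71, 0.00, -0.79; the slide's 0.70 for User2 is 0.707 rounded down.
```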

10 Pearson Correlation Highlights
Takes differences in rating behavior into account:
- Some users are optimistic and give a high rating to every item, so their average rating is high.
- Some users are pessimistic and hardly give a high rating to any item, so their average rating is low.
- Pearson correlation accounts for these different rating behaviors by subtracting each user's average rating from the rating given to the item (r_{a,p} − r̄_a).
So even though the ratings of User1 and Alice are quite different in absolute terms, a rather clear linear correlation of their ratings is detected. In contrast, the plot for User4 shows the opposite correlation with Alice.

11 Making predictions
A common prediction function:
- Calculate whether the neighbors' ratings for the unseen item are higher or lower than their average.
- Combine the rating differences, using the similarity metric as a weight.
- Add/subtract the neighbors' bias from the active user's average and use this as the prediction.

Here N is the set of neighbors with an acceptable level of similarity, whose ratings are used in the prediction of Alice's rating for Item5. The prediction function is a weighted average based on the similarity of the different users b with the active user a. To deal with rating bias and the different rating behavior of users, we add the average rating of user a and subtract the average rating of each neighbor b when formalizing the prediction function. In this example, we only use the ratings of User1 and User2, the nearest neighbors of Alice: the predicted rating of Alice for Item5 is 4.87.
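In this notation, the prediction function described above is:

$$\operatorname{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \operatorname{sim}(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \operatorname{sim}(a,b)}$$

Plugging in the example values (using the reconstructed matrix: r̄_User1 = 2.4, r̄_User2 = 3.8, with Item5 ratings of 3 and 5):

$$\operatorname{pred}(\text{Alice},\text{Item5}) = 4 + \frac{0.85 \cdot (3 - 2.4) + 0.70 \cdot (5 - 3.8)}{0.85 + 0.70} = 4 + \frac{1.35}{1.55} \approx 4.87$$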

12 Improving the metrics / prediction function
Not all ratings of items should be equally "important":
- The agreement of two users on commonly liked items is not as informative/valuable as their agreement on controversial items.
  - Possible solution: give more weight to items that have a higher variance in ratings.
- The number of co-rated items matters: two users may have only a few co-rated items in common.
  - Use "significance weighting", e.g., by linearly reducing the similarity weight of users when the number of co-rated items is low (Herlocker et al. 1999).
- Case amplification
  - Intuition: give more weight to "very similar" neighbors, i.e., those whose similarity value is close to 1, by amplifying their similarity with a constant exponent (ρ = 2.5 in Breese et al. 1998); see the formula after this list.
- Neighborhood selection
  - Use a similarity threshold or a fixed number of (top-N) neighbors.

These refinements address two weaknesses: (1) Pearson correlation does not differentiate between agreement on commonly liked products and agreement on controversial items; (2) it does not take into account whether two users have co-rated only a few items or a large number of items.
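A sketch of the case-amplification transform, assuming the standard form from Breese et al. (1998):

$$w'_{a,b} = w_{a,b} \cdot |w_{a,b}|^{\rho - 1}, \qquad \rho = 2.5$$

For example, a similarity of 0.9 becomes 0.9^{2.5} ≈ 0.77 while 0.5 becomes 0.5^{2.5} ≈ 0.18, so the gap between very similar and moderately similar neighbors widens.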

13 Limitations of User-based Collaborative Filtering
- When dealing with a large rating database, the computational complexity is high.
- The rating matrix is typically very sparse: users rate only a small subset of the available items.
- User-based CF does not address how to deal with new users without any previous ratings, or with new items that have no ratings at all.

14 Memory-based and model-based approaches
User-based CF is said to be "memory-based":
- the rating matrix is kept in memory and used directly to find neighbors and make predictions
- does not scale for most real-world scenarios: large e-commerce sites have tens of millions of customers and millions of items

Model-based approaches:
- based on an offline pre-processing or "model-learning" phase
- at run time, only the learned model is used to make predictions
- models are updated / re-trained periodically
- large variety of techniques used
- model building and updating can be computationally expensive
- item-based CF is an example of a model-based approach

CF techniques are thus classified as either memory-based or model-based. Memory-based approaches are computationally expensive, since they must evaluate the similarity of users for every request, so they do not scale well. In model-based approaches, the item rating similarities are processed offline, so the learned model can be used instantly to make predictions.

15 Item-based collaborative filtering
Basic idea: use the similarity between items (and not users) to make predictions.

Example:
- Look for items that are similar to Item5.
- Take Alice's ratings for these items to predict her rating for Item5.

(rating matrix as in slide 6)

16 The cosine similarity measure (1)
- Item ratings are represented as u-dimensional vectors over the user space.
- Similarity is calculated from the angle between two vectors.
- a, b: the items' rating vectors; a · b is the dot product of the rating vectors; |ā| is the Euclidean length of a vector, defined as the square root of the dot product of the vector with itself.
- Produces a value in [0, 1].
- The cosine similarity measure produces better results in item-to-item filtering.
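With these definitions, the measure reads:

$$\operatorname{sim}(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}$$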

17 The cosine similarity measure (2) – Example 1
The cosine similarity of Item5 and Item1 is therefore calculated as follows (rating matrix as in slide 6):
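Using the reconstructed matrix, the rating vectors over the four users who rated both items are Item5 = (3, 5, 4, 1) and Item1 = (3, 4, 3, 1), so:

$$\operatorname{sim}(\text{Item5},\text{Item1}) = \frac{3 \cdot 3 + 5 \cdot 4 + 4 \cdot 3 + 1 \cdot 1}{\sqrt{9+25+16+1} \cdot \sqrt{9+16+9+1}} = \frac{42}{\sqrt{51}\sqrt{35}} \approx 0.99$$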

18 Cosine similarity measures—Example 2.
|       | User1 | User2 |
|-------|-------|-------|
| Item1 |   2   |   5   |
| Item2 |   4   |       |
| Item3 |   3   |       |

[Plot: the item rating vectors drawn in the two-dimensional (User1, User2) space; the angle between two vectors determines their similarity.]

19 Adjusted cosine similarity (1)
The basic cosine similarity measure does not take differences in the average rating behavior of the users into account. The adjusted cosine similarity measure has therefore been proposed, which includes the users' average ratings in the formula, i.e., it subtracts the user's average from each rating.
- First: transform the original rating database, replacing the original rating values with their deviations from the users' average ratings (rating_original − rating_average).
- Second: apply the original cosine similarity measure to compute the similarity between items (e.g., Item1 and Item5).

20 Adjusted cosine similarity (2)– The formalization
- U: the set of users who have rated both items a and b
- r̄_u: the average rating of user u
- r_{u,b}: the rating of user u for item b
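In this notation, the adjusted cosine similarity between items a and b is:

$$\operatorname{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2} \cdot \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}$$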

21 Making predictions in item-based CF
A common prediction function (see below):
- ratedItems(u): the items rated by user u that are similar to the target item p.
After the similarities between items are determined, we can predict Alice's rating for Item5 by calculating a weighted sum of Alice's ratings for the items that are similar to Item5.
Homework: predict Alice's rating for Item5 yourself!
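The weighted-sum prediction described above, in this notation:

$$\operatorname{pred}(u,p) = \frac{\sum_{i \in \text{ratedItems}(u)} \operatorname{sim}(i,p) \cdot r_{u,i}}{\sum_{i \in \text{ratedItems}(u)} \operatorname{sim}(i,p)}$$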

22 Selecting Neighbors in Collaborative Filtering Algorithms
- In theory, the more neighbors the better, if you have a good similarity metric.
- In practice, noise from dissimilar neighbors decreases usefulness, so not all neighbors are taken into account for the prediction:
  - filter out neighbors with low similarity / low credibility
  - ideally, come up with an adaptive credibility/similarity threshold
- An analysis of the MovieLens dataset indicates that "in most real-world situations, a neighborhood of 20 to 50 neighbors seems reasonable" (Herlocker et al. 2002).
- With fewer neighbors, the differing opinions of newly added neighbors can considerably change the similarity degree (low coverage).
- Neighborhood selection is still an open problem!

23 Pre-processing for item-based filtering
Item-based filtering scales better than user-based filtering!

Pre-processing approach by Amazon.com (in 2003):
- Construct an item similarity matrix: calculate all pairwise item similarities offline.
- At run time, to make a prediction of item p for user u, retrieve the items similar to p that have been rated by u.
- The number of items to be used at run time is typically rather small, because only items the active user has rated are taken into account.

Pre-processing for user-based CF is also possible; however:
- the number of overlapping ratings for two users is typically small
- item similarities are more stable than user similarities: user profiles/preferences change more frequently than item profiles/features

Item-to-item CF techniques are used by Amazon to recommend books or CDs to their customers; at the time of implementation, Amazon had about 29 million users and millions of catalog items. Traditional user-based CF cannot scale in such a large e-commerce site: with M = millions of users and N = millions of catalog items, in the worst case the similarity of M users must be computed across N items. Even though the worst case rarely occurs in practice, computing predictions over millions of users is infeasible when the system must respond in a short time, so user-based filtering does not scale well. (A minimal sketch of the two phases follows below.)
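A minimal sketch of the offline/online split, assuming the rating matrix reconstructed in slide 6. This simple cosine variant treats missing ratings as 0, so its similarity values differ slightly from the co-rating-only calculation in slide 17:

```python
import numpy as np

# Reconstructed example matrix (rows: Alice, User1..User4; cols: Item1..Item5).
# 0 marks the single missing rating (Alice / Item5).
R = np.array([
    [5, 3, 4, 4, 0],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
], dtype=float)

# --- offline phase: cosine similarity between all item columns -------------
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)   # S[i, j] = cos(item_i, item_j)

# --- run time: predict Alice's rating for Item5 (index 4) ------------------
user, target = 0, 4
rated = np.where(R[user] > 0)[0]         # items Alice has rated
w = S[target, rated]                     # similarities to the target item
print(R[user, rated] @ w / w.sum())      # weighted-average prediction
```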

24 Pre-processing for item-based filtering
With respect to memory requirements for the item similarity matrix: for N items, the matrix can have up to N² entries; in the real world, it is actually much smaller. Techniques to further reduce complexity:
- consider only items that have a minimum number of co-ratings
- memorize only a limited neighborhood for each item

25 Question???
Think about a better approach to improve collaborative filtering! How about changing the similarity measure in item-item CF? But how can we do that? Discuss!

26 Data sparsity problems
Cold start problem:
- How to recommend new items? What to recommend to new users?

Straightforward approaches:
- Use another method (e.g., content-based, demographic, or simply non-personalized) in the initial phase
- Ask/force users to rate a set of (controversial, not merely popular) items
- Default voting: assign default values to items that only one of the two users to be compared has rated (Breese et al. 1998)

Alternatives:
- Use better algorithms (beyond nearest-neighbor approaches)
- Example: in nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
- Assume "transitivity" of neighborhoods

27 Example algorithms for sparse datasets
Recursive collaborative filtering: assume that Alice has a very close neighbor, User1 (sim = 0.85), who however has not rated the target item, Item5.

Idea:
- Use CF recursively to find a close neighbor of User1 (e.g., User3, sim = 0.9) who has rated the target item Item5, and predict User1's rating for it.
- Use this predicted rating, instead of the personal rating of a more distant neighbor, to recommend the item to Alice.

(rating matrix as in slide 6)

The intuition: even though Alice's close neighbor has no personal rating for the target item, the rating predicted for him from his own close friends is more credible to Alice than the personally experienced ratings of her distant, low-similarity friends.

28 Graph-based methods (1)
"Spreading activation" (Huang et al. 2004) Exploit the supposed "transitivity" of customer tastes and thereby augment the matrix with additional information Assume that we are looking for a recommendation for User1 When using a standard CF approach, User2 will be considered a peer for User1 because they both bought Item2 and Item4 Thus Item3 will be recommended to User1 because the nearest neighbor, User2, also bought or liked it The idea of the graph-based method proposed by Huang is that we exploit the “transitivity of consumers taste” so that we can augment matrix with additional information. . Consider we have user-rating matrix where 0 indicate user did NOT experience Item and 1 indicate the experienced item. First we need to model the user-rating matrix to bipartite graph where items and users are presented as a set of disjoints nodes. Based on the standard-CF method, User2 and User1 are similar as they both experienced Item2 and Item4. So, the next suggested item for User1 would be Item3. In a standard- CF path of length 3 would be considered!!

29 Graph-based methods (2)
"Spreading activation" (Huang et al. 2004) In a standard user-based or item-based CF approach, paths of length 3 will be considered – that is, Item3 is relevant for User1 because there exists a three-step path (User1–Item2–User2–Item3) between them Because the number of such paths of length 3 is small in sparse rating databases, the idea is to also consider longer paths (indirect associations) to compute recommendations Using path length 5, for instance

30 Graph-based methods (3)
"Spreading activation" (Huang et al. 2004) Idea: Recommend items with the longer path to fill the missing value. Use paths of lengths > 3 to recommend items Length 3: Recommend Item3 to User1 (Standard-CF method) Length 5: Item1 also recommendable (Spreading Activation method) This mechanism called spreading activation. When the rating matrix is sparse, this method (which is based on indirect relationship OR transitivity) help to improve the quality of recommendations compared with standard CF approaches. However, when the rating matrix reaches to a certain density, the quality of the recommendation decreases compared with the standard CF approaches. Noted that, the cons of this approach is that: the computation of distant relationship is computationally expensive and this model has not been tested for a large rating databases.

31 More model-based approaches
A plethora of different recommendation techniques has been proposed in recent years, e.g.:
- Association rule mining (compare: shopping basket analysis)
- Probabilistic models (clustering models, Bayesian networks, …)
- Matrix factorization techniques from statistics: singular value decomposition, principal component analysis
- Various other machine learning approaches

32 Product Association Recommender
One interesting approach to recommending items is the product association recommender: observe which items are usually bought together, then recommend items to users based on their shopping baskets.
- The RS does not necessarily need information about products or users.
- Can work well for non-personalized recommender systems.
- Example: you are at a restaurant, order ice cream, and want a recommendation for a sauce. Do you want to hear that ketchup is the most popular sauce overall? Or do you want to hear which sauce is popular in association with ice cream?

33 Product Association: People Who liked X also Liked Y
Commonly used for shopping behavior analysis; aims at the detection of rules such as "if a customer purchases baby food, then he also buys diapers in 70% of the cases".

34 Association rule mining
Association rule mining algorithms can detect rules of the form X → Y (e.g., baby food → diapers) from a set of sales transactions D = {t1, t2, …, tn}.
- Measures of quality: support and confidence, used e.g. as thresholds to cut off unimportant rules (see the definitions below).
- Confidence alone does not work well when one product is highly popular and the other is not, although it works for two less popular products.
- Example: X is a less popular product (vinegar, bought by 5 people) and Y a highly popular product (bananas, bought by 70 people) in a population of 100 people, of whom 4 bought both X and Y. Computing only the confidence gives 4/5 = 0.8, which wrongly suggests an 80% association; it mostly reflects that bananas are popular.
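For reference, the standard definitions of the two quality measures over the transaction set D:

$$\operatorname{supp}(X \Rightarrow Y) = \frac{|\{t \in D : X \cup Y \subseteq t\}|}{|D|}, \qquad \operatorname{conf}(X \Rightarrow Y) = \frac{|\{t \in D : X \cup Y \subseteq t\}|}{|\{t \in D : X \subseteq t\}|}$$

With the vinegar/banana numbers: supp = 4/100 = 0.04 and conf = 4/5 = 0.8.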

35 Improve the product association algorithm...
Let's adjust the measure by asking whether buying X increases the chance of buying Y compared with not buying X; the formula focuses on the increase in Y associated with X. This works well for different combinations of products: a less popular product with a highly popular one (vinegar and bananas), and two less popular products (vinegar and garlic).

One popular and one less popular item: X = vinegar = 5, Y = bananas = 70, (X and Y) = 4, population = 100, so ¬X = 95 and (¬X and Y) = 70 − 4 = 66. Then P(Y|X) / P(Y|¬X) = 0.8 / (66/95) ≈ 0.8 / 0.7 ≈ 1, so buying X does not increase the chance of buying Y.

Two less popular items: X = vinegar = 5, Y = garlic = 10, (X and Y) = 3, population = 100, so ¬X = 95 and (¬X and Y) = 10 − 3 = 7. Then P(Y|X) / P(Y|¬X) = 0.6 / (7/95) ≈ 0.6 / 0.074 ≈ 8, so buying X clearly increases the chance of buying Y.
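A small sketch of this adjusted measure, using the slide's illustrative purchase counts:

```python
# Adjusted association measure P(Y|X) / P(Y|not X) from raw purchase counts.
def association_lift(n_x, n_y, n_xy, population):
    p_y_given_x = n_xy / n_x                          # confidence of X -> Y
    p_y_given_not_x = (n_y - n_xy) / (population - n_x)
    return p_y_given_x / p_y_given_not_x

# vinegar (5) -> bananas (70): high confidence, but no real association
print(association_lift(5, 70, 4, 100))   # ~1.15: X barely changes Y
# vinegar (5) -> garlic (10): lower confidence, but strong association
print(association_lift(5, 10, 3, 100))   # ~8.1: X strongly increases Y
```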

36 Probabilistic methods
Basic idea (simplistic version for illustration):
- given the user/item rating matrix, determine the probability that user Alice will like an item i
- base the recommendation on these probabilities

Calculation of rating probabilities based on Bayes' theorem:
- How probable is rating value "1" for Item5 given Alice's previous ratings?
- Corresponds to the conditional probability P(Item5 = 1 | X), where X = Alice's previous ratings = (Item1 = 1, Item2 = 3, Item3 = …)
- Can be estimated based on Bayes' theorem
- Assumption: ratings are (conditionally) independent (?)
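The estimation combines Bayes' theorem with the naive independence assumption:

$$P(Y \mid X) = \frac{P(X \mid Y) \cdot P(Y)}{P(X)}, \qquad P(X \mid Y) = \prod_{i=1}^{d} P(X_i \mid Y)$$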

37 Calculation of probabilities in simplistic approach
|       | Item1 | Item2 | Item3 | Item4 | Item5 |
|-------|-------|-------|-------|-------|-------|
| Alice |   1   |   3   |   3   |   2   |   ?   |
| User1 |   2   |   4   |   2   |   2   |   4   |
| User2 |   1   |   3   |   3   |   5   |   1   |
| User3 |   4   |   5   |   2   |   3   |   3   |
| User4 |   1   |   1   |   5   |   2   |   1   |

Here X = (Item1 = 1, Item2 = 3, Item3 = 3, Item4 = 2) are Alice's previous ratings. P(X) is a constant factor, so we omit it from the calculation. P(Y) with Y = Item5, e.g. P(Item5 = 1) = 2/4, P(Item5 = 2) = 0, and so forth, is calculated directly from the rating matrix. We only need to calculate the class-conditional probabilities P(Xi | Y).
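Worked out for Y = (Item5 = 1) with the reconstructed matrix above: the two users with Item5 = 1 are User2 and User4, so

$$P(X \mid \text{Item5}=1) = \underbrace{\tfrac{2}{2}}_{\text{Item1}=1} \cdot \underbrace{\tfrac{1}{2}}_{\text{Item2}=3} \cdot \underbrace{\tfrac{1}{2}}_{\text{Item3}=3} \cdot \underbrace{\tfrac{1}{2}}_{\text{Item4}=2} = 0.125$$

and, omitting the constant P(X), P(Item5 = 1 | X) ∝ 0.125 · 2/4 = 0.0625.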

38 Slope One predictors (Lemire and Maclachlan 2005)
The idea of Slope One predictors is simple: it is based on a popularity differential between items for users.

Example:

|       | Item1 | Item5 |
|-------|-------|-------|
| Alice |   2   |   ?   |
| User1 |   1   |   2   |

p(Alice, Item5) = 2 + (2 − 1) = 3

Basic scheme: take the average of these differences over the co-ratings to make the prediction. In general, find a function of the form f(x) = x + b; that is why the name is "Slope One".
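In the general weighted scheme of Lemire and Maclachlan (2005), with S_{j,i} the set of users who rated both items j and i, and R_u the items rated by user u:

$$\operatorname{dev}_{j,i} = \sum_{u \in S_{j,i}} \frac{r_{u,j} - r_{u,i}}{|S_{j,i}|}, \qquad p(u,j) = \frac{\sum_{i \in R_u \setminus \{j\}} (\operatorname{dev}_{j,i} + r_{u,i}) \cdot |S_{j,i}|}{\sum_{i \in R_u \setminus \{j\}} |S_{j,i}|}$$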

39 Slope One Predictor (Example)
P(Alice, Item3) = ?
- There are two co-ratings for Item1 and Item3: User1 (5 − 3 = 2) and User2 (3 − 4 = −1); their average difference is (2 + (−1)) / 2 = 0.5.
- There is one co-rating for Item2 and Item3: User1 (5 − 2 = 3).
- Prediction of Item3's rating for Alice based on Item1: 2 + 0.5 = 2.5
- Prediction of Item3's rating for Alice based on Item2: 5 + 3 = 8
- The overall prediction weights each deviation by its number of co-ratings, giving more weight to deviations backed by more data: (2 · 2.5 + 1 · 8) / 3 ≈ 4.33.
Note that only item pairs involving the target item are used to find co-ratings. (A runnable sketch follows below.)
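A minimal weighted Slope One sketch over this example; the small rating matrix below is assumed from the numbers in the notes:

```python
# Weighted Slope One on the slide's example (assumed ratings).
ratings = {
    "Alice": {"Item1": 2, "Item2": 5},
    "User1": {"Item1": 3, "Item2": 2, "Item3": 5},
    "User2": {"Item1": 4, "Item3": 3},
}

def predict(user, target):
    num, den = 0.0, 0
    all_items = set(i for r in ratings.values() for i in r)
    for other in all_items:
        if other == target or other not in ratings[user]:
            continue
        # co-ratings: users who rated both `other` and `target`
        diffs = [r[target] - r[other] for r in ratings.values()
                 if target in r and other in r]
        if not diffs:
            continue
        dev = sum(diffs) / len(diffs)              # average popularity differential
        num += (ratings[user][other] + dev) * len(diffs)
        den += len(diffs)
    return num / den

print(predict("Alice", "Item3"))   # (2*2.5 + 1*8) / 3 = 4.33...
```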

40 Collaborative Filtering Issues
Pros: well understood, works well in some domains, no knowledge engineering required.
Cons: requires a user community, sparsity problems, no integration of other knowledge sources, no explanation of results.
What is the best CF method? In which situation, and for which domain? Findings are inconsistent; evaluations use always the same domains and data sets, and differences between methods are often very small (on the order of 1/100).
How to evaluate prediction quality? MAE / RMSE: what does an MAE of 0.7 actually mean? Serendipity (the novelty and surprise effect of recommendations) is not yet fully understood. What about multi-dimensional ratings?

41 The Google News personalization engine

42 Google News portal (1)
- Aggregates news articles from several thousand sources
- Displays them to signed-in users in a personalized way
- Collaborative recommendation approach based on the click history of the active user and the history of the larger community
- Main challenges:
  - vast number of articles and users
  - generate the recommendation list in real time (at most one second)
  - constant stream of new items, while other articles become out of date
  - immediately react to user interaction
- Significant efforts with respect to algorithms, engineering, and parallelization are required

43 Google News portal (2)
- Pure memory-based approaches are not directly applicable, and for model-based approaches the problem of continuous model updates must be solved
- A combination of model- and memory-based techniques is used
- Model-based part: two clustering techniques are used
  - Probabilistic Latent Semantic Indexing (PLSI) as proposed by Hofmann (2004)
  - MinHash as a hashing method
- Memory-based part: analysis of story co-visits to deal with new users
- The overall recommendation score for each item is computed as a linear combination of the scores obtained by the three methods: MinHash, PLSI, and co-visits
- Google's MapReduce technique is used for parallelization to make the computation scalable

Because of the vast number of articles and users and the required response time, a pure memory-based approach is not applicable, so a combination of model-based and memory-based approaches is used. PLSI is a probabilistic clustering technique for collaborative filtering that identifies clusters of like-minded users and items; its novel idea is that, instead of each user belonging to exactly one cluster, users may be interested in several topics in parallel and can therefore belong to multiple clusters. MinHash is a hashing method that puts two users into the same cluster based on the overlap of items both have viewed. A co-visit means that an article has been visited by the same user within a defined period of time; the idea is the same as in item-based CF (propose articles similar to those the user has read or clicked), but the data is not pre-processed offline. Rather, the click information of users is constantly kept up to date, and predictions of the next article the user may be interested in are made by iterating over the user's very recent history and retrieving similar articles. Google uses MapReduce to distribute the computation over several clusters of machines.

44 Assignment
Assignment 3 from the "Introduction to Recommendation Systems" course offered on Coursera (the website is available for free!).
- Answer Part 1 of Assignment 3: systems/supplement/pBm3G/assignment-3-instructions
- Submit to: systems/exam/cYxFI/assignment-3-user-user-collaborative-filtering-quiz
- Deadline: next week ( )

45 Literature (1)
[Adomavicius and Tuzhilin 2005] Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (2005), no. 6, 734–749.
[Breese et al. 1998] Empirical analysis of predictive algorithms for collaborative filtering, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (Madison, WI) (Gregory F. Cooper and Serafín Moral, eds.), Morgan Kaufmann, 1998, pp. 43–52.
[Gedikli et al. 2011] RF-Rec: Fast and accurate computation of recommendations based on rating frequencies, Proceedings of the 13th IEEE Conference on Commerce and Enterprise Computing (CEC 2011), Luxembourg, 2011, forthcoming.
[Goldberg et al. 2001] Eigentaste: A constant time collaborative filtering algorithm, Information Retrieval 4 (2001), no. 2, 133–151.
[Golub and Kahan 1965] Calculating the singular values and pseudo-inverse of a matrix, Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2 (1965), no. 2, 205–224.
[Herlocker et al. 2002] An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms, Information Retrieval 5 (2002), no. 4, 287–310.
[Herlocker et al. 2004] Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS) 22 (2004), no. 1, 5–53.

46 Literature (2)
[Hofmann 2004] Latent semantic models for collaborative filtering, ACM Transactions on Information Systems 22 (2004), no. 1, 89–115.
[Huang et al. 2004] Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering, ACM Transactions on Information Systems 22 (2004), no. 1, 116–142.
[Koren et al. 2009] Matrix factorization techniques for recommender systems, Computer 42 (2009), no. 8, 30–37.
[Lemire and Maclachlan 2005] Slope One predictors for online rating-based collaborative filtering, Proceedings of the 5th SIAM International Conference on Data Mining (SDM '05) (Newport Beach, CA), 2005, pp. 471–480.
[Sarwar et al. 2000a] Application of dimensionality reduction in recommender systems – a case study, Proceedings of the ACM WebKDD Workshop (Boston), 2000.
[Zhang and Pu 2007] A recursive prediction algorithm for collaborative filtering recommender systems, Proceedings of the ACM Conference on Recommender Systems (RecSys '07) (Minneapolis, MN), ACM, 2007, pp. 57–64.

