1 Distributed Networks & Systems Lab

2 Introduction; Collaborative filtering; Characteristics and challenges; Memory-based CF; Model-based CF; Hybrid CF; Recent advances in CF; Conclusion

3 Recommendation system: helps users discover new items that may be hard for them to find. It is a subclass of information filtering systems that seeks to predict the ‘rating’ or ‘preference’ a user would give to an item. Recommender systems identify recommendations autonomously for individual users based on their past purchases and searches, and on other users' behavior.

5 Recommendation system approaches. Content-based: based on a description of the item and a profile of the user’s preferences. Collaborative filtering: based on collecting and analyzing a large amount of information on users’ behaviors, activities, or preferences, and predicting what users will like based on their similarity to other users. Hybrid: a combination of the collaborative filtering and content-based approaches.

6 Recommendation system taxonomy: Content-based; Collaborative filtering (memory-based, model-based, hybrid); Hybrid

8 Collaborative filtering faces performance challenges that arise from its distinctive characteristics: data sparsity, scalability, synonymy, gray sheep, and shilling attacks.

9 Data sparsity: in internet markets, the wide variety of products makes the user-item matrix sparse. How can such sparse data be processed and matched?
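
To make "sparse" concrete, here is a minimal Python sketch on hypothetical toy data: only observed ratings are stored (user → {item: rating}), and sparsity is the fraction of user-item cells that have no rating. The catalog size `n_items` is passed explicitly, an assumption of this sketch, because items nobody has rated never appear in the dictionary.

```python
# Hypothetical toy data: only observed ratings are stored (user -> {item: rating}).
ratings = {
    "u1": {"i1": 5, "i3": 3},
    "u2": {"i2": 4},
    "u3": {"i1": 2, "i4": 5},
}


def sparsity(ratings, n_items):
    """Return the fraction of cells in the n_users x n_items matrix that are empty."""
    n_users = len(ratings)
    observed = sum(len(items) for items in ratings.values())
    return 1.0 - observed / (n_users * n_items)


print(sparsity(ratings, n_items=4))  # 5 of 12 cells are observed, so 7/12 are empty
```

Storing only the observed entries is what keeps memory proportional to the number of ratings rather than to users × items.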

10 Cold start problem: when a new user or item has just entered the system, it is hard to find similar ones because there is not enough information. In addition, users’ ratings can be very few compared with the large number of items in the system, which causes reduced coverage.

11 Users with the same tastes may not be identified as such if they have no co-rated items.

12 Dimensionality reduction techniques, such as singular value decomposition (SVD), remove unrepresentative or insignificant users or items to reduce the dimensionality of the user-item matrix directly. This reduces sparsity but has drawbacks: meaningful data can also be discarded, which decreases recommendation quality.
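
Keeping only the most significant dimensions can be illustrated with a truncated SVD. The minimal pure-Python sketch below computes the best rank-1 approximation σ·u·vᵀ by power iteration, standing in for a library SVD routine. It assumes a dense matrix in which missing ratings have already been filled in (e.g. with 0 or a mean), which is itself one of the choices that can discard or distort information; the matrix `R` is hypothetical.

```python
import math


def matvec(A, x):
    """Return A @ x for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rank1_approx(A, iters=100):
    """Best rank-1 approximation sigma * u * v^T of A (list of rows).

    Power iteration on A^T A converges to the top right singular vector v,
    assuming A is nonzero and the all-ones start vector is not orthogonal to v.
    """
    n = len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose of A
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))  # top singular value
    u = [x / sigma for x in Av]  # top left singular vector
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical 3 users x 3 items rating matrix (missing values already filled).
R = [[5, 4, 1],
     [4, 5, 1],
     [1, 1, 5]]
for row in rank1_approx(R):
    print([round(x, 2) for x in row])
```

Keeping k > 1 components would repeat the procedure on the residual A − σ·u·vᵀ.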

13 Scalability: large amounts of data lead to longer computation times under limited resources. Dimensionality reduction can help with this problem, but it requires an extra, expensive step (matrix factorization). Incremental SVD algorithms have been proposed to reduce the cost of this step.

14 Synonymy: the same kind of product can have different names, e.g. “children movie” and “children film”. Memory-based CF systems are vulnerable to this problem. Several attempts have been made to solve it: intellectual (manual) or automatic term expansion can provide a partial solution, but it has some drawbacks.

15 Gray sheep: users who are not ordinary, i.e. whose opinions do not consistently agree or disagree with any group of people, so it is hard to make predictions for them. There is no full solution for this; per-user approaches have been proposed to reduce the problem.

16 Shilling attacks: a product seller may deliberately inflate positive ratings for its own products and give negative ratings to competitors’ products. The item-based CF algorithm was much less affected by such attacks than the user-based CF algorithm.

17 Privacy: observing users’ personal habits can invade their privacy. Noise: noise increases as diversity increases. Explainability: users should be told why the system recommends a specific item.

18 Memory-based CF memorizes the rating matrix and issues recommendations based on the relationship between the queried user and item and the rest of the matrix. It uses the entire user-item database, or a sample of it, to make predictions. Every user is part of a group of people with similar interests.

19 Neighborhood-based CF, the most popular memory-based CF method, predicts ratings by referring to users whose ratings are similar to the queried user’s, or to items that are similar to the queried item. It first calculates the similarity (weight) between users or items, then aggregates the neighbors to obtain the top-N most frequent items as the recommendation.
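
The aggregation step can be sketched as follows, assuming the neighbors have already been selected by similarity: count how often each item occurs among the neighbors’ purchases, skip items the target user already has, and return the N most frequent. The function name and toy data are hypothetical.

```python
from collections import Counter


def top_n_from_neighbors(target, neighbors, purchases, n=3):
    """Return the n items that occur most often among the neighbors' purchases,
    excluding items the target user already has."""
    owned = purchases[target]
    counts = Counter(
        item
        for nb in neighbors
        for item in purchases[nb]
        if item not in owned
    )
    return [item for item, _ in counts.most_common(n)]


# Hypothetical purchase histories; the neighbor list is assumed to be precomputed.
purchases = {
    "alice": {"a", "b"},
    "bob": {"a", "c", "d"},
    "carol": {"b", "c", "e"},
    "dave": {"c", "d"},
}
print(top_n_from_neighbors("alice", ["bob", "carol", "dave"], purchases, n=2))
# "c" appears for 3 neighbors and "d" for 2, so this prints ['c', 'd']
```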

20 Similarity computation is the critical step. For item-based CF, compute the similarity between items; for user-based CF, compute the similarity between users u and v who have both rated the same items.

21 To get the similarity $w_{u,v}$ between two users u and v, or $w_{i,j}$ between two items i and j, the Pearson correlation is used. It measures the extent to which two variables (or users) are linearly related, as a function of their attributes.

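22 User-based algorithm:
$$w_{u,v} = \frac{\sum_{i \in I}(r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I}(r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I}(r_{v,i} - \bar{r}_v)^2}}$$
where the summations over $i \in I$ run over the items that both users u and v have rated, and $\bar{r}_u$ is the average rating of the co-rated items of the u-th user. Item-based algorithm:
$$w_{i,j} = \frac{\sum_{u \in U}(r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U}(r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U}(r_{u,j} - \bar{r}_j)^2}}$$
where the summations over $u \in U$ run over the users who have rated both items i and j, $r_{u,i}$ is the rating of user u on item i, and $\bar{r}_i$ is the average rating of the i-th item by those users.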
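
A minimal Python sketch of the Pearson correlation over co-rated entries. For the user-based form, pass two item → rating dicts; for the item-based form, pass two user → rating dicts. Following the slide, the means are taken over the co-rated entries only. Returning 0.0 when the correlation is undefined (fewer than two co-rated entries, or zero variance) is an assumed convention, and the data are hypothetical.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two rating dicts (key -> rating),
    computed over the keys both dicts share (the co-rated entries)."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0  # assumed convention: undefined correlation -> 0
    mean_a = sum(ra[k] for k in common) / len(common)
    mean_b = sum(rb[k] for k in common) / len(common)
    num = sum((ra[k] - mean_a) * (rb[k] - mean_b) for k in common)
    den = math.sqrt(sum((ra[k] - mean_a) ** 2 for k in common)) * math.sqrt(
        sum((rb[k] - mean_b) ** 2 for k in common)
    )
    return num / den if den else 0.0


u = {"i1": 5, "i2": 3, "i3": 4}
v = {"i1": 4, "i2": 2, "i3": 3, "i4": 1}  # i4 is ignored: u has not rated it
w = {"i1": 1, "i2": 5, "i3": 3}
print(pearson(u, v))  # close to 1.0: same deviations from their means
print(pearson(u, w))  # close to -1.0: opposite tastes
```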

23 Vector cosine-based similarity is used to find the similarity between two documents: each document is treated as a vector of word frequencies, and the cosine of the angle formed by the frequency vectors is computed. For collaborative filtering, users or items are treated as vectors of ratings, and the cosine of the angle formed by the rating vectors is computed.

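24 Similarity between two items i and j:
$$w_{i,j} = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\|\,\|\vec{j}\|}$$
Example: for vectors A = (x_1, y_1) and B = (x_2, y_2),
$$\cos(A, B) = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$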
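
A minimal Python sketch of vector cosine similarity, with rating vectors stored sparsely as dicts and missing entries treated as 0 (an assumption of this sketch). The usage line reproduces the two-dimensional example with hypothetical values A = (3, 4) and B = (4, 3).

```python
import math


def cosine(a, b):
    """Cosine of the angle between two rating vectors given as dicts
    (key -> rating); keys missing from a dict count as 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# A = (x1, y1) = (3, 4), B = (x2, y2) = (4, 3)
print(cosine({"x": 3, "y": 4}, {"x": 4, "y": 3}))  # (12 + 12) / (5 * 5) = 0.96
```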

25 In neighborhood-based CF, a subset of nearest neighbors of the active user is chosen based on their similarity to him or her, and a weighted aggregate of their ratings is used to generate predictions for the active user.

26 To make a prediction for the active user a on a certain item i, we can take a weighted average of all the ratings on that item:
$$P_{a,i} = \bar{r}_a + \frac{\sum_{u \in U}(r_{u,i} - \bar{r}_u)\,w_{a,u}}{\sum_{u \in U}|w_{a,u}|}$$
where $\bar{r}_a$ and $\bar{r}_u$ are the average ratings of users a and u on all other rated items, $w_{a,u}$ is the weight between user a and user u, and the summations run over the users $u \in U$ who have rated item i.
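
A minimal Python sketch of this mean-centred weighted average. It assumes the weights $w_{a,u}$ are already computed (e.g. by Pearson correlation) and, as a simplification, takes each user’s mean over all of that user’s ratings. Falling back to the active user’s mean when no neighbor rated the item is likewise an assumed choice; names and data are hypothetical.

```python
def predict_user_based(a, item, ratings, weights):
    """Mean-centred weighted average of the neighbors' ratings on `item`.

    ratings: user -> {item: rating}; weights: user u -> w_{a,u}.
    Simplification: each user's mean is taken over all of that user's ratings.
    """
    def mean(u):
        return sum(ratings[u].values()) / len(ratings[u])

    num = den = 0.0
    for u, w in weights.items():
        if u != a and item in ratings[u]:
            num += (ratings[u][item] - mean(u)) * w
            den += abs(w)
    if den == 0:
        return mean(a)  # no usable neighbors: fall back to the active user's mean
    return mean(a) + num / den


ratings = {
    "a": {"i1": 4, "i3": 2},
    "u1": {"i1": 5, "i2": 5, "i3": 2},
    "u2": {"i1": 2, "i2": 1, "i3": 3},
}
weights = {"u1": 0.75, "u2": 0.25}
print(predict_user_based("a", "i2", ratings, weights))  # 3 + (0.75*1 + 0.25*(-1)) / 1.0 = 3.5
```

Dividing by the sum of absolute weights keeps negatively correlated neighbors from shrinking the denominator.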

27 To predict the rating for U1 on I2,

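28 For item-based prediction, we can use a simple weighted average to predict the rating $P_{u,i}$ for user u on item i:
$$P_{u,i} = \frac{\sum_{n \in N} r_{u,n}\,w_{i,n}}{\sum_{n \in N}|w_{i,n}|}$$
where the summations run over all other rated items $n \in N$ of user u, $r_{u,n}$ is user u’s rating of item n, and $w_{i,n}$ is the weight between items i and n.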
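
A minimal Python sketch of the item-based weighted average. The similarity table keyed by `(i, n)` pairs is a hypothetical layout; it is assumed to be precomputed (e.g. with cosine or Pearson similarity). Returning `None` when no rated item has a nonzero weight is an assumed convention.

```python
def predict_item_based(user, item, ratings, item_sim):
    """Weighted average of `user`'s own ratings on items similar to `item`.

    ratings: user -> {item: rating}
    item_sim: (i, n) -> w_{i,n}, a hypothetical precomputed similarity table.
    Returns None when none of the user's rated items has a nonzero weight.
    """
    num = den = 0.0
    for n, r in ratings[user].items():
        if n == item:
            continue
        w = item_sim.get((item, n), 0.0)
        num += r * w
        den += abs(w)
    return num / den if den else None


ratings = {"u": {"i1": 4, "i2": 2}}
item_sim = {("i3", "i1"): 0.75, ("i3", "i2"): 0.25}
print(predict_item_based("u", "i3", ratings, item_sim))  # (4*0.75 + 2*0.25) / 1.0 = 3.5
```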

29 Top-N recommendation recommends a set of N top-ranked items that will be of interest to a certain user; for example, a returning customer may receive such a list of recommendations. Top-N recommendation techniques analyze the user-item matrix to discover relations between different users or items and use them to compute recommendations. Association rule mining can be used to make top-N recommendations.
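
As a minimal illustration of association-rule-based top-N recommendation, the sketch below mines only single-item rules x → y, a simplification of full association rule mining such as Apriori. It uses confidence = support({x, y}) / support({x}) and ranks unseen items by the best confidence among rules whose antecedent the user already has. The baskets, the `min_support` threshold, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(baskets, min_support=2):
    """Return {(x, y): confidence} for rules x -> y whose pair {x, y}
    occurs in at least min_support baskets."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        pair_count.update(permutations(sorted(basket), 2))  # ordered pairs (x, y)
    return {
        (x, y): c / item_count[x]
        for (x, y), c in pair_count.items()
        if c >= min_support
    }


def top_n_by_rules(basket, rules, n=2):
    """Score each unseen item by the best confidence of a rule whose
    antecedent is in the basket; return the n highest-scoring items."""
    scores = {}
    for (x, y), conf in rules.items():
        if x in basket and y not in basket:
            scores[y] = max(scores.get(y, 0.0), conf)
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
rules = mine_pair_rules(baskets)
print(top_n_by_rules({"a"}, rules))  # a -> b has confidence 0.75, a -> c 0.5: ['b', 'c']
```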

30 Model-based CF: designing and developing models (machine learning or data mining algorithms) allows the system to learn to recognize complex patterns from training data and then make predictions using the learned models. Classification algorithms can be used as CF models if user ratings are categorical; regression models and SVD methods can be used for numerical ratings.

31 This approach uses a naïve Bayes (NB) strategy to make predictions. Assuming that the features are independent given the class, the probability of a certain class given all of the features can be computed, and the class with the highest probability is chosen as the predicted class.
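
A minimal Python sketch of naïve Bayes CF, assuming a categorical 1 to 5 rating scale. The class is the active user’s rating of the target item, the features are his or her ratings of other items, and the training examples are the other users who rated the target. Add-one (Laplace) smoothing avoids zero probabilities, log-probabilities avoid underflow, and missing ratings are simply skipped; these choices, the names, and the data are assumptions of the sketch.

```python
import math


def naive_bayes_predict(active, target, ratings, values=(1, 2, 3, 4, 5)):
    """Return the rating class with the highest smoothed posterior for `target`."""
    train = [r for u, r in ratings.items() if u != active and target in r]
    best, best_score = None, -math.inf
    for y in values:
        in_class = [r for r in train if r[target] == y]
        # log P(class = y), add-one smoothed
        score = math.log((len(in_class) + 1) / (len(train) + len(values)))
        for item, x in ratings[active].items():
            if item == target:
                continue
            rated = [r for r in in_class if item in r]  # skip missing values
            match = sum(1 for r in rated if r[item] == x)
            # log P(rating of `item` = x | class = y), add-one smoothed
            score += math.log((match + 1) / (len(rated) + len(values)))
        if score > best_score:
            best, best_score = y, score
    return best


ratings = {
    "a": {"i1": 5, "i2": 1},
    "u1": {"i1": 5, "i2": 1, "t": 5},
    "u2": {"i1": 5, "i2": 2, "t": 5},
    "u3": {"i1": 1, "i2": 5, "t": 1},
}
print(naive_bayes_predict("a", "t", ratings))  # users rating like "a" gave "t" a 5, so this predicts 5
```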

32 Clustering-based CF shows better scalability, because it makes predictions within much smaller clusters rather than over the entire customer base.
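
A minimal Python sketch of the scalability argument. It assumes the cluster assignments were computed offline (for example with k-means, not shown), so a prediction only looks at the active user’s cluster members instead of the whole customer base. Using the cluster’s mean rating for the item is one simple choice among many; names and data are hypothetical.

```python
from collections import defaultdict


def build_cluster_index(cluster_of):
    """Invert user -> cluster id into cluster id -> list of member users."""
    members = defaultdict(list)
    for user, cid in cluster_of.items():
        members[cid].append(user)
    return members


def cluster_predict(active, item, ratings, cluster_of, members):
    """Mean rating of `item` among the active user's cluster peers (None if no peer rated it)."""
    peers = members[cluster_of[active]]
    vals = [ratings[u][item] for u in peers if u != active and item in ratings[u]]
    return sum(vals) / len(vals) if vals else None


ratings = {
    "a": {"i1": 5},
    "b": {"i1": 5, "i2": 4},
    "c": {"i1": 4, "i2": 5},
    "d": {"i1": 1, "i2": 1},
}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1}  # assumed output of an offline clustering step
members = build_cluster_index(cluster_of)
print(cluster_predict("a", "i2", ratings, cluster_of, members))  # only b and c are consulted: 4.5
```

Building the member index once means each prediction touches only one cluster, which is where the scalability gain comes from.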

33 Memory-based and model-based CF approaches can be combined to form hybrid CF approaches, which show some improvement. Examples include probabilistic memory-based CF and personality diagnosis.

34 Probabilistic memory-based CF (PMCF) combines memory-based and model-based techniques. To address the new-user problem, an active learning extension to the PMCF system can be used to actively query a user for additional information. To reduce computation time, PMCF selects a small subset of the entire database of user ratings, called the ‘profile space’, and makes predictions from this small profile space rather than the whole database. PMCF achieves better accuracy than Pearson correlation-based CF and than model-based CF using naïve Bayes.

35 Personality diagnosis combines the memory-based and model-based approaches and keeps the advantages of both. Given the active user’s known ratings, we can calculate the probability that he or she has the same “personality type” as each other user, and then predict whether he or she will like new items.
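
A minimal Python sketch of personality diagnosis. It assumes that a user’s reported ratings are his or her true ratings plus Gaussian noise with a fixed standard deviation `sigma`, and that every other user is an equally likely personality type a priori; the uniform prior cancels when the distribution is normalized. The value of `sigma`, the 1 to 5 scale, and the toy data are hypothetical.

```python
import math


def personality_diagnosis(active, item, ratings, sigma=1.0, values=(1, 2, 3, 4, 5)):
    """Return (most likely rating, probability per rating value) for `active` on `item`."""
    def gauss(x, y):
        # Unnormalized likelihood of observing rating x when the true rating is y.
        return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

    scores = dict.fromkeys(values, 0.0)
    for u, r in ratings.items():
        if u == active or item not in r:
            continue
        # Unnormalized P(active has u's personality type | active's known ratings).
        p_type = 1.0
        for j, x in ratings[active].items():
            if j in r:
                p_type *= gauss(x, r[j])
        for v in values:
            scores[v] += gauss(v, r[item]) * p_type
    total = sum(scores.values())
    if total == 0:
        return None, scores  # nobody else rated `item`
    dist = {v: s / total for v, s in scores.items()}
    return max(dist, key=dist.get), dist


ratings = {
    "a": {"i1": 5, "i2": 1},
    "u1": {"i1": 5, "i2": 1, "t": 5},
    "u2": {"i1": 1, "i2": 5, "t": 1},
}
best, dist = personality_diagnosis("a", "t", ratings)
print(best)  # "a" matches u1's personality type far better than u2's, so this prints 5
```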
