Collaborative Filtering & Content-Based Recommending

Similar presentations
Recommender Systems & Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering
Item Based Collaborative Filtering Recommendation Algorithms
Prediction Modeling for Personalization & Recommender Systems Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
The best indicator that a passenger will show up to board the flight is that she called in for a special meal Filtering and Recommender Systems Content-based.
A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Collaborative Filtering Sue Yeon Syn September 21, 2005.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
1 RegionKNN: A Scalable Hybrid Collaborative Filtering Algorithm for Personalized Web Service Recommendation Xi Chen, Xudong Liu, Zicheng Huang, and Hailong.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Partitioned Logistic Regression for Spam Filtering Ming-wei Chang University of Illinois at Urbana-Champaign Wen-tau Yih and Christopher Meek Microsoft.
Active Learning and Collaborative Filtering
The Wisdom of the Few A Collaborative Filtering Approach Based on Expert Opinions from the Web Xavier Amatriain Telefonica Research Nuria Oliver Telefonica.
2. Introduction Multiple Multiplicative Factor Model For Collaborative Filtering Benjamin Marlin University of Toronto. Department of Computer Science.
Filtering and Recommender Systems Content-based and Collaborative Some of the slides based On Mooney’s Slides.
CS345 Data Mining Recommendation Systems Netflix Challenge Anand Rajaraman, Jeffrey D. Ullman.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
1 Collaborative Filtering Rong Jin Department of Computer Science and Engineering Michigan State University.
RECOMMENDATION SYSTEMS
Agent Technology for e-Commerce
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
If you want to change the world, change the metaphor
Filtering and Recommender Systems Content-based and Collaborative
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
1 Introduction to Recommendation System Presented by HongBo Deng Nov 14, 2006 Refer to the PPT from Stanford: Anand Rajaraman, Jeffrey D. Ullman.
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
Recommender systems Ram Akella November 26 th 2008.
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.
Combining Content-based and Collaborative Filtering Department of Computer Science and Engineering, Slovak University of Technology
Item-based Collaborative Filtering Recommendation Algorithms
Active Learning for Class Imbalance Problem
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
1 Information Filtering & Recommender Systems (Lecture for CS410 Text Info Systems) ChengXiang Zhai Department of Computer Science University of Illinois,
A Hybrid Recommender System: User Profiling from Keywords and Ratings Ana Stanescu, Swapnil Nagar, Doina Caragea 2013 IEEE/WIC/ACM International Conferences.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Experimental Evaluation of Learning Algorithms Part 1.
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology.
Presented By :Ayesha Khan. Content Introduction Everyday Examples of Collaborative Filtering Traditional Collaborative Filtering Socially Collaborative.
Toward the Next generation of Recommender systems
1 Recommender Systems Collaborative Filtering & Content-Based Recommending.
Chapter 6: Information Retrieval and Web Search
RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures Justin Levandoski Michael D. Ekstrand Michael J. Ludwig Ahmed Eldawy.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강민석 May 21st, 2009.
Collaborative Filtering  Introduction  Search or Content based Method  User-Based Collaborative Filtering  Item-to-Item Collaborative Filtering  Using.
Objectives Objectives Recommendz: A Multi-feature Recommendation System Matthew Garden, Gregory Dudek, Center for Intelligent Machines, McGill University.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
1 Collaborative Filtering & Content-Based Recommending CS 290N. T. Yang Slides based on R. Mooney at UT Austin.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Similarity & Recommendation Arjen P. de Vries CWI Scientific Meeting September 27th 2013.
Recommender Systems. Recommender Systems (RSs) n RSs are software tools providing suggestions for items to be of use to users, such as what items to buy,
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Google News Personalization Big Data reading group November 12, 2007 Presented by Babu Pillai.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
User Modeling and Recommender Systems: recommendation algorithms
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
Analysis of massive data sets Prof. dr. sc. Siniša Srbljić Doc. dr. sc. Dejan Škvorc Doc. dr. sc. Ante Đerek Faculty of Electrical Engineering and Computing.
Collaborative Filtering With Decoupled Models for Preferences and Ratings Rong Jin 1, Luo Si 1, ChengXiang Zhai 2 and Jamie Callan 1 Language Technology.
Recommender Systems & Collaborative Filtering
CS728 The Collaboration Graph
Methods and Metrics for Cold-Start Recommendations
Collaborative Filtering Nearest Neighbor Approach
M.Sc. Project Doron Harlev Supervisor: Dr. Dana Ron

Recommender Systems: Collaborative Filtering & Content-Based Recommending

Recommender Systems
Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
Many on-line stores provide recommendations (e.g. Amazon, CDNow).
Recommenders have been shown to substantially increase sales at on-line stores.
There are two basic approaches to recommending:
Collaborative filtering (a.k.a. social filtering)
Content-based

Book Recommender (diagram): the user's ratings of titles such as Red Mars, Foundation, and Jurassic Park are fed to a machine-learning component that builds a user profile; the profile is then used to score unrated titles such as Lost World, Neuromancer, 2010, 2001, and Difference Engine.

Personalization
Recommenders are instances of personalization software.
Personalization concerns adapting to the individual needs, interests, and preferences of each user.
Includes:
Recommending
Filtering
Predicting (e.g. form or calendar appointment completion)
From a business perspective, it is viewed as part of Customer Relationship Management (CRM).

Machine Learning and Personalization
Machine learning can be used to learn a user model or profile of a particular user based on:
Sample interaction
Rated examples
This model or profile can then be used to:
Recommend items
Filter information
Predict behavior

Collaborative Filtering
Maintain a database of many users' ratings of a variety of items.
For a given user, find other, similar users whose ratings strongly correlate with the current user's.
Recommend items rated highly by these similar users but not yet rated by the current user.
Almost all existing commercial recommenders use this approach (e.g. Amazon).

Collaborative Filtering (diagram): a user database holds many users' ratings over items A…Z; the active user's ratings are correlation-matched against the database, and highly rated items from the best-matching users are extracted as recommendations.

Collaborative Filtering Method
1. Weight all users with respect to similarity with the active user.
2. Select a subset of the users (neighbors) to use as predictors.
3. Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
4. Present items with the highest predicted ratings as recommendations.

Similarity Weighting
Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:

c_a,u = covar(r_a, r_u) / (σ_r_a · σ_r_u)

where r_a and r_u are the vectors of ratings for the m items rated by both a and u, and r_i,j is user i's rating for item j.
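The Pearson similarity over co-rated items can be sketched in Python; the dict-of-ratings representation and the function name are assumptions of this sketch, not part of the original system:

```python
from math import sqrt

def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_b: dicts mapping item -> numeric rating.
    Returns 0.0 when there are fewer than 2 co-rated items or when
    either user's co-rated ratings have zero variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    m = len(common)
    if m < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_b = sum(ratings_b[i] for i in common) / m
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)
```

Perfectly correlated ratings give +1, perfectly anti-correlated ratings give -1, regardless of the users' absolute rating levels.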

Covariance and Standard Deviation

covar(r_a, r_u) = Σ_i (r_a,i − r̄_a)(r_u,i − r̄_u) / m

σ_r_u = sqrt( Σ_i (r_u,i − r̄_u)² / m )

where r̄_a and r̄_u are the users' mean ratings over the m co-rated items.

Significance Weighting
It is important not to trust correlations based on very few co-rated items.
Include significance weights, s_a,u, based on the number of co-rated items, m (e.g. linearly devalue correlations computed from fewer than about 50 co-rated items), and use w_a,u = s_a,u · c_a,u as the final similarity weight.
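A minimal sketch of significance weighting; the linear devaluation with a cutoff of 50 co-rated items is one common choice from the CF literature, assumed here rather than specified by the slides:

```python
def significance_weight(m, cutoff=50):
    """Devalue similarities computed from few co-rated items.

    m: number of items rated by both users.
    Returns a factor in (0, 1]: linear below the cutoff, 1 at or above it.
    """
    return min(m, cutoff) / cutoff

def weighted_similarity(correlation, m, cutoff=50):
    # Final weight w_a,u = significance factor * raw Pearson correlation.
    return significance_weight(m, cutoff) * correlation
```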

Neighbor Selection
For a given active user, a, select correlated users to serve as the source of predictions.
Standard approach: use the n most similar users, u, based on the similarity weights, w_a,u.
Alternate approach: include all users whose similarity weight is above a given threshold.

Rating Prediction
Predict a rating, p_a,i, for each item i for the active user, a, using the n selected neighbor users, u ∈ {1, 2, …, n}.
To account for users' different rating levels, base predictions on differences from each user's average rating.
Weight each user's contribution by their similarity to the active user:

p_a,i = r̄_a + Σ_u w_a,u (r_u,i − r̄_u) / Σ_u |w_a,u|
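The prediction step can be sketched as follows; the neighbor tuple representation is an assumption of this sketch:

```python
def predict_rating(active_mean, neighbors, item):
    """Predict the active user's rating for `item`.

    active_mean: the active user's average rating.
    neighbors: list of (weight, mean_rating, ratings_dict) tuples for
               the n selected neighbor users.
    Each neighbor contributes the offset of its rating from its own
    mean, weighted by similarity; the sum is normalized by the total
    absolute weight and added to the active user's mean.
    """
    num, denom = 0.0, 0.0
    for weight, mean, ratings in neighbors:
        if item in ratings:
            num += weight * (ratings[item] - mean)
            denom += abs(weight)
    if denom == 0:
        return active_mean  # no neighbor rated the item; fall back to the mean
    return active_mean + num / denom
```

Basing predictions on offsets from each user's mean is what compensates for users who rate systematically high or low.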

Problems with Collaborative Filtering
Cold start: there need to be enough other users already in the system to find a match.
Sparsity: if there are many items to be recommended, the user/ratings matrix is sparse even with many users, and it is hard to find users that have rated the same items.
First rater: cannot recommend an item that has not been previously rated (new items, esoteric items).
Popularity bias: cannot recommend items to someone with unique tastes; tends to recommend popular items.

Content-Based Recommending
Recommendations are based on information about the content of items rather than on other users' opinions.
Uses a machine-learning algorithm to induce a profile of the user's preferences from examples, based on a featural description of content.
Some previous applications:
Newsweeder (Lang, 1995)
Syskill and Webert (Pazzani et al., 1996)

Advantages of the Content-Based Approach
No need for data on other users: no cold-start or sparsity problems.
Able to recommend to users with unique tastes.
Able to recommend new and unpopular items: no first-rater problem.
Can provide explanations of recommended items by listing the content features that caused an item to be recommended.

Disadvantages of the Content-Based Method
Requires content that can be encoded as meaningful features.
Users' tastes must be representable as a learnable function of these content features.
Unable to exploit the quality judgments of other users (unless these are somehow included in the content features).

LIBRA: Learning Intelligent Book Recommending Agent
Content-based recommender for books, using information about titles extracted from Amazon.
Uses information extraction from the web to organize text into fields:
Author
Title
Editorial Reviews
Customer Comments
Subject terms
Related authors
Related titles

LIBRA System (diagram): Amazon pages are processed by information extraction into the LIBRA database; the user's rated examples train a learner that produces a user profile; a predictor applies the profile to produce a ranked list of recommendations.

Sample Amazon Page: The Age of Spiritual Machines

Sample Extracted Information Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence> Author: <Ray Kurzweil> Price: <11.96> Publication Date: <January 2000> ISBN: <0140282025> Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> Author: <Hans Moravec> > … Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > Related Authors: <Hans P. Moravec> <K. Eric Drexler>… Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …

Libra Content Information
Libra uses this extracted information to form "bags of words" for the following slots:
Author
Title
Description (reviews and comments)
Subjects
Related Titles
Related Authors

Libra Overview
The user rates selected titles on a 1 to 10 scale.
Libra uses a naïve Bayesian text-categorization algorithm to learn a profile from these rated examples:
Rating 6–10: positive
Rating 1–5: negative
The learned profile is used to rank all other books as recommendations, based on the computed posterior probability that they are positive.
The user can also provide explicit positive/negative keywords, which are used as priors to bias the role of these features in categorization.

Bayesian Categorization in LIBRA
The model is generalized to generate a vector of bags of words (one bag for each slot).
Instances of the same word in different slots are treated as separate features: "Crichton" in the author slot vs. "Crichton" in the description slot.
Training examples are treated as weighted positive and negative examples when estimating the conditional probability parameters.
An example with rating 1 ≤ r ≤ 10 is given:
positive weight: (r − 1)/9
negative weight: (10 − r)/9
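The weighted training scheme can be sketched as a two-class multinomial naïve Bayes. This simplified version uses a single bag of words rather than LIBRA's one bag per slot, and all names here are illustrative assumptions:

```python
from collections import defaultdict
from math import log

def train_weighted_nb(examples):
    """Train a two-class multinomial naive Bayes where each rated
    example contributes fractionally to both classes.

    examples: list of (words, rating) pairs, rating on a 1-10 scale.
    A rating r counts as a positive example with weight (r-1)/9 and
    as a negative example with weight (10-r)/9.
    Returns Laplace-smoothed log-probability tables, log priors, vocab.
    """
    counts = {'pos': defaultdict(float), 'neg': defaultdict(float)}
    class_weight = {'pos': 0.0, 'neg': 0.0}
    vocab = set()
    for words, rating in examples:
        w_pos = (rating - 1) / 9
        w_neg = (10 - rating) / 9
        class_weight['pos'] += w_pos
        class_weight['neg'] += w_neg
        for word in words:
            counts['pos'][word] += w_pos
            counts['neg'][word] += w_neg
            vocab.add(word)
    model, v = {}, len(vocab)
    for cls in ('pos', 'neg'):
        total = sum(counts[cls].values())
        # Laplace smoothing over the vocabulary (add-one).
        model[cls] = {w: log((counts[cls][w] + 1) / (total + v)) for w in vocab}
    total_w = class_weight['pos'] + class_weight['neg']
    priors = {cls: log(class_weight[cls] / total_w) for cls in ('pos', 'neg')}
    return model, priors, vocab

def score_positive(model, priors, vocab, words):
    """Log posterior odds that the document is positive; > 0 favors 'recommend'."""
    s = priors['pos'] - priors['neg']
    for w in words:
        if w in vocab:
            s += model['pos'][w] - model['neg'][w]
    return s
```

Ranking candidate books by this log-odds score is equivalent to ranking by the posterior probability of the positive class.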

Implementation
Stopwords are removed from all bags.
A book's title and author are added to its own related-title and related-author slots.
All probabilities are smoothed using Laplace estimation to account for small sample sizes.
The Lisp implementation is quite efficient:
Training: 20 examples in 0.4 s; 840 examples in 11.5 s
Testing: 200 books per second

Explanations of Profiles and Recommendations
The strength of a word w_k appearing in a slot s_j is its log posterior odds:

strength(w_k, s_j) = log( P(w_k | positive, s_j) / P(w_k | negative, s_j) )

Libra Demo http://www.cs.utexas.edu/users/libra

Experimental Data
Amazon searches were used to find books in various genres.
Titles that have at least one review or comment were kept.
Data sets:
Literature fiction: 3,061 titles
Mystery: 7,285 titles
Science: 3,813 titles
Science fiction: 3,813 titles

Rated Data
Four users rated random examples within a genre by reviewing the Amazon pages for the titles:
LIT1: 936 titles
LIT2: 935 titles
MYST: 500 titles
SCI: 500 titles
SF: 500 titles

Experimental Method
10-fold cross-validation to generate learning curves.
Measured several metrics on independent test data:
Precision at top 3: % of the top 3 that are positive
Rating of top 3: average rating assigned to the top 3
Rank correlation: Spearman's r_s between the system's and the user's complete rankings
Tested ablation of the related-author and related-title slots (LIBRA-NR).
Tested the influence of information generated by Amazon's collaborative approach.
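Spearman's rank correlation for two complete rankings can be computed directly; this sketch assumes no tied scores:

```python
def spearman_rs(xs, ys):
    """Spearman rank correlation between two parallel score lists.

    Converts each list to ranks (no ties assumed), then applies
    r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the rank difference for item i.
    """
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Identical orderings give +1, exactly reversed orderings give -1.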

Experimental Result Summary
Precision at top 3 is fairly consistently above 90% after only 20 examples.
Rating of top 3 is fairly consistently above 8 after only 20 examples.
All results are significantly better than random chance after only 5 examples.
Rank correlation is generally above 0.3 (moderate) after only 10 examples, and above 0.6 (high) after 40 examples.

Precision at Top 3 for Science

Rating of Top 3 for Science

Rank Correlation for Science

User Studies
Subjects were asked to use Libra and get recommendations; several rounds of feedback were encouraged.
Subjects then:
Rated all books in the final list of recommendations
Selected two books for purchase
Returned reviews after reading their selections
Completed a questionnaire about the system

Combining Content and Collaboration
Content-based and collaborative methods have complementary strengths and weaknesses, so combining them can obtain the best of both.
Various hybrid approaches:
Apply both methods and combine their recommendations.
Use collaborative data as content.
Use a content-based predictor as another collaborator.
Use a content-based predictor to complete the collaborative data.

Movie Domain
EachMovie dataset [Compaq Research Labs]:
User ratings for movies on a 0–5 scale
72,916 users (avg. 39 ratings each)
1,628 movies
Sparse user-ratings matrix (2.6% full)
Crawled the Internet Movie Database (IMDb) to extract content for the titles in EachMovie:
Basic movie information: title, director, cast, genre, etc.
Popular opinions: user comments, newspaper and newsgroup reviews, etc.

Content-Boosted Collaborative Filtering (diagram): a web crawler pulls IMDb pages into a movie content database; a content-based predictor uses this content to fill the sparse EachMovie user-ratings matrix into a full matrix; collaborative filtering over the full matrix, together with the active user's ratings, produces recommendations.

Content-Boosted CF - I (diagram): a user's rated items serve as training examples for the content-based predictor, which then predicts ratings for that user's unrated items; the actual and predicted ratings together form a dense pseudo user-ratings vector.

Content-Boosted CF - II: the content-based predictor is used to compute a pseudo user-ratings matrix, a full matrix that approximates the actual full user-ratings matrix; CF is then performed using Pearson correlation between the pseudo user-rating vectors.
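Building the pseudo user-ratings matrix can be sketched as follows; the dict-based matrix representation and the `content_predictor` callable are assumptions of this sketch:

```python
def build_pseudo_matrix(ratings_matrix, content_predictor, all_items):
    """Fill each user's unrated items with content-based predictions.

    ratings_matrix: dict user -> {item: rating} (sparse actual ratings).
    content_predictor: callable (user, item) -> predicted rating, trained
                       on that user's own rated items.
    Returns a dense dict user -> {item: rating} in which actual ratings
    are kept and every gap is a content-based prediction.
    """
    pseudo = {}
    for user, rated in ratings_matrix.items():
        row = dict(rated)  # keep actual ratings untouched
        for item in all_items:
            if item not in row:
                row[item] = content_predictor(user, item)
        pseudo[user] = row
    return pseudo
```

CF then runs unchanged over the dense pseudo rows, so every pair of users has a full set of co-rated items.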

Experimental Method
Used a subset of EachMovie (7,893 users; 299,997 ratings).
Test set: 10% of the users, selected at random from those who rated at least 40 movies; trained on the remainder.
Hold-out set: 25% of the items for each test user; predicted the rating of each item in the hold-out set.
Compared CBCF to other prediction approaches:
Pure CF
Pure content-based
Naïve hybrid (averages the CF and content-based predictions)

Metrics
Mean Absolute Error (MAE): compares numerical predictions with user ratings.
ROC sensitivity [Herlocker 99]: how well predictions help users select high-quality items; ratings ≥ 4 are considered "good", < 4 "bad".
Paired t-test for statistical significance.
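Both metrics can be sketched in a few lines. Note that the second function computes sensitivity at a single rating threshold, a simplification of the full ROC analysis in [Herlocker 99]:

```python
def mean_absolute_error(predicted, actual):
    """MAE between parallel lists of predicted and actual ratings."""
    assert len(predicted) == len(actual) and predicted
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def roc_sensitivity_at(predicted, actual, threshold=4):
    """Fraction of truly 'good' items (actual >= threshold) that the
    predictor also labels good (predicted >= threshold).

    This is sensitivity at one operating point, not the area-based
    ROC sensitivity measure itself.
    """
    good = [(p, a) for p, a in zip(predicted, actual) if a >= threshold]
    if not good:
        return 0.0
    return sum(1 for p, _ in good if p >= threshold) / len(good)
```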

Results - I: CBCF is significantly better than pure CF (4% improvement, p < 0.001).

Results - II: CBCF outperforms the other approaches (5% improvement over pure CF).

Active Learning (Sample Selection, Learning with Queries)
Used to reduce the number of training examples required: the system requests ratings for the specific items from which it would learn the most.
Several existing methods:
Uncertainty sampling
Committee-based sampling

Semi-Supervised Learning (Weakly Supervised, Bootstrapping)
Uses a wealth of unlabeled examples to aid learning from a small amount of labeled data.
Several recent methods:
Semi-supervised EM (Expectation Maximization)
Co-training
Transductive SVMs

Conclusions
Recommending and personalization are important approaches to combating information overload.
Machine learning is an important part of systems for these tasks.
Collaborative filtering has inherent problems (cold start, sparsity, first rater, popularity bias).
Content-based methods address these problems, but have problems of their own.
Integrating both is best.