
1 If you want to change the world, change the metaphor
--Joseph Campbell (The Power of Myth)

Filtering and Recommender Systems: Content-based and Collaborative

"For years, people [in Kenya] have said that a Luo had a better chance of becoming president of the United States than the president of Kenya. Apparently, they were right."
--All Things Considered, NPR, 11/5/08

2 Filtering and Recommender Systems: Content-based and Collaborative
Some of the slides are based on Mooney's slides.

3 Personalization
Recommenders are instances of personalization software. Personalization concerns adapting to the individual needs, interests, and preferences of each user. It includes:
Recommending
Filtering
Predicting (e.g., form or calendar appointment completion)
From a business perspective, personalization is viewed as part of Customer Relationship Management (CRM).

4 Feedback & Prediction/Recommendation
Traditional IR has a single user, probably working in single-shot mode:
Relevance feedback (you know this one)
Web search engines have:
Users working continually, which enables user profiling (a profile is a "model" of the user, also built from relevance feedback)
Many users, which enables collaborative filtering (propagate user preferences to other users)

5 Recommender Systems in Use
Systems for recommending items (e.g., books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences. Many on-line stores provide recommendations (e.g., Amazon, CDNow). Recommenders have been shown to substantially increase sales at on-line stores.

6 Feedback Detection
Non-intrusive:
Click certain pages in a certain order while ignoring most pages
Read some clicked pages longer than other clicked pages
Save/print certain clicked pages
Follow links in clicked pages to reach more pages
Buy items / put them in wish lists or shopping carts
Intrusive:
Explicitly ask users to rate items/pages

7 Justifying Recommendations
Recommendation systems must justify their recommendations, even if the justification is bogus. For search engines, the "justifications" are the page synopses.
Some recommendation algorithms are better at providing human-understandable justifications than others: content-based ones can justify in terms of classifier features, while collaborative ones are hard-pressed to say more than "people like you seem to like this stuff."
In general, giving good justifications is important.

8 Content-based vs. Collaborative Recommendation
Content-based: needs descriptions of the items
Collaborative: needs only ratings from other users

9 Content-Based Recommending
Recommendations are based on information about the content of items rather than on other users' opinions.
Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content.
Lots of systems do this.

10 Adapting the Naïve Bayes Idea for Book Recommendation
Vector-of-bags model: books have several different fields that are all text (authors, description, …). A word appearing in one field is different from the same word appearing in another, and we want to keep each bag distinct: a vector of m bags, with conditional probabilities for each word w.r.t. each class and bag.
This can give a profile of a user in terms of the words that are most predictive of what they like:
Odds ratio: P(rel|example) / P(~rel|example). An example is positive if the odds ratio is > 1.
Strength of a keyword: log[P(w|rel) / P(w|~rel)].
We can summarize a user's profile in terms of the words whose strength is above some threshold.
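The keyword-strength idea can be sketched as follows. All counts here are invented toy data, and the Laplace smoothing is an assumption, not something the slides specify:

```python
import math

def keyword_strength(word, rel_counts, nonrel_counts, rel_total, nonrel_total, vocab_size):
    """Strength of a keyword: log[P(w|rel) / P(w|~rel)], with Laplace
    smoothing (the smoothing is an assumption, not from the slides)."""
    p_rel = (rel_counts.get(word, 0) + 1) / (rel_total + vocab_size)
    p_non = (nonrel_counts.get(word, 0) + 1) / (nonrel_total + vocab_size)
    return math.log(p_rel / p_non)

# Toy word counts from books the user liked (rel) vs. disliked (~rel).
rel = {"wizard": 8, "dragon": 5, "tax": 1}
non = {"wizard": 1, "dragon": 1, "tax": 9}
rel_total, non_total, vocab = sum(rel.values()), sum(non.values()), 3

# The user's profile: words whose strength is above the threshold 0.
profile = [w for w in rel
           if keyword_strength(w, rel, non, rel_total, non_total, vocab) > 0]
```

With these counts, "wizard" and "dragon" end up in the profile while "tax" does not.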

11 Collaborative Filtering
[Diagram: a user database of rating vectors; the active user's ratings are correlation-matched against other users' vectors, and recommendations (e.g., item C) are extracted from the best-matching neighbors. The correlation analysis here is similar to association-clusters analysis.]

12 Item-User Matrix
The input to the collaborative filtering algorithm is an m×n matrix where rows are items and columns are users.
It is sort of like a term-document matrix (items are terms and documents are users):
Can think of users as vectors in the space of items (or vice versa)
Can do vector similarity between users, and find the most similar users
Can do scalar clusters over items, etc., and find the most correlated items
Think: users ↔ documents, items ↔ keywords
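The users-as-vectors view can be sketched like this (a toy matrix with made-up ratings; 0 stands in for "unrated"):

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Item-user matrix: rows are items, columns are users (0 = unrated).
matrix = [
    [5, 4, 0],
    [3, 3, 1],
    [0, 1, 5],
]
users = list(zip(*matrix))  # transpose: each user as a vector over items

sim_0_1 = cosine(users[0], users[1])  # users 0 and 1 rate similarly
sim_0_2 = cosine(users[0], users[2])  # users 0 and 2 do not
```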

13 A Collaborative Filtering Method (think kNN regression)
1. Weight all users with respect to similarity with the active user. (How to measure similarity? Cosine similarity could be used; normally the Pearson coefficient is used.)
2. Select a subset of the users (neighbors) to use as predictors.
3. Normalize ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
4. Present items with the highest predicted ratings as recommendations.

14 Finding User Similarity with the Pearson Correlation Coefficient
Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:

w_{a,u} = Σ_{j=1..m} (r_{a,j} − r̄_a)(r_{u,j} − r̄_u) / sqrt( Σ_{j=1..m} (r_{a,j} − r̄_a)² · Σ_{j=1..m} (r_{u,j} − r̄_u)² )

where r_a and r_u are the rating vectors over the m items rated by both a and u, r_{i,j} is user i's rating for item j, and r̄_i is user i's mean rating.
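A minimal sketch of the Pearson weight over co-rated items (the rating lists are invented):

```python
import math

def pearson(ra, ru):
    """Pearson correlation between two users' ratings over their m co-rated items."""
    m = len(ra)
    mean_a, mean_u = sum(ra) / m, sum(ru) / m
    num = sum((a - mean_a) * (u - mean_u) for a, u in zip(ra, ru))
    den = math.sqrt(sum((a - mean_a) ** 2 for a in ra) *
                    sum((u - mean_u) ** 2 for u in ru))
    return num / den if den else 0.0

w = pearson([4, 5, 1], [5, 5, 2])  # similar tastes: weight close to +1
```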

15 Neighbor Selection
For a given active user, a, select correlated users to serve as the source of predictions.
The standard approach is to use the k most similar users, u, based on the similarity weights w_{a,u}.
An alternate approach is to include all users whose similarity weight is above a given threshold.

16 Rating Prediction
Predict a rating, p_{a,i}, for each item i for the active user, a, using the k selected neighbor users, u ∈ {1, 2, …, k}. To account for users' different rating levels, base predictions on differences from each user's average rating, and weight each neighbor's contribution by their similarity to the active user:

p_{a,i} = r̄_a + Σ_{u=1..k} w_{a,u} (r_{u,i} − r̄_u) / Σ_{u=1..k} |w_{a,u}|

where r_{i,j} is user i's rating for item j and r̄_u is user u's average rating.
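The mean-offset prediction can be sketched directly (the neighbor tuples below are made-up numbers):

```python
def predict(active_mean, neighbors):
    """Weighted mean-offset prediction:
    p = mean_a + sum(w * (r - mean_u)) / sum(|w|).
    Each neighbor is (similarity weight, rating of item i, neighbor's mean rating)."""
    num = sum(w * (r - mean) for w, r, mean in neighbors)
    den = sum(abs(w) for w, _, _ in neighbors)
    return active_mean + num / den if den else active_mean

# Active user averages 3.5; both neighbors rated item i above their own means,
# so the prediction lands above 3.5.
p = predict(3.5, [(0.9, 5, 4.0), (0.4, 4, 3.0)])
```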

17 Similarity Weighting = User Similarity
Typically use the Pearson correlation coefficient between the ratings of the active user, a, and another user, u:

w_{a,u} = Σ_{j=1..m} (r_{a,j} − r̄_a)(r_{u,j} − r̄_u) / sqrt( Σ_{j=1..m} (r_{a,j} − r̄_a)² · Σ_{j=1..m} (r_{u,j} − r̄_u)² )

where r_a and r_u are the rating vectors over the m items rated by both a and u, and r_{i,j} is user i's rating for item j.

18 Significance Weighting
It is important not to trust correlations based on very few co-rated items. Include significance weights, s_{a,u}, based on the number of co-rated items, m; one common choice is s_{a,u} = min(m, 50)/50.

19 Covariance and Standard Deviation
The Pearson coefficient can equivalently be written in terms of covariance and standard deviations: w_{a,u} = cov(r_a, r_u) / (σ_a σ_u), where cov(r_a, r_u) = Σ_j (r_{a,j} − r̄_a)(r_{u,j} − r̄_u) / m and σ_i = sqrt( Σ_j (r_{i,j} − r̄_i)² / m ).

20 Problems with Collaborative Filtering
Cold start: there need to be enough other users already in the system to find a match.
Sparsity: if there are many items to be recommended, then even with many users the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
First rater: cannot recommend an item that has not been previously rated (new items, esoteric items).
Popularity bias: cannot recommend items to someone with unique tastes; tends to recommend popular items. (WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD? #$%$%$&^)

21 Advantages of the Content-Based Approach
No need for data on other users: no cold-start or sparsity problems.
Able to recommend to users with unique tastes.
Able to recommend new and unpopular items: no first-rater problem.
Can provide explanations of recommended items by listing the content features that caused an item to be recommended.
Well-known technology: the entire field of classification learning is at (y)our disposal!

22 Disadvantages of Content-Based Method
Requires content that can be encoded as meaningful features. Users’ tastes must be represented as a learnable function of these content features. Unable to exploit quality judgments of other users. Unless these are somehow included in the content features.

23 Movie Domain EachMovie Dataset [Compaq Research Labs]
Contains user ratings for movies on a 0–5 scale. 72,916 users (avg. 39 ratings each). 1,628 movies. Sparse user-ratings matrix – (2.6% full). Crawled Internet Movie Database (IMDb) Extracted content for titles in EachMovie. Basic movie information: Title, Director, Cast, Genre, etc. Popular opinions: User comments, Newspaper and Newsgroup reviews, etc.

24 Content-Boosted Collaborative Filtering
[Diagram: the sparse EachMovie user-ratings matrix is combined with movie content crawled from IMDb into a movie content database; a content-based predictor fills in the full user-ratings matrix, and collaborative filtering over that matrix, given the active user's ratings, produces recommendations.]

25 Content-Boosted CF - I
[Diagram: the sparse user-ratings vector splits into user-rated items and unrated items; the user-rated items serve as training examples for a content-based predictor, which predicts ratings for the unrated items, yielding a dense pseudo user-ratings vector.]

26 Content-Boosted CF - II
Compute the pseudo user-ratings matrix from the sparse user-ratings matrix using the content-based predictor. This full matrix approximates the actual full user-ratings matrix.
Then perform CF using Pearson correlation between pseudo user-rating vectors.
This works better than either approach alone!

27 Why can’t the pseudo ratings be used to help content-based filtering?
How about using the pseudo ratings to improve a content-based filter itself? (or how access to unlabelled examples improves accuracy…) Learn a NBC classifier C0 using the few items for which we have user ratings Use C0 to predict the ratings for the rest of the items Loop Learn a new classifier C1 using all the ratings (real and predicted) Use C1 to (re)-predict the ratings for all the unknown items Until no change in ratings With a small change, this actually works in finding a better classifier! Change: Keep the class posterior prediction (rather than just the max class) This means that each (unlabelled) entity could belong to multiple classes—with fractional membership in each We weight the counts by the membership fractions E.g. P(A=v|c) = Sum of class weights of all examples in c that have A=v divided by Sum of class weights of all examples in c This is called expectation maximization Very useful on web where you have tons of data, but very little of it is labelled Reminds you of K-means, doesn’t it?
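The fractional-count estimate of P(A=v|c) can be sketched directly (the book examples and soft membership weights are invented for illustration):

```python
def cond_prob(examples, attr, value, cls):
    """P(A=v | c) with fractional class memberships: the weighted count of
    examples in class c having A=v, over the total weight in class c."""
    num = sum(w[cls] for feats, w in examples if feats.get(attr) == value)
    den = sum(w[cls] for _, w in examples)
    return num / den if den else 0.0

# Two labelled items (weight 1 in a single class) plus two unlabelled items
# whose soft memberships come from the previous classifier's posterior.
examples = [
    ({"wizard": 1}, {"rel": 1.0, "non": 0.0}),
    ({"wizard": 0}, {"rel": 0.0, "non": 1.0}),
    ({"wizard": 0}, {"rel": 0.4, "non": 0.6}),
    ({"wizard": 1}, {"rel": 0.7, "non": 0.3}),
]
p = cond_prob(examples, "wizard", 1, "rel")  # (1.0 + 0.7) / (1.0 + 0.4 + 0.7)
```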

28

29 (boosted) content filtering

30 Co-Training Motivation
Learning methods need labeled data: lots of <x, f(x)> pairs, which are hard to get (who wants to label data?).
But unlabeled data is usually plentiful… could we use that instead?

31 Co-training
You train me, I train you… Only a small amount of labeled data is needed.
Suppose each instance has two parts: x = [x1, x2], with x1, x2 conditionally independent given f(x).
Suppose each half can be used to classify the instance: f1, f2 such that f1(x1) = f2(x2) = f(x).
Suppose f1, f2 are learnable: f1 ∈ H1, f2 ∈ H2, with learning algorithms A1, A2.
[Diagram: A1 labels unlabeled instances [x1, x2] using f1(x1); the resulting labeled pairs <[x1, x2], f1(x1)> train A2, which outputs hypothesis f2.]
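A toy sketch of the co-training loop. The per-view "learner" here is a trivial lookup table standing in for Naïve Bayes, and all the page/anchor words are invented:

```python
from collections import Counter, defaultdict

def fit_view(data):
    """Toy one-view learner (a lookup table standing in for Naive Bayes):
    predicts the majority label for a view value, with a frequency confidence."""
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    def predict(x):
        if not counts[x]:
            return None, 0.0
        y, n = counts[x].most_common(1)[0]
        return y, n / sum(counts[x].values())
    return predict

def co_train(labeled, unlabeled, rounds=3):
    """Co-training sketch: each instance is x = (x1, x2); a classifier trained
    on one view pseudo-labels instances that then also train the other view."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        f1 = fit_view([(x[0], y) for x, y in labeled])
        f2 = fit_view([(x[1], y) for x, y in labeled])
        rest = []
        for x in pool:
            (y1, c1), (y2, c2) = f1(x[0]), f2(x[1])
            y, c = (y1, c1) if c1 >= c2 else (y2, c2)
            if y is not None and c == 1.0:  # accept only fully confident labels
                labeled.append((x, y))
            else:
                rest.append(x)
        pool = rest
    return labeled

# x1 = a word on the page, x2 = a word from anchors pointing at the page.
seed = [(("syllabus", "cs101"), "course"), (("standings", "scores"), "sports")]
result = co_train(seed, [("syllabus", "ee200"), ("standings", "league")])
```

The page-word view confidently labels the two unlabeled pages, so both end up in the training set even though their anchor words were never seen before.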

32 It really works!
Learning to classify web pages as course pages:
x1 = bag of words on a page
x2 = bag of words from all anchors pointing to a page
Naïve Bayes classifiers, trained with 12 labeled pages and 1039 unlabeled pages

33 Observations
Can apply A1 to generate as much training data as one wants.
If x1 is conditionally independent of x2 given f(x), then the errors in the labels produced by A1 will look like random noise to A2!
Thus there is no limit to the quality of the hypothesis A2 can produce.

34

35 Focussed Crawling
The Cho paper looks at heuristics for managing the URL queue:
Aim 1: completeness
Aim 2: just topic pages
Prioritize if the keyword appears in the anchor / URL
Heuristics: PageRank, #backlinks

36 Modified Algorithm
A page is hot if it:
contains the keyword in the title, or
contains 10 instances of the keyword in the body, or
has distance(page, hot-page) < 3
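The hot-page test can be sketched as a simple predicate. The page fields and the precomputed distance argument are illustrative assumptions about how one might represent the crawl state:

```python
def is_hot(page, keyword, dist_to_hot):
    """A page is 'hot' if the keyword is in its title, or the body contains
    at least 10 instances of the keyword, or it is within distance 3 of a
    known hot page (dist_to_hot is assumed precomputed by the crawler)."""
    return (keyword in page["title"].lower()
            or page["body"].lower().split().count(keyword) >= 10
            or dist_to_hot < 3)

hot = is_hot({"title": "Database Systems", "body": ""}, "database", 99)
cold = is_hot({"title": "Cooking", "body": "pasta recipe pasta"}, "database", 5)
```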

37 Results

38 More Results

39 Conclusions
Recommending and personalization are important approaches to combating information overload.
Machine learning is an important part of systems for these tasks.
Collaborative filtering has problems; content-based methods address these problems (but have problems of their own). Integrating both is best.
This led us to discuss approaches that use unlabelled data along with labelled data to improve performance.

40 Discussion of the Google News Collaborative Filtering Paper

