
1 A Review of Information Filtering Part II: Collaborative Filtering Chengxiang Zhai Language Technologies Institute School of Computer Science Carnegie Mellon University

2 Outline A Conceptual Framework for Collaborative Filtering (CF) Rating-based Methods (Breese et al. 98) –Memory-based methods –Model-based methods Preference-based Methods (Cohen et al. 99 & Freund et al. 98) Summary & Research Directions

3 What is Collaborative Filtering (CF)? Making filtering decisions for an individual user based on the judgments of other users Inferring an individual's interests/preferences from those of other, similar users General idea –Given a user u, find similar users {u_1, …, u_m} –Predict u's preferences based on the preferences of u_1, …, u_m

4 CF: Applications Recommender Systems: books, CDs, videos, movies, potentially anything! Can be combined with content-based filtering Example (commercial) systems –GroupLens (Resnick et al. 94): Usenet news rating –Amazon: book recommendation –Firefly (purchased by Microsoft?): music recommendation –Alexa: web page recommendation

5 CF: Assumptions Users with a common interest will have similar preferences Users with similar preferences probably share the same interest Examples –“interest is IR” => “read SIGIR papers” –“read SIGIR papers” => “interest is IR” A sufficiently large number of user preferences is available

6 CF: Intuitions User similarity –If Jamie liked the paper, I’ll like the paper –? If Jamie liked the movie, I’ll like the movie –Suppose Jamie and I viewed similar movies in the past six months … Item similarity –Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars –You may also like Independence Day

7 Collaborative Filtering vs. Content-based Filtering Basic filtering question: Will user U like item X? Two different ways of answering it –Look at what U likes => characterize X => content-based filtering –Look at who likes X => characterize U => collaborative filtering Can be combined

8 Rating-based vs. Preference-based Rating-based: User’s preferences are encoded using numerical ratings on items –Complete ordering –Absolute values can be meaningful –But, values must be normalized to combine Preferences: User’s preferences are represented by partial ordering of items –Partial ordering –Easier to exploit implicit preferences

9 A Formal Framework for Rating Users U = {u_1, u_2, …, u_m}, objects O = {o_1, o_2, …, o_n}; ratings form a partially observed user-object matrix with entries X_ij = f(u_i, o_j), some known (e.g., 3, 1.5, 2, …) and some to be predicted The task –Unknown function f: U x O -> R –Assume known f values for some (u,o)'s –Predict f values for other (u,o)'s –Essentially function approximation, like other learning problems
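As a concrete illustration of this setup (not part of the original slides; the user/object names and values below are invented), the partially observed rating function f can be stored as a sparse mapping from (user, object) pairs to known ratings, with the missing entries being exactly what CF must predict:

```python
# A minimal sketch, assuming ratings are stored sparsely; names and values
# are made up for illustration only.
known_ratings = {
    ("u1", "o1"): 3.0, ("u1", "o2"): 1.5,
    ("u2", "o2"): 2.0,
    ("u3", "o1"): 2.0, ("u3", "o3"): 1.0, ("u3", "o4"): 3.0,
}

def f(user, obj):
    """Return the known rating f(u, o), or None if it must be predicted."""
    return known_ratings.get((user, obj))

print(f("u1", "o1"))  # known value: 3.0
print(f("u2", "o1"))  # None -- an unknown entry that CF has to fill in
```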

10 Where are the intuitions? Similar users have similar preferences –If u ≈ u’, then for all o’s, f(u,o) ≈ f(u’,o) Similar objects have similar user preferences –If o ≈ o’, then for all u’s, f(u,o) ≈ f(u,o’) In general, f is “locally constant” –If u ≈ u’ and o ≈ o’, then f(u,o) ≈ f(u’,o’) –“Local smoothness” makes it possible to predict unknown values by interpolation or extrapolation What does “local” mean?

11 Two Groups of Approaches Memory-based approaches –f(u,o) = g(u)(o) ≈ g(u’)(o) if u ≈ u’ –Find “neighbors” of u and combine g(u’)(o)’s Model-based approaches –Assume structures/models: object clusters, user clusters, f’ defined on clusters –f(u,o) = f’(c_u, c_o) –Estimation & probabilistic inference

12 Memory-based Approaches (Breese et al. 98) General ideas: –X_ij: rating of object j by user i –n_i: average rating of all objects by user i –Normalized ratings: V_ij = X_ij - n_i –Memory-based prediction: predict user a's rating of object j by a weighted combination of other users' normalized ratings of j Specific approaches differ in w(a,i) -- the distance/similarity between users a and i
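A minimal sketch of this kind of memory-based prediction, following the general form in Breese et al. 98 (a's mean rating plus a similarity-weighted combination of the other users' mean-offset ratings of j); the data structures and variable names are my own:

```python
import numpy as np

def predict_rating(ratings, sim, a, j):
    """ratings: dict user -> {item: rating}; sim: dict (a, i) -> w(a, i).
    Returns x_hat[a, j] = mean_a + kappa * sum_i w(a, i) * (x[i, j] - mean_i)."""
    mean_a = np.mean(list(ratings[a].values()))
    num, denom = 0.0, 0.0
    for i, r_i in ratings.items():
        if i == a or j not in r_i:
            continue                       # use only other users who rated item j
        w = sim.get((a, i), 0.0)
        num += w * (r_i[j] - np.mean(list(r_i.values())))
        denom += abs(w)
    if denom == 0.0:
        return float(mean_a)               # no informative neighbors: fall back to a's mean
    return float(mean_a + num / denom)     # kappa chosen as 1 / sum |w(a, i)|
```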

13 User Similarity Measures Pearson correlation coefficient (sum over commonly rated items) Cosine measure Many other possibilities!
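A sketch of the two similarity measures named above (my own implementation; the Pearson version restricts both the sums and the means to the commonly rated items, as the slide indicates):

```python
import math

def pearson(ra, rb):
    """ra, rb: dict item -> rating for two users; correlation over common items."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[j] for j in common) / len(common)
    mb = sum(rb[j] for j in common) / len(common)
    num = sum((ra[j] - ma) * (rb[j] - mb) for j in common)
    den = math.sqrt(sum((ra[j] - ma) ** 2 for j in common)
                    * sum((rb[j] - mb) ** 2 for j in common))
    return num / den if den > 0 else 0.0

def cosine(ra, rb):
    """Treat each user's ratings as a sparse vector and take the cosine."""
    common = set(ra) & set(rb)
    num = sum(ra[j] * rb[j] for j in common)
    den = (math.sqrt(sum(v * v for v in ra.values()))
           * math.sqrt(sum(v * v for v in rb.values())))
    return num / den if den > 0 else 0.0
```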

14 Improving User Similarity Measures (Breese et al. 98) Dealing with missing values: default ratings Inverse User Frequency (IUF): similar to IDF Case Amplification: use w(a,i)^p, e.g., p=2.5
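Two of these adjustments (inverse user frequency and case amplification) can be sketched roughly as follows; the function names and parameter defaults are illustrative, not prescribed by the paper:

```python
import math

def case_amplify(w, p=2.5):
    """Raise a similarity weight to the power p, keeping its sign, so that
    high-similarity neighbors dominate the prediction."""
    return math.copysign(abs(w) ** p, w)

def inverse_user_frequency(n_users, n_users_who_rated_item):
    """log(n / n_j), analogous to IDF: 0 if everyone rated the item,
    large if the item is rarely rated (and thus more informative)."""
    return math.log(n_users / n_users_who_rated_item)
```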

15 Model-based Approaches (Breese et al. 98) General ideas –Assume that data/ratings are explained by a probabilistic model with parameter θ –Estimate/learn model parameter θ based on data –Predict unknown ratings using E_θ[x_{k+1} | x_1, …, x_k], which is computed using the estimated model Specific methods differ in the model used and how the model is estimated

16 Probabilistic Clustering Clustering users based on their ratings –Assume ratings are observations of a multinomial mixture model with parameters p(C), p(x_i|C) –Model estimated using standard EM Predict ratings using E[x_{k+1} | x_1, …, x_k]
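A compact sketch of such a cluster model (my own implementation, not code from the paper): each user belongs to one of K latent classes; given the class, the ratings of different items are independent multinomials, fit with EM. Here R is a (users x items) matrix of integer ratings 0..V-1, with -1 marking missing entries.

```python
import numpy as np

def em_cluster_model(R, K, V, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    prior = np.full(K, 1.0 / K)                            # p(C = k)
    theta = rng.dirichlet(np.ones(V), size=(K, n_items))   # p(rating v | item j, class k)
    observed = R >= 0
    for _ in range(n_iter):
        # E-step: responsibilities p(C = k | user's observed ratings)
        log_resp = np.log(prior + 1e-12)[None, :].repeat(n_users, axis=0)
        for j in range(n_items):
            users = np.where(observed[:, j])[0]
            log_resp[users] += np.log(theta[:, j, R[users, j]]).T
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(C) and p(v | j, C) from responsibility-weighted counts
        prior = resp.mean(axis=0)
        for j in range(n_items):
            counts = np.zeros((K, V)) + 1e-2               # small smoothing
            for u in np.where(observed[:, j])[0]:
                counts[:, R[u, j]] += resp[u]
            theta[:, j, :] = counts / counts.sum(axis=1, keepdims=True)
    return prior, theta, resp

def predict_rating(resp_u, theta, j):
    """E[x_j | user's observed ratings] = sum_k p(k | user) * sum_v v * p(v | j, k)."""
    values = np.arange(theta.shape[2])
    return float(resp_u @ (theta[:, j, :] @ values))
```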

17 Bayesian Network Use a BN to capture object/item dependency –Each item/object is a node –(Dependency) structure is learned from all data –Model parameters: p(x_{k+1}|pa(x_{k+1})) where pa(x_{k+1}) is the set of parents/predictors of x_{k+1} (represented as a decision tree) Predict ratings using E[x_{k+1} | x_1, …, x_k]

18 Three-way Aspect Model (Popescul et al. 2001) CF + content-based Generative model: (u,d,w) as observations, z as a hidden variable Standard EM Essentially clustering the joint data Evaluation on ResearchIndex data Found it’s better to treat (u,w) as observations

19 Evaluation Criteria (Breese et al. 98) Rating accuracy –Average absolute deviation –P_a = set of items predicted for user a Ranking accuracy –Expected utility –Exponentially decaying viewing probability –α (halflife) = the rank where the viewing probability = 0.5 –d = neutral rating
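A sketch of the two measures named above, following the definitions in Breese et al. 98 (variable names are mine): average absolute deviation over the predicted items, and the expected utility of a ranked list with exponentially decaying viewing probability, halflife α, and neutral rating d.

```python
def average_absolute_deviation(predicted, actual):
    """Mean |prediction - true rating| over the set P_a of predicted items."""
    return sum(abs(predicted[j] - actual[j]) for j in predicted) / len(predicted)

def expected_utility(ranked_items, actual, d, halflife):
    """R_a = sum_j max(v_aj - d, 0) / 2^((rank_j - 1) / (halflife - 1)):
    the utility of an item decays with its rank position; unrated items
    default to the neutral rating d and so contribute nothing."""
    score = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        decay = 2 ** ((rank - 1) / (halflife - 1))
        score += max(actual.get(item, d) - d, 0.0) / decay
    return score
```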

20 Datasets

21 Results - BN & CR+ are generally better than VSIM & BC - BN is best with more training data - VSIM is better with little training data - Inverse User Frequency is effective - Case amplification is mostly effective

22 Summary of Rating-based Methods Effectiveness –Both memory-based and model-based methods can be effective –The correlation method appears to be robust –Bayesian network works well with plenty of training data, but not very well with little training data –The cosine similarity method works well with little training data

23 Summary of Rating-based Methods (cont.) Efficiency –Memory-based methods are slower than model-based methods in predicting –Learning can be extremely slow for model-based methods

24 Preference-based Methods (Cohen et al. 99, Freund et al. 98) Motivation –Explicit ratings are not always available, but implicit orderings/preferences might be available –Only relative ratings are meaningful, even when ratings are available –Combining preferences has other applications, e.g., merging results from different search engines

25 A Formal Model of Preferences Instances: O = {o_1, …, o_n} Ranking function: R: (U x) O x O -> [0,1] –R(u,v)=1 means u is strongly preferred to v –R(u,v)=0 means v is strongly preferred to u –R(u,v)=0.5 means no preference Feedback: F = {(u,v)}, u is preferred to v Goal: choose R from a hypothesis space so as to minimize the loss on the feedback F
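As a hedged reconstruction of the loss referred to here (following Cohen et al. 99, stated as an assumption rather than a quotation of the original slide), R pays 1 - R(u,v) for each feedback pair in which u should be preferred to v:

```latex
% Loss of a ranking function R on feedback F (reconstruction following
% Cohen et al. 99): each pair (u, v) with u preferred to v contributes 1 - R(u, v).
\[
  \mathrm{Loss}(R, F) \;=\; \frac{1}{|F|} \sum_{(u,v) \in F} \bigl(1 - R(u,v)\bigr)
\]
```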

26 The Hypothesis Space H Without constraints on H, the loss is minimized by any R that agrees with F An appropriate constraint for collaborative filtering: restrict R to weighted combinations of the ranking functions contributed by other users/experts Compare this with the weighted combination of other users' ratings in the memory-based methods

27 The Hedge Algorithm for Combining Preferences Iterative updating of w_1, w_2, …, w_n Initialization: w_i is uniform Updating: each weight is multiplied by β^L, where β ∈ [0,1] and L is that expert's loss on the feedback –L=0 => weight stays –L is large => weight is decreased
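A minimal sketch of this update and of the combined preference (my own code; Cohen et al. 99 adapt the Hedge algorithm of Freund & Schapire, and the value of beta here is just an illustrative choice):

```python
def hedge_update(weights, losses, beta=0.8):
    """Multiply each expert's weight by beta**loss (loss in [0, 1]) and renormalize.
    loss = 0 leaves the weight unchanged; a large loss shrinks it."""
    new = [w * (beta ** loss) for w, loss in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]

def combined_preference(weights, expert_prefs, u, v):
    """R_A(u, v): weighted vote of the experts' pairwise preferences, each in [0, 1]."""
    return sum(w * pref(u, v) for w, pref in zip(weights, expert_prefs))
```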

28 Some Theoretical Results The cumulative loss of R_A will not be much worse than that of the best ranking expert/feature Preferences R_A => ordering ρ => R_ρ L(R_ρ, F) <= DISAGREE(ρ, R_A)/|F| + L(R_A, F) Need to find the ρ that minimizes disagreement General case: NP-complete

29 A Greedy Ordering Algorithm Use a weighted graph to represent the preferences R For each node, compute the potential value, i.e., outgoing_weights - ingoing_weights Rank the node with the highest potential value above all others Remove this node and its edges, repeat At least half of the optimal agreement is guaranteed
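A sketch of this greedy step (my own code): the preferences form a weighted directed graph, and the node with the largest potential (outgoing minus incoming weight among the remaining nodes) is placed next in the ordering.

```python
def greedy_order(nodes, weight):
    """nodes: iterable of items; weight[(u, v)]: strength of 'u preferred to v'."""
    remaining = set(nodes)
    ordering = []
    while remaining:
        def potential(u):
            out_w = sum(weight.get((u, v), 0.0) for v in remaining if v != u)
            in_w = sum(weight.get((v, u), 0.0) for v in remaining if v != u)
            return out_w - in_w
        best = max(remaining, key=potential)   # rank the highest-potential node next
        ordering.append(best)
        remaining.remove(best)                 # remove it and its edges, repeat
    return ordering
```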

30 Improvement Identify all the strongly connected components Rank the components consistently with the edges between them Rank the nodes within a component using the basic greedy algorithm

31 Evaluation of Ordering Algorithms Measure: “weight coverage” Datasets = randomly generated small graphs Observations –The basic greedy algorithm works better than a random permutation baseline –Improved version is generally better, but the improvement is insignificant for large graphs

32 Metasearch Experiments Task: Known item search –Search for an ML researcher's homepage –Search for a university homepage Search expert = variant of the query Learn to merge results of all search experts Feedback –Complete: known item preferred to all others –Click data: known item preferred to all items ranked above it Leave-one-out testing

33 Metasearch Results Measures: compare combined preferences with each individual ranking function –sign test: to see which system tends to rank the known relevant article higher –#queries with the known relevant item ranked above rank k –average rank of the known relevant item Learned system better than individual experts by all measures (not surprising, why?)

34 Metasearch Results (cont.)

35 Direct Learning of an Ordering Function Each expert is treated as a ranking feature f_i: O -> R ∪ {⊥} (⊥ = unranked, allowing partial rankings) Given preference feedback Φ: X x X -> R Goal: learn H that minimizes the loss D_Φ(x_0,x_1): a distribution over X x X (actually a uniform dist. over pairs with feedback order) D_Φ(x_0,x_1) = c max{0, Φ(x_0,x_1)}

36 The RankBoost Algorithm Iterative updating of D(x_0,x_1) Initialization: D_1 = D_Φ For t=1,…,T: –Train weak learner using D_t –Get weak hypothesis h_t: X -> R –Choose α_t > 0 –Update D_{t+1} from D_t, α_t, and h_t (pairs the current hypothesis mis-orders get more weight) Final hypothesis: H(x) = Σ_t α_t h_t(x)
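A sketch of this loop (following Freund et al. 99, but simplified: the weak-learner interface, the choice of alpha_t, and the pair representation are placeholders of my own). Each feedback pair (x0, x1) means x1 should be ranked above x0.

```python
import math

def rankboost(pairs, weak_learners, T, choose_alpha):
    """pairs: list of (x0, x1); weak_learners: callables D -> (h: X -> float)."""
    D = {p: 1.0 / len(pairs) for p in pairs}        # D_1 = uniform over feedback pairs
    ensemble = []                                   # list of (alpha_t, h_t)
    for t in range(T):
        h = weak_learners[t % len(weak_learners)](D)   # train weak learner on D_t
        alpha = choose_alpha(h, D)                     # e.g., minimize the bound on Z_t
        # Pairs the current hypothesis gets wrong (h ranks x0 above x1) gain weight.
        new_D = {(x0, x1): w * math.exp(alpha * (h(x0) - h(x1)))
                 for (x0, x1), w in D.items()}
        Z = sum(new_D.values())
        D = {p: w / Z for p, w in new_D.items()}
        ensemble.append((alpha, h))
    def H(x):                                       # final hypothesis: weighted sum of the h_t
        return sum(a * h_t(x) for a, h_t in ensemble)
    return H
```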

37 How to Choose α_t and Design h_t? Bound on the ranking loss Thus, we should choose the α_t that minimizes the bound Three approaches: –Numerical search –Special case: h is either 0 or 1 –Approximation of Z, then find an analytic solution

38 Efficient RankBoost for Bipartite Feedback Complexity at each round: reduced from O(|X_0||X_1|) to O(|X_0|+|X_1|) Bipartite feedback: every item in X_1 is preferred to every item in X_0 Essentially binary classification

39 Evaluation of RankBoost Meta-search: same as in (Cohen et al. 99) Perfect feedback 4-fold cross validation

40 EachMovie Evaluation (table with columns: # users, # movies/user, # feedback movies)

41 Performance Comparison Cohen et al. 99 vs. Freund et al. 99

42 Summary CF is “easy” –The user’s expectation is low –Any recommendation is better than none –Making it practically useful CF is “hard” –Data sparseness –Scalability –Domain-dependent

43 Summary (cont.) CF as a Learning Task –Rating-based formulation Learn f: U x O -> R Algorithms –Instance-based/memory-based (k-nearest neighbors) –Model-based (probabilistic clustering) –Preference-based formulation Learn PREF: U x O x O -> R Algorithms –General preference combination (Hedge), greedy ordering –Efficient restricted preference combination (RankBoost)

44 Summary (cont.) Evaluation –Rating-based methods Simple methods seem to be reasonably effective Advantage of sophisticated methods seems to be limited –Preference-based methods More effective than rating-based methods according to one evaluation Evaluation on meta-search is weak

45 Research Directions Exploiting complete information –CF + content-based filtering + domain knowledge + user model … More “localized” kernels for instance-based methods –Predicting movies needs different “neighbor users” than predicting books –Suggestion: use items similar to the target item as features to find neighbors

46 Research Directions (cont.) Modeling time –There might be sequential patterns in the items a user purchased (e.g., bread machine -> bread machine mix) Probabilistic models of preferences –Making the preference function a probability function, e.g., P(A>B|U) –Clustering items and users –Minimizing preference disagreements

47 References Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things." Journal of Artificial Intelligence Research, Volume 10, pp. 243-270. Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences." Machine Learning Journal, 1999. Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52. Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments." UAI 2001. Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations." In Proceedings of AAAI-99, pp. 439-446.

48 The End Thank you!

