Online Social Networks and Media: Recommender Systems, Collaborative Filtering, Social Recommendations. Thanks to: Jure Leskovec, Anand Rajaraman, Jeff Ullman

An important problem: recommendation systems. – When a user buys an item (initially books), we want to recommend other items that the user may like. – When a user rates a movie, we want to recommend movies that the user may like. – When a user likes a song, we want to recommend other songs that they may like. A big success of data mining that exploits the long tail – how Into Thin Air made Touching the Void popular.

The Long Tail Source: Chris Anderson (2004)

Physical vs. Online – read Mining of Massive Datasets (J. Leskovec, A. Rajaraman, J. Ullman) to learn more!

Utility (Preference) Matrix (HP = Harry Potter, TW = Twilight, SW = Star Wars):

        HP1   HP2   HP3   TW    SW1   SW2   SW3
A        4                 5     1
B              5     5     4
C                          2     4           5
D                    3                 3

How can we fill in the empty entries of the matrix?

Recommendation Systems Content-based: – Represent items in a feature space and recommend to customer C items similar to the previous items rated highly by C. Movie recommendations: recommend movies with the same actor(s), director, genre, … Websites, blogs, news: recommend other sites with "similar" content

Content-based prediction (same utility matrix as above): someone who likes one of the Harry Potter (or Star Wars) movies is likely to like the rest – same actors, similar story, same genre.

Intuition – [figure: item profiles (e.g., red, circles, triangles) are built for the items a user likes; from these a user profile is built and matched against new item profiles to recommend items]

Extracting features for Items Map items into a feature space: – For movies: actors, directors, genre, rating, year, … – For documents? Items are now real vectors in a multidimensional feature space – Challenge: make all feature values compatible. Alternatively we can view a movie as a set of features: – Star Wars = {1977, Action, Sci-Fi, Lucas, H. Ford} – or as a vector over features such as Year, Action, Sci-Fi, Romance, Lucas, H. Ford, Pacino.

Extracting Features for Users To compare items with users we need to map users to the same feature space. How? – Take all the movies that the user has seen and take the average vector. Other aggregation functions are also possible. Recommend to user C the item i most similar to C's profile, computing similarity in the common feature space – How do we measure similarity?

Similarity: e.g., the cosine similarity of the two feature vectors X and Y, sim(X, Y) = (X · Y) / (‖X‖ ‖Y‖)
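To make the matching concrete, here is a minimal sketch of content-based scoring, with made-up feature vectors and a hypothetical liked-items list (none of these values come from the slides): the user profile is the average of the liked items' vectors, and items are ranked by cosine similarity to it.

```python
import numpy as np

# Hypothetical item vectors over (Year scaled, Action, Sci-Fi, Romance, Lucas, H. Ford, Pacino)
items = {
    "Star Wars": np.array([0.77, 1, 1, 0, 1, 1, 0], dtype=float),
    "Twilight":  np.array([0.90, 0, 0, 1, 0, 0, 0], dtype=float),
    "Serenity":  np.array([0.85, 1, 1, 0, 0, 0, 0], dtype=float),
}
liked = [items["Star Wars"]]                 # items the user rated highly

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

user_profile = np.mean(liked, axis=0)        # average of the liked items' feature vectors
scores = {name: cosine(user_profile, vec) for name, vec in items.items()}
print(max(scores, key=scores.get))           # the item most similar to the user profile
```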

Classification approach: using the user and item features we can construct a classifier that predicts whether a user will like a new item.

Limitations of content-based approach Finding the appropriate features – e.g., images, movies, music Overspecialization – Never recommends items outside user’s content profile – People might have multiple interests

Collaborative filtering (same utility matrix as above): two users are similar if they rate the same items in a similar way. Recommend to user C the items liked by many of the most similar users.

User Similarity (same utility matrix as above): which pair of users do you consider the most similar? What is the right definition of similarity?

User Similarity – Jaccard similarity: users are sets of movies; the ratings themselves are disregarded.

        HP1   HP2   HP3   TW    SW1   SW2   SW3
A        1                 1     1
B              1     1     1
C                          1     1           1
D                    1                 1

Jsim(A,B) = 1/5, Jsim(A,C) = 1/2, Jsim(B,D) = 1/4
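A small sketch (using the column placement reconstructed in the table above) that reproduces the Jaccard numbers:

```python
# Users as sets of the movies they rated (ratings themselves are disregarded)
users = {
    "A": {"HP1", "TW", "SW1"},
    "B": {"HP2", "HP3", "TW"},
    "C": {"TW", "SW1", "SW3"},
    "D": {"HP3", "SW2"},
}

def jaccard(s, t):
    return len(s & t) / len(s | t)

print(jaccard(users["A"], users["B"]))   # 0.2   (1/5)
print(jaccard(users["A"], users["C"]))   # 0.5   (1/2)
print(jaccard(users["B"], users["D"]))   # 0.25  (1/4)
```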

User Similarity – Cosine similarity (same ratings as above), treating missing entries as zeros (i.e., implicitly as negative ratings): Cos(A,B) = 0.38, Cos(A,C) = 0.32

User Similarity – Normalized cosine similarity: subtract each user's mean rating and then compute the cosine of the centered vectors (the correlation coefficient). The centered matrix:

        HP1   HP2   HP3   TW    SW1   SW2   SW3
A       2/3                5/3  -7/3
B             1/3   1/3   -2/3
C                         -5/3   1/3          4/3
D                    0                  0

From these centered vectors we can compute Corr(A,B) and Corr(A,C).
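A numpy sketch (missing entries stored as NaN, then zero-filled; column placement as reconstructed above) that reproduces the plain cosine values 0.38 and 0.32 and then computes the correlations of the mean-centered vectors; the centered values depend on the zero-filling convention, so treat them as illustrative:

```python
import numpy as np

# Rows: users A, B, C, D; columns: HP1, HP2, HP3, TW, SW1, SW2, SW3; NaN = not rated
R = np.array([
    [4,      np.nan, np.nan, 5,      1,      np.nan, np.nan],
    [np.nan, 5,      5,      4,      np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 2,      4,      np.nan, 5     ],
    [np.nan, np.nan, 3,      np.nan, np.nan, 3,      np.nan],
])

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

Z = np.nan_to_num(R)                                   # missing -> 0
print(round(cosine(Z[0], Z[1]), 2))                    # 0.38  (A vs B)
print(round(cosine(Z[0], Z[2]), 2))                    # 0.32  (A vs C)

means = np.nanmean(R, axis=1, keepdims=True)           # per-user mean over rated items
C = np.nan_to_num(R - means)                           # center, then missing -> 0
print(round(cosine(C[0], C[1]), 2))                    # ~ -0.46 (A vs B, centered)
print(round(cosine(C[0], C[2]), 2))                    # ~ -0.56 (A vs C, centered)
```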

User-User Collaborative Filtering: to predict the rating of user x for item i, find the set N of users most similar to x who have rated i, and aggregate their ratings for i (e.g., a similarity-weighted average, typically after subtracting each user's mean).
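A compact sketch of the prediction step, assuming cosine similarity on mean-centered ratings and a similarity-weighted average over the k most similar raters (the hyperparameters and the fallback are illustrative choices, not from the slides):

```python
import numpy as np

def predict_user_user(R, x, i, k=2):
    """Predict R[x, i] from the k users most similar to x who have rated item i.
    R: users x items rating matrix with np.nan for missing entries."""
    means = np.nanmean(R, axis=1, keepdims=True)
    C = np.nan_to_num(R - means)                       # mean-centered, missing -> 0
    norms = np.linalg.norm(C, axis=1) + 1e-12
    sims = (C @ C[x]) / (norms * norms[x])             # cosine of user x to every user
    raters = [y for y in range(R.shape[0]) if y != x and not np.isnan(R[y, i])]
    top = sorted(raters, key=lambda y: sims[y], reverse=True)[:k]
    if not top:
        return means[x, 0]                             # fall back to the user's mean rating
    w = np.array([sims[y] for y in top])
    dev = np.array([R[y, i] - means[y, 0] for y in top])
    return means[x, 0] + (w @ dev) / (np.abs(w).sum() + 1e-12)

# e.g., with the utility matrix R from above (users A..D as rows): predict A's rating of HP2
# print(predict_user_user(R, x=0, i=1))
```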

Item-Item Collaborative Filtering: we can transpose (flip) the matrix and perform the same computation as before to define similarity between items. – Intuition: two items are similar if they are rated in the same way by many users. – The similarity is often better defined, since it captures the notion of an item's genre, whereas users may have multiple interests. Algorithm: for each user c and item i – Find the set D of items most similar to item i that have been rated by user c. – Aggregate their ratings to predict the rating for item i. Disadvantage: we need to consider each user-item pair separately.
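Since item-item CF is the same computation on the transposed matrix, a very short sketch (reusing predict_user_user from the previous block) could be:

```python
def predict_item_item(R, x, i, k=2):
    # Two items are similar if many users rate them in the same way,
    # so run the user-user procedure on the transposed ratings matrix.
    # Note: this simple variant centers by item means; the "adjusted cosine"
    # (subtracting user means instead) is a common alternative.
    return predict_user_user(R.T, x=i, i=x, k=k)
```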

Evaluation

Model-Based Collaborative Filtering So far we have looked at specific user-item combinations. A different approach looks at the full user-item matrix and tries to find a model that explains how the ratings of the matrix are generated – If we have such a model, we can predict the ratings that we do not observe. Latent factor model: – (Most) models assume that there are a few (K) latent factors that define the behavior of the users and the characteristics of the items

Latent Factor Models – [figure: movies placed in a two-dimensional latent space, one axis running from "geared towards females" to "geared towards males", the other from "serious" to "funny"; example movies: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility]

Linear algebra

Rank

Rank-1 matrices

Singular Value Decomposition: A [n×m] = U [n×r] Σ [r×r] Vᵀ [r×m], where r is the rank of the matrix A.
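A quick numpy check of the shapes on a small block matrix in the spirit of the users-to-movies example that follows (the values are illustrative):

```python
import numpy as np

A = np.array([[1., 1, 1, 0, 0],
              [3., 3, 3, 0, 0],
              [4., 4, 4, 0, 0],
              [5., 5, 5, 0, 0],
              [0., 0, 0, 4, 4],
              [0., 0, 0, 5, 5],
              [0., 0, 0, 2, 2]])              # n=7 users, m=5 movies, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)             # (7, 5) (5,) (5, 5)
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: A = U Sigma V^T
print(int(np.sum(s > 1e-10)))                 # 2: only r = rank(A) singular values are nonzero
```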

Singular Value Decomposition

Symmetric matrices

An (extreme) example

SVD – Example: Users-to-Movies. A = UΣVᵀ: the rows of A (n users) rate the columns (m movies: Matrix, Alien, Serenity, Casablanca, Amelie); some users prefer SciFi, others Romance. The factors U, Σ, Vᵀ expose "concepts", AKA latent dimensions, AKA latent factors.

SVD – Example: Users-to-Movies. The decomposition reveals two concepts: a SciFi-concept and a Romance-concept.

SVD – Example: Users-to-Movies. U is the "user-to-concept" similarity matrix.

SVD – Example: Users-to-Movies. V is the "movie-to-concept" similarity matrix.

SVD – Example: Users-to-Movies. Σ is the "concept strength" matrix.

A (more realistic) example: a user-movie matrix in which there are still two prototype users and two movie groups, but the ratings are noisy.

SVD – Example: Users-to-Movies, a more realistic scenario: the same decomposition A = UΣVᵀ applied to the noisy SciFi/Romance matrix.


SVD – Example: Users-to-Movies. The first two singular vectors (the SciFi-concept and the Romance-concept) are more or less unchanged.

SVD – Example: Users-to-Movies. The third singular vector has a very low singular value.

SVD – Interpretation

Rank-k approximation: in the last user-movie matrix we have more than two singular vectors, but the strongest ones still correspond to the two prototypes – the third models the noise in the data. By keeping the two strongest singular vectors we retain most of the information in the data. – This is a rank-2 approximation of the matrix A, and in fact the best rank-2 approximation of A: the best two latent factors.
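A short sketch of the rank-2 truncation on a noisy version of the same kind of matrix, showing that the Frobenius error of the approximation equals exactly the discarded singular values:

```python
import numpy as np

A = np.array([[1., 1, 1, 0, 0], [3., 3, 3, 0, 0], [4., 4, 4, 0, 0], [5., 5, 5, 0, 0],
              [0., 0, 0, 4, 4], [0., 0, 0, 5, 5], [0., 0, 0, 2, 2]])
rng = np.random.default_rng(0)
A_noisy = A + 0.1 * rng.standard_normal(A.shape)      # two clean prototypes plus noise

U, s, Vt = np.linalg.svd(A_noisy, full_matrices=False)
k = 2
A2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]            # best rank-2 approximation of A_noisy

print(np.round(s, 2))                                  # the first two singular values dominate
err = np.linalg.norm(A_noisy - A2, 'fro')              # Frobenius norm of what we discarded
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))    # True
```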

SVD as an optimization: the rank-k truncation A_k = U_k Σ_k V_kᵀ minimizes the error ‖A − B‖_F over all matrices B of rank at most k.


Example – Frobenius norm: ‖M‖_F = √(Σ_ij M_ij²), so ‖A − B‖_F = √(Σ_ij (A_ij − B_ij)²) is "small".

Latent Factor Models – "SVD" on Netflix data: R ≈ Q · Pᵀ. For now let's assume we can approximate the rating matrix R as a product of "thin" matrices Q (items × factors) and Pᵀ (factors × users). – R has missing entries, but let's ignore that for now! Basically, we will want the reconstruction error to be small on the known ratings, and we don't care about the values on the missing ones.

Ratings as Products of Factors – How do we estimate the missing rating of user x for item i? r̂_xi = q_i · p_x = Σ_f q_if · p_xf, where q_i is row i of Q (the factor vector of item i) and p_x is column x of Pᵀ (the factor vector of user x).


Latent Factor Models – [figure: the same movies plotted against Factor 1 and Factor 2 of the learned latent space, which roughly align with the geared-towards-females/males and serious/funny axes]


Example: missing ratings in the rating matrix.

Example Reconstruction of missing ratings

Latent Factor Models – find factor matrices P (users × factors) and Q (items × factors) such that R ≈ Q · Pᵀ, i.e., minimize the reconstruction error over the known ratings.

Back to Our Problem – We want to minimize SSE on unseen test data. Idea: minimize SSE on the training data. – We want a large k (# of factors) to capture all the signals. – But SSE on the test data begins to rise for k > 2. This is a classical example of overfitting: with too much freedom (too many free parameters) the model starts fitting noise – that is, it fits the training data too well and does not generalize to unseen test data.

Dealing with Missing Entries – To avoid overfitting we introduce regularization: – Allow a rich model where there are sufficient data – Shrink aggressively where data are scarce. The objective becomes min over P, Q of Σ_{(x,i) in training} (r_xi − q_i · p_x)² ("error") + λ₁ Σ_x ‖p_x‖² + λ₂ Σ_i ‖q_i‖² ("length"), where λ₁, λ₂ are user-set regularization parameters. Note: we do not care about the "raw" value of the objective function; we care about the P, Q that achieve the minimum of the objective.

The Effect of Regularization – [figure: the same movies in the Factor 1 vs Factor 2 space (geared towards females/males, serious/funny), now obtained by minimizing "error" + "length" over the factors]


Latent factors – To find the P, Q that minimize the error function we can use (stochastic) gradient descent. We can define different latent factor models that apply the same idea in different ways – e.g., probabilistic/generative models. Latent factor methods work well in practice, and they are employed by most sophisticated recommendation systems.
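A minimal stochastic gradient descent sketch for the regularized objective above (the hyperparameters, initialization, and stopping rule are illustrative assumptions, not from the slides); R is assumed to be items × users with NaN for missing entries, matching R ≈ Q · Pᵀ:

```python
import numpy as np

def factorize(R, k=2, steps=20000, lr=0.01, lam=0.1, seed=0):
    """Learn Q (items x k) and P (users x k) so that R[i, x] ~ Q[i] . P[x]."""
    rng = np.random.default_rng(seed)
    n_items, n_users = R.shape
    Q = 0.1 * rng.standard_normal((n_items, k))
    P = 0.1 * rng.standard_normal((n_users, k))
    known = [(i, x) for i in range(n_items) for x in range(n_users) if not np.isnan(R[i, x])]
    for _ in range(steps):
        i, x = known[rng.integers(len(known))]        # sample one known rating
        err = R[i, x] - Q[i] @ P[x]                   # prediction error on it ("error" term)
        Q[i] += lr * (err * P[x] - lam * Q[i])        # gradient step with L2 shrinkage ("length")
        P[x] += lr * (err * Q[i] - lam * P[x])
    return Q, P

# Usage sketch: Q, P = factorize(R_items_by_users); the prediction for user x, item i is Q[i] @ P[x]
```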

Pros and cons of collaborative filtering Works for any kind of item – No feature selection needed Cold-Start problem: – New user problem – New item problem Sparsity of rating matrix – Cluster-based smoothing?

The Netflix Challenge – a $1M prize for improving the prediction accuracy by 10%.

Extensions Group Recommendations Social Recommendations

Group recommendations Suppose that we want to recommend a movie for a group of people. We can first predict the rating for each individual and then aggregate for the group. How do we aggregate? – i.e., how much does the group like the movie?

Aggregation functions
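The slide's own list of functions is not preserved in the transcript; as a sketch, two commonly used choices are the average and the "least misery" (minimum) aggregation:

```python
def aggregate_average(predicted):
    # predicted: the predicted rating of the movie for each group member
    return sum(predicted) / len(predicted)

def aggregate_least_misery(predicted):
    # the group is only as happy as its least happy member
    return min(predicted)

print(aggregate_average([4.5, 3.0, 2.0]))       # ~3.17: good on average
print(aggregate_least_misery([4.5, 3.0, 2.0]))  # 2.0: one member would strongly dislike it
```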

Social Recommendations – Suppose that, in addition to the rating matrix, you are also given a social network of the users. – Such social networks exist on sites like Yelp, Foursquare, Epinions, etc. How can we use the social network to improve recommendations? – Homophily: connected users are likely to have similar ratings.

Social Recommendations – [same latent-factor movie map] We do not have enough information for Dave to make a prediction.

Social Recommendations – [same movie map] But we know that Dave is good friends with Charlie, whose preferences we do know.

Social Recommendations – [same movie map] This can help us adjust the ratings for Dave to be closer to those of Charlie.

Social Regularization

Social regularization gives additional information about the preferences of the users, and it helps with sparse data since it allows us to make inferences for users for whom we have little data. The same idea can be applied in different settings.

Example: Linear regression

Linear regression task Example application: we want to predict the popularity of a blogger. – We can create features about the text of the posts, length, frequency, topics, etc. – Using training data we can find the linear function that best predicts the popularity

Linear regression with social regularization: the objective combines a regression cost (how well the linear function fits the observed values) with a regularization cost that penalizes connected users receiving very different predictions. This is sometimes also called network smoothing.
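One hedged way to write this down (an assumed formulation, not necessarily the slides' exact one) is to minimize Σ_i (y_i − wᵀx_i)² + λ Σ_{(i,j)∈E} (wᵀx_i − wᵀx_j)², where E is the set of friendship edges; since this is quadratic in w it can be solved in closed form using the graph Laplacian:

```python
import numpy as np

def social_regression(X, y, edges, lam=1.0):
    """Fit w minimizing ||y - Xw||^2 + lam * sum_{(i,j) in edges} (x_i.w - x_j.w)^2.
    X: (n_users, n_features) blogger features; y: observed popularity; edges: friendship pairs."""
    n = X.shape[0]
    L = np.zeros((n, n))                     # graph Laplacian of the social network
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    A = X.T @ X + lam * (X.T @ L @ X)        # normal equations of the smoothed objective
    return np.linalg.solve(A, X.T @ y)

# Tiny made-up example: 4 bloggers, 2 features, friendships (0,1) and (2,3)
X = np.array([[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.1, 1.0]])
y = np.array([5.0, 4.0, 1.0, 2.0])
print(social_regression(X, y, edges=[(0, 1), (2, 3)], lam=0.5))
```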

Collective Classification – The same idea can be applied to classification problems: – Classification: create a model that, given a set of features, predicts a discrete class label. E.g., predict how a Facebook user will vote in the next election. We can use the postings of the user to train a model that predicts among the existing parties (independent classification). We also have the Facebook social network information: – It is reasonable to assume that people who are connected are more likely to vote the same or in a similar way. – We want to collectively assign labels so that connected nodes are likely to get similar labels.

Collective Classification – the objective combines a classification cost (the fit of each node's label to its own features) with a separation cost (a penalty for connected nodes that receive different labels).
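One hedged way to write such an objective (an assumed form; L is a per-node classification loss on the node's own features, and the second term charges λ for every edge whose endpoints get different labels):

```latex
\min_{y_1,\dots,y_n} \; \sum_{i} L\big(y_i, f(x_i)\big) \;+\; \lambda \sum_{(i,j)\in E} \mathbf{1}\,[\,y_i \neq y_j\,]
```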