Recommendations via Collaborative Filtering

Presentation transcript:

Recommendations via Collaborative Filtering

Recommendations: Relevant for movies, restaurants, hotels… Recommendation systems are a very hot topic in both academia and industry. The idea is to predict users' opinions based on prior knowledge.

The Netflix example “For only $7.99 a month, instantly watch unlimited movies & TV episodes streaming over the Internet to your TV via an Xbox 360, PS3, Wii or any other device that streams from Netflix. You can also watch instantly on your computer too!”

Where are the recommendations? One of the holy grails of Netflix is a sophisticated system that recommends movies to users. The Netflix challenge: improve the prediction of the system by 10%. Prize: 1M dollars!

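Netflix challenge – Improve RMSE by 10%. RMSE is the root mean squared error between predicted and true ratings: $\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}(\hat{r}_{u,i} - r_{u,i})^2}$, where $T$ is the set of (user, item) pairs being evaluated, $r_{u,i}$ is the true rating and $\hat{r}_{u,i}$ the predicted one.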
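To make the 10% target concrete, here is a minimal sketch of the RMSE measure and of a relative improvement between two predictors. The rating values are made up for illustration, and the function name `rmse` is an assumption, not part of the challenge's tooling.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true ratings and two hypothetical sets of predictions.
actual = [4.0, 4.0, 1.0, 3.0]
baseline = rmse([3.0, 4.0, 2.0, 5.0], actual)
improved = rmse([3.5, 4.0, 1.5, 4.0], actual)

# Fraction by which the second predictor lowers the error of the first.
relative_improvement = (baseline - improved) / baseline
```

A relative improvement of 0.10 or more corresponds to the 10% goal of the challenge.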

Netflix Real-Life Data (the Netflix Prize training set): 17,770 movies, about 480,000 users, over 100M ratings. Large-scale…

Techniques: Many techniques, algorithms and heuristics. The winning algorithm used 107 (!) different algorithmic approaches, blended into a single prediction. We will not talk about 107 approaches; we will overview some categories.

Feature Extraction: Represent a movie as a binary vector of features (genre, language, actors, ...). The vector quickly gets pretty big, but there are methods for compression.
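A minimal sketch of such a binary encoding follows. The movie records, field names and helper names are hypothetical and only illustrate the idea: one vector position per distinct (field, value) pair.

```python
def build_vocabulary(movies):
    """Collect every distinct (field, value) feature in a stable, sorted order."""
    features = set()
    for movie in movies:
        for field, values in movie.items():
            features.update((field, value) for value in values)
    return sorted(features)

def to_binary_vector(movie, vocabulary):
    """Return a 0/1 list with a 1 for each vocabulary feature the movie has."""
    present = {(field, value) for field, values in movie.items() for value in values}
    return [1 if feature in present else 0 for feature in vocabulary]

# Hypothetical movie descriptions.
movies = [
    {"genre": ["comedy"], "language": ["en"], "actor": ["A", "B"]},
    {"genre": ["drama", "comedy"], "language": ["fr"], "actor": ["B"]},
]
vocabulary = build_vocabulary(movies)
vectors = [to_binary_vector(movie, vocabulary) for movie in movies]
```

The vector length equals the vocabulary size, which is why it grows quickly once every actor gets a position of their own.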

Looking for similar vectors. Intuition: if I like a movie, I may like movies with similar features. What about movies that are similar to those similar movies, and so on? This leads to grouping movies by similarity of features, also known as clustering.

K-means: (1) Randomly generate k centers. (2) Assign each point to the nearest center, where "nearest" is defined with respect to a distance measure. (3) Re-compute the cluster centers. Repeat steps 2 and 3 until the clusters converge.
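The K-means loop can be sketched as follows. This is an illustration under assumptions rather than the presenters' implementation: squared Euclidean distance is the distance measure, the initial centers are k randomly chosen data points, and convergence means the assignment stops changing.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initialise the centers with k distinct data points.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = mean(members)
    return centers, assignment

# Two well-separated groups of hypothetical 2-D feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(points, k=2)
```

Cluster labels are arbitrary, so only the grouping in `assignment` is meaningful, not which group is called 0 or 1.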

Another approach: Classification. The idea is to classify all movies (= vectors) as like / don't like, for a particular user. One popular technique is called Support Vector Machines.

Linear SVM: Each point (= a movie that the user saw) is mapped to 1 (like) or –1 (don't like). We want to find a (hyper-)plane w·x – b = 0 that maximizes the margin between the planes w·x – b = 1 (positive) and w·x – b = –1 (negative). The width of this margin is 2/||w||, so maximizing it means minimizing ||w||. This becomes an optimization problem, and there are good methods for solving it.
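As a small illustration of the margin idea, the sketch below checks whether a candidate hyperplane (w, b) puts every point on the correct side of its margin plane, and computes the margin width 2/||w||. The points, labels and the two candidate hyperplanes are made up; this only compares candidates and does not solve the optimization problem.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin_width(w):
    """Distance between the planes w.x - b = 1 and w.x - b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

def separates_with_margin(w, b, points, labels):
    """True if every point satisfies y * (w.x - b) >= 1."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in zip(points, labels))

# Hypothetical movies as 2-D vectors: +1 = like, -1 = don't like.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

wide = ((0.5, 0.5), 1.0)    # candidate (w, b) with a small ||w||
narrow = ((1.0, 1.0), 2.0)  # same direction, twice the norm

wide_ok = separates_with_margin(wide[0], wide[1], points, labels)
narrow_ok = separates_with_margin(narrow[0], narrow[1], points, labels)
```

Both candidates satisfy the constraints, but the first has the smaller ||w|| and therefore the wider margin, so it is the one a linear SVM would prefer.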

Soft Margin SVM: Sometimes there is no hyperplane that can split the “like” and “don't like” cases. The Soft Margin method allows some slack for errors, and still maximizes the margin to the correctly separated cases.
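One common way to express the soft margin is to minimize 0.5·||w||² + C·Σ max(0, 1 – y·(w·x – b)), where the second term is the slack paid by points inside the margin or on the wrong side. The sketch below minimizes it with plain full-batch subgradient descent. That solver, the constant C, the learning rate and the data are all assumptions chosen for illustration; practical SVM libraries use more refined solvers.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_soft_margin_svm(points, labels, c=1.0, learning_rate=0.005, epochs=4000):
    """Subgradient descent on 0.5*||w||^2 + c * sum(hinge losses)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) - b) < 1:  # point needs slack
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b += c * y
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) - b >= 0 else -1

# Hypothetical data: the last "don't like" point sits among the "like" points,
# so no hyperplane separates the two classes perfectly.
points = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0),
          (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (2.5, 2.5)]
labels = [1, 1, 1, -1, -1, -1, -1]

w, b = train_soft_margin_svm(points, labels)
correct = sum(predict(w, b, x) == y for x, y in zip(points, labels))
```

A larger C penalizes slack more heavily and behaves more like the hard-margin case; a smaller C tolerates more errors in exchange for a wider margin.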

Disadvantages: Vectors may be big. Accounts only for the “local” preferences of each user, missing a lot of information from other users!

Collaborative Filtering: Use information gathered from other users to infer something about the current user. Item-based CF: “Users who bought this book also liked that book”. Can again use similarity between items (users that liked similar books…). User-based CF is a bit more complicated.
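A minimal sketch of the item-based idea: two items count as similar when largely the same users liked both. Cosine similarity over the sets of users is used here as one possible measure; the measure, the data and the function names are illustrative assumptions.

```python
import math

def item_similarity(likes, item_a, item_b):
    """Cosine similarity between the sets of users who liked each item."""
    users_a, users_b = likes[item_a], likes[item_b]
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def also_liked(likes, item, top_n=2):
    """Items most similar to `item`, best first, ignoring zero similarity."""
    scores = [(item_similarity(likes, item, other), other)
              for other in likes if other != item]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scores[:top_n] if score > 0]

# Hypothetical data: item -> set of users who liked it.
likes = {
    "book_a": {"ann", "bob", "cat"},
    "book_b": {"ann", "bob"},
    "book_c": {"cat", "dan"},
    "book_d": {"eve"},
}
suggestions = also_liked(likes, "book_a")
```

Note that nothing here needs a profile of the current user beyond the item being viewed.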

User-Based Collaborative Filtering: Analyzes the relationships between users and items (movies). Intuitively, you will like movies that similar users like. Similar users are defined as those that like similar movies. Mutual recursion…

CF

CF Algorithms

User-based: N(u;i) – the set of users who rate similarly to u and actually rated i. R – rating, S – similarity.
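A common way to combine these quantities, assumed here because the slide's own formula is not in the transcript, is a similarity-weighted average over the neighborhood: the predicted rating of item i by user u is Σ S(u,v)·R(v,i) / Σ S(u,v), with both sums taken over v in N(u;i). In the sketch below, N(u;i) is taken to be the k most similar users with positive similarity who rated i; that choice, the data and all names are illustrative.

```python
def predict_rating(user, item, ratings, similarity, k=2):
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    # Candidates: other users who rated the item and are positively similar.
    candidates = [
        (similarity[user].get(v, 0.0), v)
        for v in ratings
        if v != user and item in ratings[v]
    ]
    candidates = [(s, v) for s, v in candidates if s > 0]
    # N(u; i): keep the k most similar candidates.
    neighbours = sorted(candidates, key=lambda pair: (-pair[0], pair[1]))[:k]
    total_weight = sum(s for s, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated this item
    return sum(s * ratings[v][item] for s, v in neighbours) / total_weight

# Hypothetical ratings R and precomputed similarities S for user "u".
ratings = {
    "u": {"m1": 5},
    "v1": {"m2": 4},
    "v2": {"m2": 2},
    "v3": {"m2": 1},
    "v4": {"m1": 3},
}
similarity = {"u": {"v1": 0.9, "v2": 0.3, "v3": 0.1, "v4": 0.8}}

prediction = predict_rating("u", "m2", ratings, similarity)
```

User v4 is very similar to u but has not rated m2, so v4 cannot be part of N(u; m2).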

S_{u,v} plays a key role! It is used for selecting N(u;i) and for weighting. Most popular implementation: the Pearson correlation coefficient.

Pearson correlation coefficient: I(u,v) – the set of all items rated by both u and v.
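In its usual form, the Pearson similarity of u and v is the sum over i in I(u,v) of (R(u,i) – mean_u)·(R(v,i) – mean_v), divided by the square root of the product of the two corresponding sums of squared deviations. The sketch below computes each user's mean over the common items only, which is one of several conventions and an assumption here; the ratings are made up.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation of two users over I(u, v), their co-rated items."""
    common = sorted(set(ratings_u) & set(ratings_v))  # I(u, v)
    if len(common) < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    covariance = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    spread_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    spread_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if spread_u == 0 or spread_v == 0:
        return 0.0  # a user who gave identical ratings has no defined correlation
    return covariance / math.sqrt(spread_u * spread_v)

# Hypothetical users: v rates on a harsher or more generous scale but agrees
# with u on the ordering, while w orders the movies the opposite way.
u = {"m1": 1, "m2": 2, "m3": 3}
v = {"m1": 2, "m2": 4, "m3": 6, "m4": 1}
w = {"m1": 3, "m2": 2, "m3": 1}

similar = pearson(u, v)
opposite = pearson(u, w)
```

Because the means are subtracted, two users with different rating scales but the same taste still come out as highly similar.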

Can we do better? We can use external information about the users, e.g. from social networks. More ideas?

Privacy issues: Note that the methods we presented do not assume knowledge of the users' real identities. Indeed, in the Netflix challenge only masked identities were given. Still, for general use some user profile has to be built (even this may be a problem); this is avoided in the item-based approach. Using external information requires real identities.