Netflix Challenge: Combined Collaborative Filtering Greg Nelson Alan Sheinberg.

Similar presentations
Recommender Systems & Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering
Nonparametric Methods: Nearest Neighbors
Indexing. Efficient Retrieval Documents x terms matrix t 1 t 2... t j... t m nf d 1 w 11 w w 1j... w 1m 1/|d 1 | d 2 w 21 w w 2j... w 2m 1/|d.
Differentially Private Recommendation Systems Jeremiah Blocki Fall A: Foundations of Security and Privacy.
Distributed Approximate Spectral Clustering for Large- Scale Datasets FEI GAO, WAEL ABD-ALMAGEED, MOHAMED HEFEEDA PRESENTED BY : BITA KAZEMI ZAHRANI 1.
15.8 Algorithms using more than two passes Presented By: Seungbeom Ma (ID 125) Professor: Dr. T. Y. Lin Computer Science Department San Jose State University.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.
Ranking models in IR Key idea: We wish to return in order the documents most likely to be useful to the searcher To do this, we want to know which documents.
Recommender Systems Problem formulation Machine Learning.
The Wisdom of the Few A Collaborative Filtering Approach Based on Expert Opinions from the Web Xavier Amatriain Telefonica Research Nuria Oliver Telefonica.
Collaborative Filtering in iCAMP Max Welling Professor of Computer Science & Statistics.
Rubi’s Motivation for CF: Find a PhD problem; Find “real life” PhD problem; Find an interesting PhD problem; Make Money!
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
CS345 Data Mining Recommendation Systems Netflix Challenge Anand Rajaraman, Jeffrey D. Ullman.
Database Implementation Issues CPSC 315 – Programming Studio Spring 2008 Project 1, Lecture 5 Slides adapted from those used by Jennifer Welch.
Sparse Solutions for Large Scale Kernel Machines Taher Dameh CMPT820-Multimedia Systems Dec 2 nd, 2010.
Learning Bit by Bit Collaborative Filtering/Recommendation Systems.
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
Finding Similar Items. Set Similarity Problem: Find similar sets. Motivation: Many things can be modeled/represented as sets Applications: –Face Recognition.
1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman
Finding Similar Items.
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
Recommendation Systems
Finding Near Duplicates (Adapted from slides and material from Rajeev Motwani and Jeff Ullman)
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates
Approximation algorithms for large-scale kernel methods Taher Dameh School of Computing Science Simon Fraser University March 29 th, 2010.
Combining Content-based and Collaborative Filtering Department of Computer Science and Engineering, Slovak University of Technology
Item-based Collaborative Filtering Recommendation Algorithms
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
1 Recommender Systems Collaborative Filtering & Content-Based Recommending.
Finding Similar Items 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 10: Finding Similar Items Mining.
Online Learning for Collaborative Filtering
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강 민 석강 민 석 May 21 st, 2009 Nathan.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
1 Collaborative Filtering & Content-Based Recommending CS 290N. T. Yang Slides based on R. Mooney at UT Austin.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
DATA MINING LECTURE 6 Sketching, Min-Hashing, Locality Sensitive Hashing.
Similarity & Recommendation Arjen P. de Vries CWI Scientific Meeting September 27th 2013.
Cosine Similarity Item Based Predictions 77B Recommender Systems.
Collaborative Filtering Zaffar Ahmed
Pearson Correlation Coefficient 77B Recommender Systems.
CS4432: Database Systems II Query Processing- Part 2.
Lecture 5 Instructor: Max Welling Squared Error Matrix Factorization.
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original.
KNN CF: A Temporal Social Network kNN CF: A Temporal Social Network Neal Lathia, Stephen Hailes, Licia Capra University College London RecSys ’ 08 Advisor:
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Company LOGO MovieMiner A collaborative filtering system for predicting Netflix user’s movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker,
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin and L. Page, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pages , April.
Netflix Prize: Predicting Ratings. Data mv_00(movieID).txt: 1: (1-2,649,429) (1-5) Over 17,000 movie txt files Over 400,000 userID Two Gigs zipped.
Analysis of massive data sets Prof. dr. sc. Siniša Srbljić Doc. dr. sc. Dejan Škvorc Doc. dr. sc. Ante Đerek Faculty of Electrical Engineering and Computing.
Item-Based Collaborative Filtering Recommendation Algorithms
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Big Data Infrastructure
File Organizations and Indexes
Database Implementation Issues
Collaborative Filtering Nearest Neighbor Approach
Q4 : How does Netflix recommend movies?
CSE 454 Advanced Internet Systems University of Washington
DATABASE IMPLEMENTATION ISSUES
Presentation transcript:

Netflix Challenge: Combined Collaborative Filtering Greg Nelson Alan Sheinberg

Overall Idea
Data:
- 480,189 users
- 17,770 movies
- 99% sparse
Query: how will user u rate movie m?
One Idea:
- Find users similar to u and average the ratings they gave to movie m (if they exist), weighted by user similarity
Another Idea:
- Find movies similar to m and average the ratings u gave to those movies (if they exist), weighted by movie similarity
Combine Ideas:
- Consider how u and similar users rated movie m and similar movies
- Take the average of the existing ratings, weighted by the product of the user and movie similarities
- More pairs to consider will overcome sparsity (need about 100)
- Normalize rating scales: subtract the mean, divide by the standard deviation
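The combined idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `zscore` and `predict_combined`, the dictionary layout, and the convention that u and m appear in their own neighborhoods with similarity 1.0 are all hypothetical.

```python
import math

def zscore(values):
    """Normalize one user's ratings: subtract the mean, divide by the std deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    std = std or 1.0  # assumption: a user with constant ratings gets all zeros
    return [(v - mean) / std for v in values]

def predict_combined(ratings, user_sims, movie_sims, target):
    """Weighted average of existing ratings over (similar user, similar movie) pairs.

    ratings:    dict (user, movie) -> normalized rating (missing key = unrated)
    user_sims:  dict user -> similarity to u (assumed to include u itself at 1.0)
    movie_sims: dict movie -> similarity to m (assumed to include m itself at 1.0)
    target:     the (u, m) pair being predicted, skipped if present
    Returns (prediction, support); prediction is None when no rating exists.
    """
    num = den = 0.0
    support = 0
    for v, s_user in user_sims.items():
        for n, s_movie in movie_sims.items():
            if (v, n) == target or (v, n) not in ratings:
                continue
            weight = s_user * s_movie  # product of the two similarities
            num += weight * ratings[(v, n)]
            den += weight
            support += 1
    return (num / den if den else None), support

ratings = {("u", "m2"): 1.0, ("v", "m"): 0.5, ("v", "m2"): 0.0}
pred, support = predict_combined(
    ratings, {"u": 1.0, "v": 0.5}, {"m": 1.0, "m2": 0.5}, target=("u", "m")
)
```

The prediction is on the normalized scale; converting it back to a star rating would mean multiplying by the target user's standard deviation and adding their mean.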

Finding Similar Users/Movies
- Data set is huge, can’t just use the naïve approach
- View users as vectors: entry i = 1 if movie i was rated, 0 if not
- Use Minhash (Jaccard similarity) to create signatures
- Use LSH to find similar users
- Same idea to find similar movies
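A minimal in-memory sketch of the Minhash and LSH steps, assuming each user is represented by the non-empty set of integer movie IDs they rated. The hash family, the number of hash functions, and the band/row split below are illustrative choices, not the parameters used in the project.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # assumption: a prime larger than any movie ID

def make_hash_params(n, seed=0):
    """Draw n random (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def minhash_signature(item_set, hash_params):
    """One minimum per hash function; two sets agree on a position with
    probability approximately equal to their Jaccard similarity."""
    return [min((a * x + b) % PRIME for x in item_set) for a, b in hash_params]

def lsh_candidates(signatures, bands, rows):
    """Split each signature into bands; keys sharing any band bucket become candidates."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs

params = make_hash_params(20)
rated = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {100, 200}}
signatures = {user: minhash_signature(movies, params) for user, movies in rated.items()}
candidates = lsh_candidates(signatures, bands=5, rows=4)
```

Swapping the roles (each movie as the set of users who rated it) gives the movie neighborhoods with the same code.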

LSH Implementation
- Each band requires a disk-based hash table
- Minimize the number of IOs
- Minimize the number of seeks
- Batch entries in memory:
  - Batch in FIFO order: ~really long time
  - Group by bucket (minimize IO): ~5 hours
  - Group by bucket + sort writes by bucket # (minimize IO and seeks): ~25 minutes
- Could sort data by bucket #, but that would make changing hash functions and LSH parameters a big pain. Also, not much faster than the last approach.
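The fastest batching strategy above (group by bucket, then write in bucket order) might look like the following sketch. The class name `BucketBatcher`, the `write_bucket` callback standing in for the disk-based hash table, and the capacity value are all assumptions made for illustration.

```python
from collections import defaultdict

class BucketBatcher:
    """Buffer hash-table entries in memory and flush them grouped by bucket.

    write_bucket(bucket, entries) is a hypothetical callback that appends
    `entries` to one bucket of the on-disk table. Calling it once per bucket
    keeps the number of IOs down; visiting buckets in ascending order keeps
    the writes roughly sequential, which reduces seeks.
    """

    def __init__(self, write_bucket, capacity=100_000):
        self.write_bucket = write_bucket
        self.capacity = capacity
        self.pending = defaultdict(list)
        self.count = 0

    def add(self, bucket, entry):
        self.pending[bucket].append(entry)
        self.count += 1
        if self.count >= self.capacity:
            self.flush()

    def flush(self):
        for bucket in sorted(self.pending):  # sort writes by bucket number
            self.write_bucket(bucket, self.pending[bucket])
        self.pending.clear()
        self.count = 0

# Usage with an in-memory log standing in for the disk table.
log = []
batcher = BucketBatcher(lambda b, entries: log.append((b, list(entries))), capacity=4)
for bucket, user in [(7, "u1"), (2, "u2"), (7, "u3"), (2, "u4")]:
    batcher.add(bucket, user)
```

A final `flush()` call is needed after the last entry so that a partially filled buffer is not lost.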

Results -- Terminology
- U := neighborhood of user u
- M := neighborhood of movie m
- Support := # existing ratings in U x M
- Graph: RMSE vs. |U x M|
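For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it and also groups predictions by a minimum support threshold; treating "Support = 5" on the following slides as a minimum is an assumption, and the record layout is hypothetical.

```python
import math

def rmse(predictions, actuals):
    """Root mean squared error between two equal-length rating sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_by_min_support(records, thresholds):
    """records: iterable of (support, predicted, actual) triples.

    Returns {threshold: RMSE over records whose support >= threshold};
    thresholds that no record satisfies are omitted.
    """
    records = list(records)
    result = {}
    for t in thresholds:
        kept = [(p, a) for s, p, a in records if s >= t]
        if kept:
            preds, acts = zip(*kept)
            result[t] = rmse(preds, acts)
    return result

records = [(3, 1.0, 3.0), (6, 5.0, 3.0), (12, 4.0, 4.0)]
scores = rmse_by_min_support(records, thresholds=[5, 10, 20])
```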

Results -- Support = 5

Results -- Support = 10

Future Ideas
- Missing values are the biggest problem
- Use a content-based predictor to fill in the “holes”?
- We did begin some work on this:
  - Scraped IMDb for genre, director, producer, cast, and plot summary for most movies
  - Use a classifier instead of just using the average in place of missing values
  - Focused mainly on CF and didn’t get far with this