
Netflix Challenge: Combined Collaborative Filtering
Greg Nelson, Alan Sheinberg

2 Overall Idea
Data: 480,189 users and 17,770 movies; the rating matrix is about 99% sparse.
Query: how will user u rate movie m?
One idea: find users similar to u and average the ratings they gave movie m (where those ratings exist), weighted by user similarity.
Another idea: find movies similar to m and average the ratings u gave those movies (where those ratings exist), weighted by movie similarity.
Combined idea: consider how u and similar users rated movie m and similar movies. Take the average of the existing ratings, weighted by the product of the user similarity and the movie similarity. Considering more (user, movie) pairs helps overcome sparsity (roughly 100 are needed).
Normalize rating scales: subtract the mean and divide by the standard deviation.
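The combined prediction can be sketched as below. This is a minimal illustration under several assumptions. Ratings are stored as nested dicts. Normalization is done per user, which is an assumption because the slide does not say which mean and standard deviation are used. The neighborhoods U and M are passed in as similarity dicts that include u and m themselves with similarity 1.0. The names `mean_sd` and `predict` are hypothetical.

```python
from statistics import mean, pstdev


def mean_sd(row):
    """Mean and population standard deviation of one user's ratings.
    A constant rating scale gets sd = 1.0 to avoid dividing by zero."""
    values = list(row.values())
    sd = pstdev(values)
    return mean(values), (sd if sd > 0 else 1.0)


def predict(u, m, ratings, user_sims, movie_sims):
    """Predict r(u, m) as a similarity-weighted average of normalized ratings
    over every existing rating r(v, n) with v in U and n in M.

    ratings:    {user: {movie: rating}}; assumed not to contain r(u, m)
    user_sims:  {v: sim(u, v)} for v in U (include u itself, e.g. 1.0)
    movie_sims: {n: sim(m, n)} for n in M (include m itself, e.g. 1.0)
    Returns (prediction, support).
    """
    num = den = 0.0
    support = 0
    for v, sim_u in user_sims.items():
        row = ratings.get(v, {})
        if not row:
            continue
        mu_v, sd_v = mean_sd(row)  # per-user normalization (assumption)
        for n, sim_m in movie_sims.items():
            r = row.get(n)
            if r is None:
                continue  # missing value: this pair contributes nothing
            w = sim_u * sim_m  # weight = product of the two similarities
            num += w * (r - mu_v) / sd_v
            den += w
            support += 1
    mu_u, sd_u = mean_sd(ratings[u])
    if den == 0:
        return mu_u, support  # no usable ratings: fall back to u's mean
    # Map the weighted z-score back onto u's own rating scale.
    return mu_u + sd_u * num / den, support
```

For example, suppose user u has rated movies a and b, and a neighbor v has rated m, a and b. Then `predict("u", "m", ratings, {"u": 1.0, "v": 0.8}, {"m": 1.0, "a": 0.5})` combines three existing ratings: u on a, v on m and v on a. Each rating is weighted by the product of the two similarities.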

3 Finding Similar Users/Movies
The data set is huge, so a naive all-pairs comparison is impractical.
View each user as a vector whose entry i is 1 if the user rated movie i and 0 otherwise.
Use minhashing (which approximates Jaccard similarity) to create signatures.
Use LSH on the signatures to find similar users.
Apply the same idea to find similar movies.
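Here is a sketch of minhash signatures. It works on each user's set of rated movie ids, which is equivalent to the 0/1 vector described above. The hash family h(x) = (a·x + b) mod p, the prime p, the seed and the signature length are illustrative assumptions, not parameters from the slides. The fraction of signature positions on which two users agree estimates their Jaccard similarity.

```python
import random

PRIME = 2_147_483_647  # a Mersenne prime larger than any movie id (assumption)


def make_hash_params(k, seed=0):
    """k random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(rated_movies, params):
    """Signature of one user: for each hash function, the minimum hash value
    over the ids of the movies the user rated (the 1 entries of the vector)."""
    if not rated_movies:
        raise ValueError("a user must have rated at least one movie")
    return [min((a * x + b) % PRIME for x in rated_movies) for a, b in params]


def jaccard(s, t):
    """Exact Jaccard similarity |s & t| / |s | t| of two sets."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions on which two users agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

As a usage example, two users who rated movies 1 to 100 and 51 to 150 have an exact Jaccard similarity of 1/3. With a few hundred hash functions, `estimate_jaccard` on their signatures should come out close to that value. The signatures are much smaller than the rating vectors, which is what makes the LSH step feasible.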

4 LSH Implementation
Each band requires a disk-based hash table, so the goals are to minimize the number of I/Os and the number of seeks.
Batch entries in memory before writing them:
Batch in FIFO order: took a very long time.
Group by bucket (minimizes I/O): about 5 hours.
Group by bucket and sort writes by bucket number (minimizes I/O and seeks): about 25 minutes.
We could have sorted the data itself by bucket number, but that would make changing the hash functions and LSH parameters a big pain, and it would not be much faster than the last approach.
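The banding step and the "group by bucket and sort writes by bucket number" strategy above can be sketched as follows. This is an illustration only. The disk-based hash tables are simulated with an in-memory dict. `NUM_BUCKETS`, `band_buckets` and `BatchedBucketWriter` are hypothetical names. Python's built-in `hash` on integer tuples stands in for whatever band hash the authors actually used. Each flush merges the buffered entries per bucket and appends them in ascending (band, bucket) order, so each bucket is written once per flush.

```python
from collections import defaultdict

NUM_BUCKETS = 1 << 20  # hypothetical number of buckets per band table


def band_buckets(signature, bands, rows):
    """Split a minhash signature into `bands` bands of `rows` values each and
    hash each band to a bucket id in that band's table."""
    if len(signature) != bands * rows:
        raise ValueError("signature length must equal bands * rows")
    return [hash(tuple(signature[b * rows:(b + 1) * rows])) % NUM_BUCKETS
            for b in range(bands)]


class BatchedBucketWriter:
    """Buffers (band, bucket, user) entries in memory and flushes them grouped
    by bucket, in sorted bucket order. `tables` simulates the on-disk hash
    tables; each `write_log` entry models one append to one bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.tables = defaultdict(list)  # (band, bucket) -> [user ids]
        self.write_log = []              # (band, bucket, n_entries) per append

    def add(self, user, buckets):
        for band, bucket in enumerate(buckets):
            self.buffer.append((band, bucket, user))
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        groups = defaultdict(list)
        for band, bucket, user in self.buffer:
            groups[(band, bucket)].append(user)  # one write per bucket, not per entry
        for key in sorted(groups):  # ascending order: one pass over each table
            self.tables[key].extend(groups[key])
            self.write_log.append((key[0], key[1], len(groups[key])))
        self.buffer.clear()

    def candidates(self, buckets):
        """Users sharing at least one band bucket with the given buckets."""
        found = set()
        for band, bucket in enumerate(buckets):
            found.update(self.tables.get((band, bucket), ()))
        return found
```

With a FIFO buffer, every entry would cost its own write at an arbitrary position in the table. Grouping merges writes to the same bucket, and sorting makes the remaining writes sequential, which matches the I/O and seek savings reported on the slide.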

5 Results -- Terminology
U := neighborhood of user u
M := neighborhood of movie m
Support := number of existing ratings in U × M
The graphs plot RMSE against |U × M|.
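The quantities defined above can be computed as in this sketch. It assumes U and M are sets of ids and that ratings are nested dicts, as in the earlier sketches. `rmse_by_pair_count` is a hypothetical helper that shows one way to produce the points of an RMSE vs. |U × M| graph: group test predictions by the size of U × M and compute the RMSE of each group.

```python
import math
from collections import defaultdict


def support(neighborhood_u, neighborhood_m, ratings):
    """Number of existing ratings r(v, n) with v in U and n in M."""
    return sum(1 for v in neighborhood_u for n in neighborhood_m
               if n in ratings.get(v, {}))


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(predicted))


def rmse_by_pair_count(records):
    """Group (|U x M|, predicted, actual) records by |U x M| and return
    {pair_count: rmse}, sorted by pair count."""
    groups = defaultdict(lambda: ([], []))
    for pairs, p, a in records:
        groups[pairs][0].append(p)
        groups[pairs][1].append(a)
    return {k: rmse(ps, acts) for k, (ps, acts) in sorted(groups.items())}
```

Here |U × M| is simply `len(neighborhood_u) * len(neighborhood_m)`. Support can be much smaller than that because most cells of the rating matrix are empty.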

6 Results -- Support = 5

7 Results -- Support = 10

8 Future Ideas
Missing values are the biggest problem.
Use a content-based predictor to fill in the "holes"?
We did begin some work on this: we scraped IMDb for the genre, director, producer, cast, and plot summary of most movies.
Use a classifier instead of simply substituting the average for missing values.
We focused mainly on CF and did not get far with this.
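To illustrate the idea of filling holes from content, here is a sketch. Note that it is not the classifier the slide proposes. It swaps in a simpler feature-overlap heuristic: u's missing rating for m is estimated from u's ratings of other movies, weighted by the Jaccard overlap of their IMDb-derived feature sets. The feature encoding (tokens such as "genre:drama") and the name `content_fill` are hypothetical.

```python
def content_fill(u, m, ratings, features):
    """Estimate a missing r(u, m) from u's ratings of movies whose feature sets
    (genre, director, producer, cast tokens, ...) overlap with m's.
    Falls back to u's mean rating when no rated movie shares a feature."""
    row = ratings[u]
    fm = features.get(m, set())
    num = den = 0.0
    for n, r in row.items():
        fn = features.get(n, set())
        union = fm | fn
        w = len(fm & fn) / len(union) if union else 0.0  # Jaccard overlap
        num += w * r
        den += w
    if den == 0:
        return sum(row.values()) / len(row)
    return num / den
```

A trained classifier could replace the overlap weights once enough scraped features are available. The fallback keeps the current behavior of using an average when content gives no signal.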

