Collaborative Filtering with Temporal Dynamics Yehuda Koren Yahoo Research Israel KDD’09.

1 Collaborative Filtering with Temporal Dynamics Yehuda Koren Yahoo Research Israel KDD’09

2 Copyright © 2008 by CEBT
Outline
- Introduction
- Temporal Dynamics
  - Global effects
  - Baseline predictors
  - Time-changing baseline predictors
  - User and item effects
- BellKor function
- Performance
- Exploratory Study
- Conclusion

3 The $1 Million Question

4 Million Dollars Awarded Sept 21st, 2009

5 Preliminaries
- Quiz set and Probe set
  - Given: (Kevin, Avatar, 2009/12/20, ★★★★★), (Coca, 2012, 2009/12/10, ★★★★)
  - Predict: (Kevin, District 9, 2009/12/18, ?)
- Training: Dec 31, 1999 to Dec 31, 2005
  - 100 million ratings
  - 480 thousand users
  - 17,770 movies
- Testing: 1.4 million ratings

6 Ratings Data
[Figure: sparse user × movie rating matrix, 480,000 users × 17,770 movies]

7 Ratings Data
[Figure: the same rating matrix with each user's most recent ratings withheld as the test set, shown as "?"]

8 Data split
- Training Data: 100 million ratings; labels known publicly
- Held-Out Data: 3 million most recent ratings; labels known only to Netflix
  - Quiz Set (1.5m ratings): scores posted on the leaderboard
  - Test Set: scores known only to Netflix; used in determining the final winner

9 Scoring
- Quality of the result is measured by RMSE:
  RMSE = √( (1/|R|) · Σ_{(u,i)∈R} (r̂_ui − r_ui)² )
- Does not necessarily correlate well with user satisfaction
- Baseline RMSE scores on test data:
  - 1.054: just predict the mean user rating for each movie
  - 0.953: Netflix's own system (Cinematch) as of 2006
  - 0.941: nearest-neighbor method using correlation
  - 0.857: required 10% reduction to win $1 million
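The RMSE above is straightforward to compute; here is a minimal sketch in Python (the function name and data layout are mine for illustration, not from the contest tooling):

```python
import math

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs.

    pairs: iterable of (r_hat, r) tuples -- the set R in the formula.
    """
    pairs = list(pairs)
    sq_err = sum((r_hat - r) ** 2 for r_hat, r in pairs)
    return math.sqrt(sq_err / len(pairs))
```

For example, `rmse([(2, 3), (4, 3)])` averages two unit squared errors and returns 1.0.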

10 Considerations
- User preferences change over time
- This is the problem of concept drift; standard remedies include:
  - Instance selection
  - Instance weighting
- A common approach tries different exponential time-decay rates to address the problem
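Instance weighting with exponential time decay can be sketched as follows; `decayed_mean` and its tuple layout are hypothetical names for illustration, with the decay rate `beta` being the knob one would tune:

```python
import math

def decayed_mean(ratings, beta):
    """Instance weighting for concept drift: weight each rating by
    exp(-beta * age_days), so older ratings count for less.

    ratings: list of (value, age_days) pairs; beta: decay rate to tune.
    """
    weights = [math.exp(-beta * age) for _, age in ratings]
    total = sum(w * v for (v, _), w in zip(ratings, weights))
    return total / sum(weights)
```

With `beta = 0` this reduces to a plain mean; as `beta` grows, only recent ratings matter. The paper's point is that any single decay rate throws away usable signal from the past.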

11 Considerations
- Model the full extent of the time period, not only the present behavior
  - Key to being able to extract signal from each time point, while neglecting only the noise
- Multiple changing concepts should be captured
  - These may be user- and/or item-dependent
- Model users and items within a single framework
- Do not try to extrapolate future temporal dynamics
  - Too difficult

12 Components of a rating predictor
- A rating predictor combines three components: user-movie interaction, movie bias, and user bias
- User-movie interaction
  - Characterizes the matching between users and movies
  - Attracts most research in the field
- Baseline predictor (movie bias + user bias)
  - Separates users from movies
  - Often overlooked
  - Benefits from insights into users' behavior
  - Among the main practical contributions of the competition
(slide from Yehuda Koren)

13 Global temporal effects
- The average movie rating made a sudden jump
- Ratings increase with the age of the movie at the time of the rating

14 Baseline predictors
- Rating scale of user u
- Values of other ratings the user gave (day-specific mood, anchoring, multi-user accounts)
- Popularity of movie i
- Selection bias: related to the number of ratings the user gave on the same day ("frequency")
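The slide lists the effects qualitatively; in the paper the static baseline takes the form b_ui = μ + b_u + b_i. A rough sketch that estimates the biases by successive means (the paper actually fits them by regularized least squares; `fit_baseline` is a hypothetical helper):

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Estimate the static baseline b_ui = mu + b_u + b_i by simple means.

    ratings: list of (user, item, r) triples. Note: the paper fits the
    biases by regularized least squares; plain means are a rough stand-in.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: average residual (r - mu) per item.
    item_res = defaultdict(list)
    for _, i, r in ratings:
        item_res[i].append(r - mu)
    b_i = {i: sum(v) / len(v) for i, v in item_res.items()}

    # User bias: average residual after removing mu and the item bias.
    user_res = defaultdict(list)
    for u, i, r in ratings:
        user_res[u].append(r - mu - b_i[i])
    b_u = {u: sum(v) / len(v) for u, v in user_res.items()}

    # Unknown users/items fall back to bias 0 (i.e., predict mu).
    return lambda u, i: mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
```

Usage: `fit_baseline(triples)("some_user", "some_movie")` returns the baseline prediction before any user-movie interaction term is added.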

15 Time-changing baseline predictors

16 Item temporal effect
- Choose a time resolution that balances fine granularity against having enough ratings per bin
- Each bin corresponds to roughly ten consecutive weeks of data, with 30 bins spanning all days in the dataset

17 User temporal effect
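The plot behind this slide (not reproduced in the transcript) shows user bias drifting over time. In the paper this drift is modeled with a smooth deviation term, dev_u(t) = sign(t − t_u) · |t − t_u|^β, where t_u is the mean date of u's ratings and β ≈ 0.4 was tuned on validation data, giving b_u(t) = b_u + α_u · dev_u(t). A sketch, with function names mine:

```python
def dev_u(t, t_u, beta=0.4):
    """Signed, sub-linear deviation of day t from the user's mean rating
    day t_u. Sub-linearity (beta < 1) damps the effect of long gaps."""
    d = t - t_u
    sign = 1 if d > 0 else -1 if d < 0 else 0
    return sign * abs(d) ** beta

def user_bias_at(b_u, alpha_u, t, t_u):
    """Time-dependent user bias: b_u(t) = b_u + alpha_u * dev_u(t)."""
    return b_u + alpha_u * dev_u(t, t_u)
```

Each user gets one extra learned parameter, α_u, which scales how strongly that user drifts.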

18 User-Item temporal effect

19 Periodic effects
- Dayparting
  - Some products can be more popular in specific seasons or near certain holidays
  - Different types of television or radio shows are popular throughout different segments of the day
- Season and day-of-week effects
- Unfortunately, periodic effects did not show significant predictive power

20 The BellKor function

21 Performance
- 1.054: just predict the mean user rating for each movie
- 0.953: Netflix's own system (Cinematch) as of 2006
- 0.941: nearest-neighbor method using correlation
- 0.864: BellKor algorithm, 2008
- 0.856: BellKor algorithm, 2009 (10.05% improvement)

22 An Exploratory Study
- Sudden rise in the average movie rating (early 2004); candidate explanations:
  - Technical improvements in Netflix's Cinematch
  - GUI improvements
  - The meaning of a rating changed
  - An increase in "normal" users (?)

23 An Exploratory Study
- Movie age effect; candidate explanations:
  - Users prefer new movies without any particular reason
  - Older movies are just inherently better than newer ones (x)

24 Conclusion
- Tracking the temporal dynamics of user preferences poses unique challenges
- Traditional decay models lose too much signal, degrading prediction accuracy
- Understanding your data is important, e.g., time effects
- Our model won the contest

