M.Sc. Project Doron Harlev Supervisor: Dr. Dana Ron


Movie Advisor: M.Sc. Project, Doron Harlev. Supervisor: Dr. Dana Ron

Introduction
Predict a user's rating on a scale of 1-5.
Prediction is based on the user's ratings, peers' ratings, and item information.
Uses the MovieLens database.
Field: recommender systems, or collaborative filtering.
Previous works use the Pearson R correlation as the distance metric.
Novel approach, "hybrid-genre": more efficient, and performs better than Pearson R on a diluted database.
11/18/2018 Movie Advisor

MovieLens database
Freely available on the Internet.
100,000 entries on a scale of 1-5.
1682 movies, 943 users; about 6% of the user-movie matrix is filled (sparseness 6%).
Mean score 3.53.
Mean scores/movie 60.
Mean scores/user 106 (min 20).
[Figure: Score Histogram; x-axis: Score]
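
The database statistics on this slide can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted on the slide; the variable names are illustrative.

```python
# Counts quoted on the slide.
n_ratings = 100_000
n_movies = 1682
n_users = 943

# Fraction of the user-movie matrix that actually holds a score.
density = n_ratings / (n_movies * n_users)

# Average number of scores per movie and per user.
ratings_per_movie = n_ratings / n_movies
ratings_per_user = n_ratings / n_users

print(f"filled fraction:   {density:.1%}")
print(f"scores per movie:  {ratings_per_movie:.1f}")
print(f"scores per user:   {ratings_per_user:.1f}")
```

The filled fraction comes out at roughly 6.3%, and the per-movie mean at about 59.5, which the slide rounds to 60.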

Data Sets
Test data set is 5 entries from 10% of users, a total of 470 entries.
Base data set is all other entries, a total of 99,530 entries.
Same random division used throughout presentation.
Different instances examined at the end.
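
A minimal sketch of such a base/test division, assuming ratings are stored as `(user, movie, score)` tuples. The function name, the seed handling, and the rounding down of the 10% user count are assumptions, not details from the slides.

```python
import random

def split_base_test(ratings, user_fraction=0.10, per_user=5, seed=0):
    """Move `per_user` entries from a random `user_fraction` of users into a test set."""
    rng = random.Random(seed)  # fixed seed so the same division can be reused

    # Group entry indices by user.
    by_user = {}
    for idx, (user, _movie, _score) in enumerate(ratings):
        by_user.setdefault(user, []).append(idx)

    users = sorted(by_user)
    n_test_users = int(len(users) * user_fraction)  # assumption: round down

    test_idx = set()
    for user in rng.sample(users, n_test_users):
        # Assumes every user has at least `per_user` entries (MovieLens: min 20).
        test_idx.update(rng.sample(by_user[user], per_user))

    base = [r for i, r in enumerate(ratings) if i not in test_idx]
    test = [r for i, r in enumerate(ratings) if i in test_idx]
    return base, test
```

With 943 users this selects 94 test users and 94 × 5 = 470 test entries, matching the totals on the slide.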

Evaluation Criteria
Two criteria are used: Mean Absolute Error (MAE) and coverage.
MAE: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |S_i - R_i|$, where $S_i$ is the actual score, $R_i$ is the predicted score, and $N$ is the number of predicted test entries. It is the most widely used criterion in the field.
Coverage: percentage of movies in the test data set that can be predicted.
MAE usually improves as coverage decreases.
In related works, other criteria are shown to produce similar results.
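
Both criteria can be computed together. The sketch below assumes a predictor signals "cannot predict" by returning `None`; that convention and the function name are illustrative choices, not taken from the slides.

```python
def mae_and_coverage(actual, predicted):
    """Return (MAE over predicted entries, fraction of entries predicted).

    `actual` holds the true scores S_i; `predicted` holds R_i, or None
    where no prediction could be made.
    """
    pairs = [(s, r) for s, r in zip(actual, predicted) if r is not None]
    coverage = len(pairs) / len(actual)
    if not pairs:
        return float("nan"), coverage
    mae = sum(abs(s - r) for s, r in pairs) / len(pairs)
    return mae, coverage

mae, coverage = mae_and_coverage([4, 3, 5, 2], [3.5, None, 4.0, 2.0])
```

Here three of four entries are predicted, so coverage is 0.75 and the MAE is the mean of the errors 0.5, 1.0, and 0.0.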

Base Algorithms
All Average: predict the average of all entries.
Movie Average: predict the movie average.
User Average: predict the user average.
User Movie Average: use both movie and user average, as shown in equation.

Method               MAE     Coverage [%]
All Average          1.046   100
Movie Average        0.887   99.8
User Average         0.849
User Movie Average   0.830
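
The first three base predictors can be sketched as follows, again assuming `(user, movie, score)` tuples. The User Movie Average is left out because its equation is not reproduced in this transcript. Returning `None` for a movie that has no entry in the base set is an assumption about why coverage drops below 100%.

```python
from collections import defaultdict

def fit_averages(base):
    """Compute the global, per-movie, and per-user average scores."""
    by_movie = defaultdict(list)
    by_user = defaultdict(list)
    total = 0.0
    for user, movie, score in base:
        total += score
        by_movie[movie].append(score)
        by_user[user].append(score)
    all_avg = total / len(base)
    movie_avg = {m: sum(v) / len(v) for m, v in by_movie.items()}
    user_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    return all_avg, movie_avg, user_avg

base = [("a", 1, 4), ("a", 2, 2), ("b", 1, 5)]
all_avg, movie_avg, user_avg = fit_averages(base)

# Predictions for user "a" on movie 1; .get() yields None when nothing is known.
pred_all = all_avg
pred_movie = movie_avg.get(1)
pred_user = user_avg.get("a")
```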

Pearson R Coefficient
Pearson R correlation coefficient used as distance metric between users.
Coefficient calculated as shown in equation.
Values lie in the range -1 to +1; the extremes designate strong correlation, positive or negative.
[Figure: Typical histogram of the Pearson R coefficient for user 104]
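
A minimal sketch of the Pearson R coefficient between two users, assuming each user's ratings are a `{movie: score}` dictionary. Computing the means over the mutually rated movies only is one common convention and is an assumption here, since the slide's equation is not reproduced.

```python
import math

def pearson_r(ratings_a, ratings_b):
    """Return (r, n): Pearson correlation over the n mutually rated movies.

    r is None when fewer than two movies overlap or when either user's
    scores on the overlap are constant (zero variance).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None, n
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    cov = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    var_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    var_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return None, n
    return cov / math.sqrt(var_a * var_b), n

r_same, n_same = pearson_r({1: 1, 2: 2, 3: 3}, {1: 2, 2: 4, 3: 6, 4: 1})
r_opposite, _ = pearson_r({1: 1, 2: 2, 3: 3}, {1: 3, 2: 2, 3: 1})
```

The first pair of users rates in the same direction and the second in opposite directions, so the coefficients sit at the two extremes of the range.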

Pearson R Algorithm
Predicted score is a weighted sum, as shown in equation.
Only Mutually Rated Movies (MRM) are taken into account.
Yields an MAE of 0.79 and coverage of 99.8% for the base and test data sets.
Clearly better than the basic methods.
[Figure: Average Number of Mutually Rated Movies]
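
The weighted sum is not reproduced in this transcript. The sketch below uses a formulation that is common in the collaborative filtering literature: the active user's mean plus a correlation-weighted average of each peer's deviation from that peer's own mean. Treat it as an assumption about the form, not as the project's exact equation.

```python
def predict_weighted(active_mean, neighbors):
    """Predict a score from peers who rated the target movie.

    `neighbors` is a sequence of (weight, peer_score, peer_mean) tuples,
    where weight is the Pearson R coefficient with the active user.
    Returns None when no peer carries a non-zero weight.
    """
    neighbors = list(neighbors)
    numerator = sum(w * (score - mean) for w, score, mean in neighbors)
    denominator = sum(abs(w) for w, _score, _mean in neighbors)
    if denominator == 0:
        return None
    return active_mean + numerator / denominator

# A positively correlated peer scored the movie above their mean; a
# negatively correlated peer scored it below theirs. Both push upward.
prediction = predict_weighted(3.0, [(1.0, 5, 4.0), (-0.5, 2, 3.0)])
```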

Pearson R Enhancements
MRM threshold suggested by Herlocker.
MRM threshold modification.
Correlation threshold suggested by Shardanand, shown in equation.
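
The two thresholds can be sketched as adjustments to a peer's weight. The MRM threshold below follows the commonly cited significance-weighting idea, which scales a correlation down when it rests on fewer than `h` mutually rated movies. The correlation threshold simply discards weak correlations. Both function names and exact forms are assumptions, and the modified MRM threshold from the slide is not reproduced.

```python
def devalue_by_overlap(weight, n_common, h):
    """Scale a correlation by n_common / h when the overlap is below h."""
    return weight * min(n_common, h) / h

def apply_correlation_threshold(weight, threshold):
    """Keep a correlation only if its magnitude reaches the threshold."""
    return weight if abs(weight) >= threshold else 0.0

# A correlation of 0.8 based on 25 mutually rated movies, with h = 50.
w_small_overlap = devalue_by_overlap(0.8, 25, 50)
# The same correlation based on 80 mutually rated movies is left unchanged.
w_large_overlap = devalue_by_overlap(0.8, 80, 50)
```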

Pearson R Performance
[Figure: MAE and Coverage versus Herlocker Threshold; Users TH=3, Pearson TH=0.1; curves: Pearson R Algorithm, User Average]
[Figure: MAE versus Coverage; Users TH=3, Pearson TH=0.10; curves: H/MRM, (H/MRM)^2, User Average]

Mean Square Difference (MSD) Algorithm
Mean Square Difference used as a distance metric.
Calculation shown in equation.
Threshold applied as shown.
Predictions calculated as they were for Pearson R.
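
A minimal sketch of the MSD distance between two users over their mutually rated movies, assuming `{movie: score}` dictionaries. The thresholded weight `(threshold - distance) / threshold` is a hypothetical choice for turning a distance into a peer weight; the slide's own threshold equation is not reproduced here.

```python
def msd(ratings_a, ratings_b):
    """Mean squared difference over mutually rated movies, or None if none overlap."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    return sum((ratings_a[m] - ratings_b[m]) ** 2 for m in common) / len(common)

def msd_weight(distance, threshold):
    """Hypothetical weight: 1 at distance 0, falling to 0 at the threshold."""
    if distance is None or distance >= threshold:
        return 0.0
    return (threshold - distance) / threshold

distance = msd({1: 5, 2: 3}, {1: 4, 2: 1, 3: 2})
weight = msd_weight(distance, 5.0)
```

The two users differ by 1 and 2 on their shared movies, so the distance is (1 + 4) / 2 = 2.5.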

MSD Performance
[Figure: MAE versus Coverage; Users TH=3; curves: MSD Method, User Average]
[Figure: MAE versus Coverage; curves: Discarded, User Average]

Genre Information
Database provides genre information.
Each entry may have several genres.
19 genres exist in the database.

Genre Statistics
Figures depict the number of ratings and the average genre score for the entire database.
[Figure: Average Number of User Ratings versus Genre Number]
[Figure: Average Score versus Genre Number]

Base Genre Algorithm
The User Genre matrix holds the average score for each user and genre.
Base algorithm shown in equation.
MAE 0.836, coverage 98.7%.
Better than the user average prediction (MAE 0.849).
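
A minimal sketch of building the User Genre matrix, assuming `(user, movie, score)` tuples and a `{movie: [genres]}` lookup. The prediction step shown here, the mean of the user's genre averages over the target movie's genres, is a hypothetical stand-in for the base algorithm, whose equation is not reproduced in this transcript.

```python
from collections import defaultdict

def build_user_genre_matrix(base, movie_genres):
    """Return {user: {genre: average score the user gave to that genre}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user, movie, score in base:
        for genre in movie_genres.get(movie, ()):
            sums[user][genre] += score
            counts[user][genre] += 1
    return {u: {g: sums[u][g] / counts[u][g] for g in sums[u]} for u in sums}

def predict_from_genres(matrix, user, movie, movie_genres):
    """Hypothetical predictor: mean of the user's averages for the movie's genres."""
    row = matrix.get(user, {})
    known = [row[g] for g in movie_genres.get(movie, ()) if g in row]
    return sum(known) / len(known) if known else None

movie_genres = {1: ["Action", "Comedy"], 2: ["Comedy"], 3: ["Action"], 4: ["Drama"]}
G = build_user_genre_matrix([("u", 1, 4), ("u", 2, 2)], movie_genres)
```

A movie whose genres the user has never rated yields `None`, which is one way coverage could fall below 100%.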

Genre Algorithm
Uses matrix G to calculate the MSD distance between users.
Pearson R performs poorly on G.
Threshold and prediction are the same as for MSD.
Much more efficient, since the matrix size is reduced (19 columns instead of 1682, a factor of 1682/19 ≈ 90).
Results comparable to Pearson R.
[Figure: MAE versus Coverage; Users TH=3; curves: Genre Method, User Average]

Hybrid Genre Algorithm
Takes into account peers' ratings as well as the user's genre preferences, as shown in equation.
Performs consistently better than the genre algorithm with the hybrid parameter set to 0.65 (rat=0.65 in the figure).
[Figure: MAE versus Coverage; Users TH=3, rat=0.65; curves: Hybrid Method, Genre Method]

Algorithm Comparison
All methods on the same chart.
Hybrid genre performs better at low coverage, while Pearson R performs better at high coverage.
[Figure: MAE versus Coverage; curves: Hybrid Genre, MSD, Pearson R]

Database Instantiation
Average of 10 base/test divisions shown.
Coverage of 0.8 is the turning point for the performance of Pearson R vs. hybrid-genre.
[Figure: MAE versus Coverage; curves: Hybrid-Genre, Pearson R]

Database Dilution and Instantiation
2/3 of each user's entries omitted.
2 entries from 10% of users used as test data set.
10 instantiations averaged.
Hybrid-genre clearly performs better.
[Figure: MAE versus Coverage; curves: Hybrid-Genre, Pearson R]
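
The dilution step can be sketched as random per-user subsampling, assuming `(user, movie, score)` tuples. The function name, the seed handling, and keeping at least one entry per user are assumptions. The slides do not say whether the test entries are drawn before or after dilution.

```python
import random

def dilute(ratings, keep_fraction=1 / 3, seed=0):
    """Keep a random `keep_fraction` of each user's entries, omitting the rest."""
    rng = random.Random(seed)
    by_user = {}
    for entry in ratings:
        by_user.setdefault(entry[0], []).append(entry)

    kept = []
    for user in sorted(by_user):
        entries = by_user[user]
        n_keep = max(1, round(len(entries) * keep_fraction))  # assumption: keep >= 1
        kept.extend(rng.sample(entries, n_keep))
    return kept

# Four users with 30 entries each are reduced to 10 entries each.
full = [(u, m, 3) for u in range(4) for m in range(30)]
diluted = dilute(full)
```

Averaging over 10 instantiations then amounts to repeating the dilution and the test split with 10 different seeds.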

Conclusions
Database, base methods, and existing methods presented and analyzed.
Novel approach, hybrid-genre, explained and compared.
Hybrid-genre is more efficient, and performs better as sparseness is increased.
May prove more practical in real-world applications.