Google News Personalization: Scalable Online Collaborative Filtering

Slides:



Advertisements
Similar presentations
Recommender System A Brief Survey.
Advertisements

Recommender Systems & Collaborative Filtering
Google News Personalization: Scalable Online Collaborative Filtering
Google News Personalization Scalable Online Collaborative Filtering
Algorithms of Google News An Analysis of Google News Personalization Scalable Online Collaborative Filtering 1.
Temporal Query Log Profiling to Improve Web Search Ranking Alexander Kotov (UIUC) Pranam Kolari, Yi Chang (Yahoo!) Lei Duan (Microsoft)
Item Based Collaborative Filtering Recommendation Algorithms
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Center for E-Business Technology Seoul National University Seoul, Korea Socially Filtered Web Search: An approach using social bookmarking tags to personalize.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
Sean Blong Presents: 1. What are they…?  “[…] specific type of information filtering (IF) technique that attempts to present information items (movies,
A. Darwiche Learning in Bayesian Networks. A. Darwiche Known Structure Complete Data Known Structure Incomplete Data Unknown Structure Complete Data Unknown.
Learning to Recommend Hao Ma Supervisors: Prof. Irwin King and Prof. Michael R. Lyu Dept. of Computer Science & Engineering The Chinese University of Hong.
6/2/ An Automatic Personalized Context- Aware Event Notification System for Mobile Users George Lee User Context-based Service Control Group Network.
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
Probability based Recommendation System Course : ECE541 Chetan Tonde Vrajesh Vyas Ashwin Revo Under the guidance of Prof. R. D. Yates.
Time-dependent Similarity Measure of Queries Using Historical Click- through Data Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al. Presented by: Tie-Yan.
1 Collaborative Filtering Rong Jin Department of Computer Science and Engineering Michigan State University.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Recommender systems Ram Akella November 26 th 2008.
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Classical Music for Rock Fans?: Novel Recommendations for Expanding User Interests Makoto Nakatsuji, Yasuhiro Fujiwara, Akimichi Tanaka, Toshio Uchiyama,
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Learning Geographical Preferences for Point-of-Interest Recommendation Author(s): Bin Liu Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013]
RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures Justin Levandoski Michael D. Ekstrand Michael J. Ludwig Ahmed Eldawy.
1 Computing Trust in Social Networks Huy Nguyen Lab seminar April 15, 2011.
Department of Electrical Engineering and Computer Science Kunpeng Zhang, Yu Cheng, Yusheng Xie, Doug Downey, Ankit Agrawal, Alok Choudhary {kzh980,ych133,
Probabilistic Models for Discovering E-Communities Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, Hongyuan Zha The Pennsylvania State University WWW.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
Recommender Systems. Recommender Systems (RSs) n RSs are software tools providing suggestions for items to be of use to users, such as what items to buy,
Google News Personalization Big Data reading group November 12, 2007 Presented by Babu Pillai.
Post-Ranking query suggestion by diversifying search Chao Wang.
A Latent Social Approach to YouTube Popularity Prediction Amandianeze Nwana Prof. Salman Avestimehr Prof. Tsuhan Chen.
ICONIP 2010, Sydney, Australia 1 An Enhanced Semi-supervised Recommendation Model Based on Green’s Function Dingyan Wang and Irwin King Dept. of Computer.
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Learning in Bayesian Networks. Known Structure Complete Data Known Structure Incomplete Data Unknown Structure Complete Data Unknown Structure Incomplete.
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
SimRank: A Measure of Structural-Context Similarity Glen Jeh and Jennifer Widom Stanford University ACM SIGKDD 2002 January 19, 2011 Taikyoung Kim SNU.
Collaborative Deep Learning for Recommender Systems
Collaborative Filtering: Searching and Retrieving Web Information Together Huimin Lu December 2, 2004 INF 385D Fall 2004 Instructor: Don Turnbull.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
ItemBased Collaborative Filtering Recommendation Algorithms 1.
Item-Based Collaborative Filtering Recommendation Algorithms
Collaborative Filtering With Decoupled Models for Preferences and Ratings Rong Jin 1, Luo Si 1, ChengXiang Zhai 2 and Jamie Callan 1 Language Technology.
Recommender Systems & Collaborative Filtering
Methods and Metrics for Cold-Start Recommendations
Machine Learning With Python Sreejith.S Jaganadh.G.
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Collaborative Filtering Nearest Neighbor Approach
Q4 : How does Netflix recommend movies?
Google News Personalization: Scalable Online Collaborative Filtering
Movie Recommendation System
Probabilistic Latent Preference Analysis
WSExpress: A QoS-Aware Search Engine for Web Services
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

Google News Personalization: Scalable Online Collaborative Filtering Abhinandan Das, Mayur Datar, Ashutosh Garg @ Google Shyam Rajaram @ UIUC 11 Aug. 2014 SNU IDB Lab. Lee, Inhoe Google News Personalization: Scalable Online Collaborative Filtering

Outline Background Introduction Motivation Method Result Conclusions System Algorithms Result Conclusions

Background Information overflow with the advent of technologies like Internet People are drowning in data pool without getting right information they want Challenge: To find right information Right Information: Something that will answer users’ query Something that user would love to read, listen or see Solution: Search Engines Solve the first requirement What if user does not know what to look for ?

Introduction: Collaborative Filtering It is a technology that aims to learn user preferences and make recommendations based on user and community data Example: Amazon: User’s past shopping history is used to make recommendations for new products Netflix, movie recommender Recommendations for clubs, cosmetics, travel locations Personalized Google News

Motivation Google News is visited by several millions in a period of few days There are lots of articles being created each day Scalability is a big issue for such personalized system The items cannot be static as the articles are changing very fast Existing recommender system thus unsuitable for such need Need for a novel scalable algorithm Moreover, since it is a news based system, the items cannot be static as the articles are changing very fast Novel: 참신한

Google News System Google news record the search queries and clicks on news stories Makes previously read articles easily accessible Recommends top stories based on past click history Recommendations based on: Click history Click history of the community User’s click on an article is treated as positive vote Could be noisy No negative votes

Problem statement Given a click history of N users, And M items U = {u1, u2, u3, u4, u5,…, uN } And M items S = {s1, s2, …, sM } User u with click history set Cu consisting of stories {si1, si2, …, si|Cu| } System is to recommend K stories that user might be interested in Incorporate user feedback instantly U1 이런 거 고치기 즉각적인 User feedback 포함해야 한다

Related Work: Architectures and algorithm Algorithms Memory-based algorithms Predictions made based on past ratings of the user Weighted average of ratings given by other users Weight is the similarity of users ( Pearson correlation coefficient, cosine similarity) Model-based algorithms A model of the user developed based on their past ratings Use the models to predict unseen items (Bayesian, clustering etc. )

Proposed System Mixture of Model based algorithms Probabilistic Latent Semantic Indexing MinHash Memory based algorithms Item co-visitation The scores given by each algorithm is combined as ΣWa Rs where Wa is the weight given to algorithm ‘a’ and Rs is its rank Wa 이런거 고치기 Ras - is the score given by algorithm ‘a’ to story s

Algorithms MinHash A probabilistic clustering method that assigns a pair of users to the same cluster with probability proportional to the overlap between the set of items that these users have voted for User U is represented by a set of items that she has clicked, Cu The similarity between their item-sets is given be : S(ui, uj) = | Cui, ∩ Cuj | (Jaccard Coeffient) | Cui U Cuj | Similarity of a user with all other users can be calculated Not scalable in real time

MinHash: Example User u1 clicks on the items: S1, S2, S5, S6, S9 Similarly, user u2 clicks on the items: S1, S2, S3, S4, S5 Jaccard Coefficient : 3/7

MinHash: Example Cu 이런거 바꿔주기 Minhash는 한 사용자의 구매 기록을 0에서 M사이의 값을 가지는 p개의 hash key의 concatenation으로 바꾸고, 이 값을 특정 사용자의 group ID로 사용합니다.  적용 데이터 형태에 따라 사용해야 할 Minhash 함수가 달라지는데, 0/1 구매기록의 경우에는 h(x)=(ax+b)mod M을 사용합니다. p가 커지면 커질수록 Minhash key로 구한 jaccard 유사도와, 원래의 구매 이력으로 계산한 jaccard 유사도가 비슷해집니다. 하지만 Minhash를 해보면 p값이 작으면 전혀 유사성 없는 사용자들만 모인 그룹이 생기거나, p가 커지면 동일 그룹내에 모이는 사용자가 너무 없기도 한데요, 이걸 보완하기 위해 한 명의 사용자에 대해서 q번 만큼 Minhash key를 생성해 한명의 사용자가 최대 q개의 서로 다른 그룹에 속하게 합니다. 이 때 각 그룹은 어느 정도 확률적으로 서로 구매 이력이 비슷한 사용자들이 모여 있게 되고, 이 그룹에서 다시 각 사용자를 key로 해서, 같은 그룹에 있던 모든 사용자들을 value로 해서 내보내면, 마지막으로 한 명의 사용자에 대해 어느 정도 유사한 구매 이력을 가진 사용자들을 모을 수 있습니다.  이렇게 target 사용자 1명과 이 사용자와 유사한 사용자 k명이 모인 데이터가 있으니, sequential하게 각 사용자와의 유사도를 구하면서 동시에 아이템 별로 가중치도 incremental하게 k명까지 죽 계산하면 해당 target 사용자에 대한 선호도 매트릭스 값이 나옵니다. - 1st Map: 각 사용자별 구매 로그 기반 Minhash key 계산 - 1st Reduce: Group ID별 사용자 모으기 - 2nd Map: 사용자별 grouping을 위해 다시 각 사용자를 key로 해서 emit. - 2nd Reduce: 각 사용자별로 유사 사용자 그룹 모으고, CF 계산

Algorithms Probabilistic Latent Semantic Indexing[PLSI] * With users U and items S, the relationship between users and items is learned by modeling the joint distribution of users and items as a mixture distribution A hidden variable Z is introduced to capture this relationship, which can be thought of as representing user communities(like minded users) and item communities(like items) Mathematically, The conditional probabilities p(z/u) and p(s/z) are learned from the training data using Expectation maximization algorithm C=UZS Z=diagonal matrix * T. Hofmann. Latent Semantic Models for Collaborative Filtering. ACM Transactions on Information Systems, 2004

Algorithms Co-visitation Two stories are clicked by the same user within a certain time interval Store as a graph with nodes at stories, edges as age discounted covisitation counts Update graph (using user history) whenever we receive a click Si 같은 거 고치기 time interval (few hours) “Users who viewed this item also viewed the following items”

Data stored User Table: Story Table: Cluster information (MinHash and PLSI) Click history Story Table: Cluster Statistics: How many times was the story S clicked on by users from each cluster C Co-visitation: How many times was story S co-visited with each story S’

System Components NFE: News Front End NPS: News Personalization Server NSS: News Statistics Server UT: User Table ST: Story Table

Evaluation Train: 80% Test: 20% Data Users Items Clicks(ratings) Movie Lens 943 1,670 54,000 NewsSmall 5,000 40,000 370,000 NewsBig 500,000 190,000 10,000,000 Movie Lens: 943 users, 1670 movies, 54,000 ratings NewsSmall: top 1,000제외한 top 5,000 users, 40,000 items, 370,000 clicks NewsBig: 500,000 users, 190,000 items, 10,000,000 clicks 80%-20% (train to test)

Evaluation Results Movie Lens: 943 users, 1670 movies, 54,000 ratings NewsSmall: top 1,000제외한 top 5,000 users, 40,000 items, 370,000 clicks NewsBig: 500,000 users, 190,000 items, 10,000,000 clicks 80%-20% (train to test) Precision – what fraction of the recommendations were actually clicked in the hold-out or test set) Recall – what fraction of the clicks in the hold-out set were actually recommended PLSI 가 가장 좋네 CORR – Big 은 scalable하지 않아서 못함 ©는 live traffic CS Biased: PLSI, MinHash 에 weight CV Biased: covisitation에 higher weight Popular보다 CS,CV가 38% 좋다

Conclusion and Future Work Algorithms for scalable real time recommendation engines presented The system is content independent and thus easily extendible to other domains As a future work, suitable algorithm can be explored to determine how to combine scores from different algorithms

Analysis The paper has successfully addressed the problem of scalability for large recommender systems Evaluation based on content could be an open research problem It can be argued that instead of only considering user click for clustering similar users, content based clustering of the stories could open up more similarity metrics for the recommendation system The precision lies around 30% for the current system showing that more study needs to be done in the field