Optimizing Recommender Systems as a Submodular Bandits Problem
Yisong Yue, Carnegie Mellon University
Joint work with Carlos Guestrin & Sue Ann Hong

Optimizing Recommender Systems
– Must predict what the user finds interesting
– Receives feedback (training data) "on the fly"
– 10K articles per day
– Must personalize!

Day 1: a Sports article is shown; the user says "Like!"

Topic       # Likes   # Displayed   Average
Sports      1         1             1
Politics    0         0             N/A
Economy     0         0             N/A
Celebrity   0         0             N/A

Day 2: a Politics article is shown; the user says "Boo!"

Topic       # Likes   # Displayed   Average
Sports      1         1             1
Politics    0         1             0
Economy     0         0             N/A
Celebrity   0         0             N/A

Day 3: an Economy article is shown; the user says "Like!"

Topic       # Likes   # Displayed   Average
Sports      1         1             1
Politics    0         1             0
Economy     1         1             1
Celebrity   0         0             N/A

Day 4: a Sports article is shown; the user says "Boo!"

Topic       # Likes   # Displayed   Average
Sports      1         2             0.5
Politics    0         1             0
Economy     1         1             1
Celebrity   0         0             N/A

Day 5: a Politics article is shown; the user says "Boo!"

Topic       # Likes   # Displayed   Average
Sports      1         2             0.5
Politics    0         2             0
Economy     1         1             1
Celebrity   0         0             N/A

Topic       # Likes   # Displayed   Average
Sports      1         2             0.5
Politics    0         2             0
Economy     1         1             1
Celebrity   0         0             N/A

Goal: maximize total user utility (total # likes). How to behave optimally at each round?
Exploit: Economy (best observed average). Explore: Celebrity (never shown). Best: Sports (the truly best topic may be neither).
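
To make the exploit/explore choice above concrete, here is a toy sketch (not from the talk; all names are invented) that recomputes the per-topic averages and picks the exploit and explore candidates:

```python
# Toy sketch: per-topic like statistics after day 5 (from the table above).
stats = {                       # topic: (# likes, # displayed)
    "Sports":    (1, 2),
    "Politics":  (0, 2),
    "Economy":   (1, 1),
    "Celebrity": (0, 0),
}

def average(likes, shown):
    return likes / shown if shown else None    # None = never shown (N/A)

averages = {topic: average(*counts) for topic, counts in stats.items()}

# Exploit: best observed average; Explore: topics never shown.
exploit = max((t for t, a in averages.items() if a is not None),
              key=lambda t: averages[t])
explore = [t for t, a in averages.items() if a is None]
print(exploit, explore)   # -> Economy ['Celebrity']
```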

Often want to recommend multiple articles at a time!

Making Diversified Recommendations
Redundant results:
– "Israel implements unilateral Gaza cease-fire :: WRAL.com"
– "Israel unilaterally halts fire, rockets persist"
– "Gaza truce, Israeli pullout begin | Latest News"
– "Hamas announces ceasefire after Israel declares truce - …"
– "Hamas fighters seek to restore order in Gaza Strip - World - Wire …"
Diversified results:
– "Israel implements unilateral Gaza cease-fire :: WRAL.com"
– "Obama vows to fight for middle class"
– "Citigroup plans to cut 4500 jobs"
– "Google Android market tops 10 billion downloads"
– "UC astronomers discover two largest black holes ever found"

Outline
• Optimally diversified recommendations
  – Minimize redundancy
  – Maximize information coverage
• Exploration / exploitation tradeoff
  – Don't know user preferences a priori
  – Only receive feedback for recommended articles
• Incorporating prior knowledge
  – Reduce the cost of exploration

Choose top 3 documents
Individual relevance ranking: D3, D4, D1
Greedy coverage solution: D3, D1, D5
This diminishing returns property is called submodularity.

Submodular Coverage Model
– Set of articles: A; user preferences: w
– F_c(A) = how well A "covers" concept c
– Overall utility: F(A | w) = Σ_c w_c F_c(A)
– Diminishing returns (submodularity): for A ⊆ B, F(A + a) − F(A) ≥ F(B + a) − F(B)
– Goal: argmax_{|A| = L} F(A | w), which is NP-hard in general
– Greedy selection achieves a (1 − 1/e) guarantee [Nemhauser et al., 1978]
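
As an illustrative sketch (not from the talk), greedy maximization of such a weighted coverage function might look like this; `articles`, `coverage_fns`, and `w` are hypothetical inputs:

```python
# Sketch: greedy maximization of F(A | w) = sum_c w[c] * F_c(A).
# For monotone submodular F_c, greedy achieves the (1 - 1/e) guarantee
# of [Nemhauser et al., 1978]. coverage_fns[c] is the set function F_c.

def greedy_select(articles, coverage_fns, w, L):
    """Build a set A of L articles by repeatedly adding the best marginal gain."""
    A = []
    for _ in range(L):
        def marginal_gain(a):
            return sum(w_c * (F(A + [a]) - F(A))
                       for w_c, F in zip(w, coverage_fns))
        best = max((a for a in articles if a not in A), key=marginal_gain)
        A.append(best)
    return A
```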

Submodular Coverage Model: Example (iteration 1)
a1 = "China's Economy Is on the Mend, but Concerns Remain"
a2 = "US economy poised to pick up, Geithner says"
a3 = "Who's Going To The Super Bowl?"
w = [0.6, 0.4], A = Ø

Incremental coverage with A = Ø:

         F1(A+a) − F1(A)   F2(A+a) − F2(A)
a1       0.9               0
a2       0.8               0
a3       0                 0.5

Incremental benefit w^T Δ: a1 = 0.6·0.9 = 0.54, a2 = 0.6·0.8 = 0.48, a3 = 0.4·0.5 = 0.20.
Best at iteration 1: a1.

Submodular Coverage Model: Example (iteration 2), A = {a1}

Incremental coverage (in parentheses: the value when A = Ø):

         F1(A+a) − F1(A)   F2(A+a) − F2(A)
a2       0.1 (0.8)         0 (0)
a3       0 (0)             0.5 (0.5)

Incremental benefit: a2 = 0.06, a3 = 0.20. Best at iteration 2: a3.

Example: Probabilistic Coverage ("noisy-or") [El-Arini et al., KDD 2009]
Each article a has an independent probability Pr(i | a) of covering topic i. Define
F_i(A) = 1 − Pr(topic i not covered by A) = 1 − Π_{a ∈ A} (1 − Pr(i | a)).
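
A minimal sketch of this noisy-or coverage (assuming `topic_probs` maps each article to a NumPy vector of Pr(i | a) over the d topics; the helper names are mine, not from the paper):

```python
import numpy as np

def coverage(A, topic_probs, d):
    """F(A) as a vector: F_i(A) = 1 - prod_{a in A} (1 - Pr(i | a))."""
    not_covered = np.ones(d)
    for a in A:
        not_covered *= 1.0 - topic_probs[a]   # prob. topic i is still uncovered
    return 1.0 - not_covered

def incremental_coverage(a, A, topic_probs, d):
    """Delta(a | A): gain in each topic's coverage from adding article a."""
    return coverage(A + [a], topic_probs, d) - coverage(A, topic_probs, d)
```

The incremental_coverage helper reappears below: it is exactly the locally linear feature vector Δ(a | A) used by the bandit algorithm.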

Outline (recap)
• Optimally diversified recommendations ✓
  – Submodular information coverage model
  – Diminishing returns property encourages diversity
  – Parameterized: can be fit to a user's preferences
  – Locally linear (will be useful later)
• Exploration / exploitation tradeoff
  – Don't know user preferences a priori
  – Only receive feedback for recommended articles
• Incorporating prior knowledge
  – Reduce the cost of exploration

Learning Submodular Coverage Models
– Submodular functions are well studied [Nemhauser et al., 1978]
– Applied to recommender systems via parameterized submodular functions [Leskovec et al., 2007; Swaminathan et al., 2009; El-Arini et al., 2009]
– Learning submodular functions: [Yue & Joachims, ICML 2008; Yue & Guestrin, NIPS 2011]
– We want to personalize: learn interactively from user feedback!

Interactive Personalization
[Animation: over several rounds, the system recommends articles on topics such as Sports, Politics, World, and Economy; the user likes or dislikes each one, and the per-topic statistics (# shown, average likes) are updated after every round.]

Exploration vs Exploitation
Goal: maximize total user utility. How to behave optimally at each round?
Exploit: recommend the slate that looks best under the current per-topic averages. Explore: include topics (e.g., Celebrity) whose estimates are still uncertain. Best: the truly best slate may be neither.

Linear Submodular Bandits Problem [Yue & Guestrin, NIPS 2011]
For time t = 1…T:
– Algorithm recommends a set of articles A_t
– User scans the articles in order and rates them, e.g., likes or dislikes each article (the reward); expected reward is F(A_t | w*) (user model discussed later)
– Algorithm incorporates the feedback
Regret, relative to the best possible recommendations A_t*:
R(T) = Σ_{t=1}^{T} [ (1 − 1/e)·F(A_t* | w*) − F(A_t | w*) ]
(the (1 − 1/e) factor reflects that even offline greedy maximization is only (1 − 1/e)-optimal)

Regret measures the opportunity cost of not knowing the user's preferences w*. An algorithm is "no-regret" if R(T)/T → 0 as the time horizon T grows; its efficiency is measured by the rate of convergence.

Local Linearity
The marginal utility of the current article a, given the previously shown articles A, is linear in the user's preferences:
F(A + a | w*) − F(A | w*) = w*^T Δ(a | A),
where Δ(a | A) = (F_1(A + a) − F_1(A), …, F_d(A + a) − F_d(A)) is the incremental coverage vector.

User Model
– User scans the articles a ∈ A_t in order and generates feedback y for each
– Feedback obeys E[y | a, A] = w*^T Δ(a | A), independent of the feedback on other articles
– "Conditional submodular independence" [Yue & Guestrin, NIPS 2011]

Estimating User Preferences [Yue & Guestrin, NIPS 2011]
Stack the incremental coverage features of all recommended articles into a matrix Δ (one row per article) and the observed feedback into a vector Y; then Y ≈ Δ w*, so linear regression estimates w!
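
A minimal sketch of this regression step (ridge-regularized least squares; the variable names and the regularizer are my assumptions, not the paper's code):

```python
import numpy as np

def estimate_preferences(Delta, Y, lam=1.0):
    """Solve w_t = argmin_w ||Delta w - Y||^2 + lam * ||w||^2.

    Delta: (n, d) matrix of incremental coverage features, one row per
           recommended article; Y: length-n vector of observed feedback.
    """
    d = Delta.shape[1]
    M = Delta.T @ Delta + lam * np.eye(d)     # regularized Gram matrix
    w_hat = np.linalg.solve(M, Delta.T @ Y)
    return w_hat, M                           # M also yields confidence widths
```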

Balancing Exploration vs Exploitation
For each slot, select the article with the highest (estimated gain) + (uncertainty of the estimate). Example: with the estimated gain by topic shown alongside its uncertainty, the article on economy is selected.

Balancing Exploration vs Exploitation [Yue & Guestrin, NIPS 2011]
The uncertainty bonus C(a | A) shrinks roughly as 1 / √(# times the topic was shown).
[Animation: as Sports, Politics, World, Economy, and Celebrity articles are shown round by round, each topic's uncertainty interval shrinks.]

LSBGreedy
Loop:
– Compute the least squares estimate w_t
– Start with A_t empty
– For i = 1, …, L: recommend the article a that maximizes the estimated gain plus uncertainty, w_t^T Δ(a | A_t) + α C(a | A_t)
– Receive feedback y_{t,1}, …, y_{t,L}
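
Putting the pieces together, one round of LSBGreedy might look like the following sketch (reusing the hypothetical helpers above; α is the exploration weight, and C(a | A) is taken to be the usual linear-bandit confidence width ||Δ(a|A)||_{M⁻¹}):

```python
import numpy as np

def lsb_greedy_round(articles, topic_probs, d, Delta_hist, Y_hist, L, alpha=1.0):
    """One round: estimate w, then greedily fill L slots with a UCB-style score."""
    w_hat, M = estimate_preferences(Delta_hist, Y_hist)
    M_inv = np.linalg.inv(M)
    A, feats = [], []
    for _ in range(L):
        def score(a):
            delta = incremental_coverage(a, A, topic_probs, d)
            gain = w_hat @ delta                       # estimated gain
            width = np.sqrt(delta @ M_inv @ delta)     # uncertainty C(a | A)
            return gain + alpha * width
        best = max((a for a in articles if a not in A), key=score)
        feats.append(incremental_coverage(best, A, topic_probs, d))
        A.append(best)
    return A, feats   # append feats and the user's feedback to Delta_hist, Y_hist
```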

Regret Guarantee [Yue & Guestrin, NIPS 2011]
– No-regret algorithm! (regret sublinear in T)
– Regret convergence rate: d / (LT)^{1/2}, where d = # topics, L = # articles per day, T = time horizon
– Optimally balances the explore/exploit tradeoff
– Builds up from linear bandits [Dani et al., 2008; Li et al., 2010; Abbasi-Yadkori et al., 2011] to the submodular setting
– Leverages conditional submodular independence

Other Approaches
• Multiplicative Weighting [El-Arini et al., 2009]
  – Does not employ exploration
  – No guarantees (one can show it does not converge)
• Ranked bandits [Radlinski et al., 2008; Streeter & Golovin, 2008]
  – Reduction that treats each slot as a separate bandit, using LinUCB [Dani et al., 2008; Li et al., 2010; Abbasi-Yadkori et al., 2011]
  – Regret guarantee O(dL·T^{1/2}) (a factor L^{1/2} worse)
• ε-Greedy
  – Explores with probability ε
  – Regret guarantee O(d(LT)^{2/3}) (a factor (LT)^{1/3} worse)

Simulations
[Plots comparing LSBGreedy, RankLinUCB, ε-Greedy, and MW.]

User Study
– Tens of thousands of real news articles
– T = 10 days, L = 10 articles per day, d = 18 topics
– Users rate articles; we count # likes
– Users are heterogeneous, so personalization is required

User Study Results
~27 users in the study. Wins / ties / losses of the submodular bandits approach against each baseline:
– Static weights: submodular bandits wins
– Multiplicative updates (no exploration): submodular bandits wins
– RankLinUCB (does not directly model diversity): submodular bandits wins
[Bar charts of wins / ties / losses omitted.]

Comparing Learned Weights vs MW
MW overfits to the "world" topic; with few liked articles, MW did not learn anything.

Outline (recap)
• Optimally diversified recommendations ✓
  – Submodular information coverage model: diminishing returns encourages diversity; parameterized, fits a user's preferences; locally linear
• Exploration / exploitation tradeoff ✓
  – Linear submodular bandits problem characterizes exploration/exploitation
  – Provably near-optimal algorithm; user study
• Incorporating prior knowledge
  – Reduce the cost of exploration

The Price of Exploration
The regret bound scales with both |w*| (the norm of the user's preferences) and d (# topics), over L articles per day and time horizon T:
– The region of uncertainty depends linearly on |w*|
– The region of uncertainty depends linearly on d
– This is unavoidable without further assumptions

Observation: systems do not serve users in a vacuum.
Have: preferences of previous users. Goal: learn faster for new users. [Yue, Hong & Guestrin, ICML 2012]

Assumption: users are similar to "stereotypes"
– Stereotypes are described by a low-dimensional subspace
– Use an SVD-style approach to estimate the stereotype subspace from previous users' preferences, e.g., [Argyriou et al., 2007]
[Yue, Hong & Guestrin, ICML 2012]

Coarse-to-Fine Bandit Learning [Yue, Hong & Guestrin, ICML 2012]
Suppose w* lies mostly in a subspace of dimension k << d ("stereotypical preferences").
Two-tiered exploration: first in the subspace, then in the full space.
Compared with the original guarantee, this yields, e.g., 16x lower regret!

Coarse-to-Fine Hierarchical Exploration
Loop:
– Least squares estimate in the subspace
– Least squares estimate in the full space, regularized toward the subspace estimate
– Start with A_t empty
– For i = 1, …, L: recommend the article a that maximizes (estimated gain) + (uncertainty in subspace) + (uncertainty in full space)
– Receive feedback y_{t,1}, …, y_{t,L}
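
A hedged sketch of the two-tiered scoring rule (my rendering, not the paper's code: U is a d×k basis for the stereotype subspace estimated from previous users, and M_sub, M_full are the regularized Gram matrices of the subspace and full-space regressions):

```python
import numpy as np

def coarse_to_fine_score(delta, w_hat, U, M_sub_inv, M_full_inv,
                         alpha_k=1.0, alpha_d=1.0):
    """Estimated gain plus uncertainty bonuses at both exploration tiers."""
    gain = w_hat @ delta                              # estimated gain
    z = U.T @ delta                                   # project into the subspace
    width_sub = np.sqrt(z @ M_sub_inv @ z)            # uncertainty in subspace
    width_full = np.sqrt(delta @ M_full_inv @ delta)  # uncertainty in full space
    return gain + alpha_k * width_sub + alpha_d * width_full
```

Exploration in the k-dimensional subspace dominates early on, which is where the improved regret comes from.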

Simulation Comparison
– Naïve: LSBGreedy from before
– Reshaped prior in the full space: LSBGreedy with a prior estimated from pre-collected user profiles
– Subspace: LSBGreedy run on the subspace only (often what people resort to in practice)
– Coarse-to-fine: our approach, combining the full-space and subspace approaches

"Atypical Users" [Yue, Hong & Guestrin, ICML 2012]
[Plots comparing the naïve baseline, the reshaped prior on the full space, the subspace-only approach, and the coarse-to-fine approach.]

User Study
– Similar setup as before: tens of thousands of real news articles
– T = 10 days, L = 10 articles per day, d = 100 topics
– k = 5 (a 5-dimensional subspace, estimated from real users)
– Users rate articles; we count # likes

User Study Results
~27 users in the study. Wins / ties / losses of the coarse-to-fine approach against each baseline:
– Naïve LSBGreedy: coarse-to-fine wins
– LSBGreedy with an optimal prior in the full space: coarse-to-fine wins
[Bar charts of wins / ties / losses omitted.]

Learning Submodular Functions
• Parameterized submodular functions
  – Diminishing returns
  – Flexible
• Linear submodular bandits problem
  – Balances explore/exploit
  – Provably optimal algorithms
  – Faster convergence using prior knowledge
• Practical bandit learning approaches
Research supported by ONR (PECASE) N and ONR YIP N.