Learning with Humans in the Loop
Josef Broder, Olivier Chapelle, Geri Gay, Arpita Ghosh, Laura Granka, Thorsten Joachims, Bobby Kleinberg, Madhu Kurup, Filip Radlinski, Karthik Raman, Tobias Schnabel, Pannaga Shivaswamy, Yisong Yue
Department of Computer Science, Cornell University

User-Facing Machine Learning
Examples:
– Search Engines
– Netflix
– Smart Home
– Robot Assistant
Learning:
– Gathering and maintenance of knowledge
– Measure and optimize performance
– Personalization

Interactive Learning System
[Figure: interaction loop between Algorithm and User. y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t); callouts: knowledge constrained, computation constrained.]
Observed Data ≠ Training Data
– Observed data is the user's decisions
– Even explicit feedback reflects the user's decision process
Decisions → Feedback → Learning Algorithm (Model! Design!)

Decide between two Ranking Functions
Distribution P(u,q) of users u and queries q. Which retrieval function is better?
Example query (tj, "SVM"):
Retrieval Function 1: f_1(u,q) → r_1, with utility U(tj,"SVM",r_1):
 1. Kernel Machines
 2. SVM-Light Support Vector Machine
 3. School of Veterinary Medicine at UPenn
 4. An Introduction to Support Vector Machines
 5. Service Master Company
 ⁞
Retrieval Function 2: f_2(u,q) → r_2, with utility U(tj,"SVM",r_2):
 1. School of Veterinary Medicine at UPenn
 2. Service Master Company
 3. Support Vector Machine
 4. Archives of SUPPORT-VECTOR-MACHINES
 5. SVM-Light Support Vector Machine
 ⁞

Measuring Utility
Metric | Description | Aggregation | Hypothesized change with decreased quality
Abandonment Rate | % of queries with no click | N/A | Increase
Reformulation Rate | % of queries that are followed by a reformulation | N/A | Increase
Queries per Session | Session = no interruption of more than 30 minutes | Mean | Increase
Clicks per Query | Number of clicks | Mean | Decrease
 | % of queries with clicks at position 1 | N/A | Decrease
Max Reciprocal Rank* | 1/rank for highest click | Mean | Decrease
Mean Reciprocal Rank* | Mean of 1/rank for all clicks | Mean | Decrease
Time to First Click* | Seconds before first click | Median | Increase
Time to Last Click* | Seconds before final click | Median | Decrease
(*) only queries with at least one click count
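As a concrete illustration of how such absolute metrics are computed, here is a minimal Python sketch. The log format (one record per query with the ranks of clicked results, the time to the first click, and a reformulation flag) is an assumption made for this example, not the schema used in the actual study.

```python
from statistics import mean, median

def absolute_metrics(log):
    """log: list of per-query records, e.g.
    {"clicks": [3, 1], "time_to_first_click": 4.2, "reformulated": False},
    where "clicks" holds the ranks of the clicked results (empty if abandoned)."""
    clicked = [q for q in log if q["clicks"]]
    return {
        "abandonment_rate": 1 - len(clicked) / len(log),
        "reformulation_rate": mean(q["reformulated"] for q in log),
        "clicks_per_query": mean(len(q["clicks"]) for q in log),
        "max_reciprocal_rank": mean(1 / min(q["clicks"]) for q in clicked),
        "mean_reciprocal_rank": mean(mean(1 / r for r in q["clicks"]) for q in clicked),
        "time_to_first_click": median(q["time_to_first_click"] for q in clicked),
    }
```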

ArXiv.org: User Study
User study in ArXiv.org:
– Natural user and query population
– Users in their natural context, not in a lab
– Live and operational search engine
– Ground truth by construction:
  ORIG ≻ SWAP2 ≻ SWAP4 (ORIG: hand-tuned fielded; SWAP2: ORIG with 2 pairs swapped; SWAP4: ORIG with 4 pairs swapped)
  ORIG ≻ FLAT ≻ RAND (ORIG: hand-tuned fielded; FLAT: no field weights; RAND: top 10 of FLAT shuffled)
[Radlinski et al., 2008]

ArXiv.org: Experiment Setup
Experiment setup:
– Phase I (36 days): users randomly receive ranking from ORIG, FLAT, RAND
– Phase II (30 days): users randomly receive ranking from ORIG, SWAP2, SWAP4
– Users are permanently assigned to one experimental condition based on IP address and browser.
Basic statistics: ~700 queries per day, ~300 distinct users per day.
Quality control and data cleaning:
– Test run for 32 days
– Heuristics to identify bots and spammers
– All evaluation code was written twice and cross-validated

Arxiv.org: Results
Conclusions:
– None of the absolute metrics reflects the expected order.
– Most differences are not significant after one month of data.
– Absolute metrics are not suitable for ArXiv-sized search engines.
[Radlinski et al., 2008]

Yahoo! Search: Results
Retrieval functions: 4 variants of the production retrieval function.
Data: 10M – 70M queries for each retrieval function; expert relevance judgments.
Results:
– Still not always significant, even after more than 10M queries per function
– Only partially consistent with the expert judgments
[Chapelle et al., 2012]

How about Explicit Feedback? [Sipos et al., to appear]

Polarity of Answer [Sipos et al., to appear]

Participation [Sipos et al., to appear]

Interactive Learning System
[Figure: interaction loop between Algorithm and User. y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).]
Observed Data ≠ Training Data
– Observed data is the user's decisions
– Even explicit feedback reflects the user's decision process
Decisions → Feedback → Learning Algorithm
– Model! Model the user's decision process to extract feedback
– Design! Design a learning algorithm for this type of feedback

Decide between two Ranking Functions
Distribution P(u,q) of users u and queries q. Which retrieval function is better?
Example query (tj, "SVM"):
Retrieval Function 1: f_1(u,q) → r_1, with utility U(tj,"SVM",r_1):
 1. Kernel Machines
 2. SVM-Light Support Vector Machine
 3. School of Veterinary Medicine at UPenn
 4. An Introduction to Support Vector Machines
 5. Service Master Company
 ⁞
Retrieval Function 2: f_2(u,q) → r_2, with utility U(tj,"SVM",r_2):
 1. School of Veterinary Medicine at UPenn
 2. Service Master Company
 3. Support Vector Machine
 4. Archives of SUPPORT-VECTOR-MACHINES
 5. SVM-Light Support Vector Machine
 ⁞

Economic Models of Decision Making
Rational Choice:
– Alternatives Y
– Utility function U(y)
– Decision y* = argmax_{y ∈ Y} U(y)
Bounded Rationality:
– Time constraints
– Computation constraints
– Approximate U(y)
Behavioral Economics:
– Framing
– Fairness
– Loss aversion
– Handling uncertainty

A Model of how Users Click in Search
Model of clicking:
– Users explore the ranking down to position k
– Users click on the most relevant-looking links in the top k
– Users stop clicking when the time budget is up or another action is more promising (e.g. reformulation)
– Empirically supported by [Granka et al., 2004]

Balanced Interleaving
f_1(u,q) → r_1:
 1. Kernel Machines
 2. Support Vector Machine
 3. An Introduction to Support Vector Machines
 4. Archives of SUPPORT-VECTOR-MACHINES
 5. SVM-Light Support Vector Machine
f_2(u,q) → r_2:
 1. Kernel Machines
 2. SVM-Light Support Vector Machine
 3. Support Vector Machine and Kernel ... References
 4. Lucent Technologies: SVM demo applet
 5. Royal Holloway Support Vector Machine
Interleaving(r_1,r_2) for (u=tj, q="svm"):
 1. Kernel Machines (1)
 2. Support Vector Machine (2)
 3. SVM-Light Support Vector Machine (2)
 4. An Introduction to Support Vector Machines (3)
 5. Support Vector Machine and Kernel ... References (3)
 6. Archives of SUPPORT-VECTOR-MACHINES (4)
 7. Lucent Technologies: SVM demo applet (4)
Invariant: for all k, the top k of the balanced interleaving is the union of the top k_1 of r_1 and the top k_2 of r_2 with k_1 = k_2 ± 1.
Interpretation: ( r_1 ≻ r_2 ) ↔ clicks(top_k(r_1)) > clicks(top_k(r_2))
Model of user: the better retrieval function is more likely to get more clicks.
[Joachims, 2001] [Radlinski et al., 2008]; see also [Radlinski, Craswell, 2012] [Hofmann, 2012]
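A minimal Python sketch of balanced interleaving and of the click-based preference rule stated above. The tie-breaking coin and the handling of duplicates are simplified assumptions, not a faithful reproduction of the implementation used in the experiments.

```python
import random

def balanced_interleave(r1, r2):
    """Merge rankings so that for every k the top k of the combined list is
    the union of the top k1 of r1 and the top k2 of r2 with k1 = k2 +/- 1."""
    combined, i1, i2 = [], 0, 0
    r1_leads = random.random() < 0.5          # coin decides which ranking may lead
    while i1 < len(r1) and i2 < len(r2):
        take_r1 = (i1 < i2) or (i1 == i2 and r1_leads)
        doc = r1[i1] if take_r1 else r2[i2]
        if doc not in combined:
            combined.append(doc)
        if take_r1:
            i1 += 1
        else:
            i2 += 1
    return combined

def infer_preference(r1, r2, clicked):
    """Return +1 if clicks favor r1, -1 if they favor r2, 0 on a tie."""
    if not clicked:
        return 0
    # smallest k such that the top k of r1 and the top k of r2 cover all clicks
    k = max(min(r.index(d) + 1 for r in (r1, r2) if d in r) for d in clicked)
    h1 = sum(d in r1[:k] for d in clicked)
    h2 = sum(d in r2[:k] for d in clicked)
    return (h1 > h2) - (h1 < h2)
```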

Arxiv.org: Interleaving Experiment
Experiment setup:
– Phase I (36 days): balanced interleaving of (ORIG,FLAT), (FLAT,RAND), (ORIG,RAND)
– Phase II (30 days): balanced interleaving of (ORIG,SWAP2), (SWAP2,SWAP4), (ORIG,SWAP4)
Quality control and data cleaning: same as for the absolute metrics.

Arxiv.org: Interleaving Results
[Figure: percent wins (% wins ORIG, % wins RAND) for each interleaved pair.]
Conclusions:
– All interleaving experiments reflect the expected order.
– All differences are significant after one month of data.
– Same results also for alternative data preprocessing.

Yahoo and Bing: Interleaving Results
Yahoo Web Search [Chapelle et al., 2012]:
– Four retrieval functions (i.e. 6 paired comparisons)
– Balanced interleaving
→ All paired comparisons consistent with the ordering by NDCG.
Bing Web Search [Radlinski & Craswell, 2010]:
– Five retrieval function pairs
– Team-Game interleaving
→ Consistent with the ordering by NDCG whenever the NDCG difference is significant.

Efficiency: Interleaving vs. Explicit
Bing Web Search:
– 4 retrieval function pairs
– ~12k manually judged queries
– ~200k interleaved queries
Experiment:
– p = probability that NDCG is correct on a subsample of size y
– x = number of queries needed to reach the same p-value with interleaving
→ Ten interleaved queries are equivalent to one manually judged query.
[Radlinski & Craswell, 2010]

Interactive Learning System
[Figure: interaction loop between Algorithm and User. y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).]
Observed Data ≠ Training Data
Decisions → Feedback → Learning Algorithm
– Model! Model the user's decision process to extract feedback
  → Pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) )
– Design! Design a learning algorithm for this type of feedback

Learning on Operational System
Example:
– 4 retrieval functions: A > B >> C > D
– 10 possible pairs for an interactive experiment:
  (A,B) → low cost to user
  (A,C) → medium cost to user
  (C,D) → high cost to user
  (A,A) → zero cost to user
  …
Minimizing regret:
– The algorithm gets to decide on the sequence of pairwise tests
– Don't present "bad" pairs more often than necessary
– Trade off (long-term) informativeness and (short-term) cost
→ Dueling Bandits Problem [Yue, Broder, Kleinberg, Joachims, 2010]

Regret for Dueling Bandits
Given:
– A finite set H of candidate retrieval functions f_1 … f_K
– A pairwise comparison test f ≻ f' on H with P(f ≻ f')
Regret:
– R(A) = Σ_{t=1..T} [ P(f* ≻ f_t) + P(f* ≻ f_t') - 1 ]
– f*: best retrieval function in hindsight (assume a single f* exists)
– (f_t, f_t'): retrieval functions tested at time t
Example:
– t = 1: comparison (f_9,f_12) → f_9, regret P(f* ≻ f_9) + P(f* ≻ f_12) - 1 = 0.85
– t = 2: comparison (f_5,f_9) → f_5, regret P(f* ≻ f_5) + P(f* ≻ f_9) - 1 = 0.78
– …
– t = T: comparison (f_1,f_3) → f_3, regret = 0.01
[Yue, Broder, Kleinberg, Joachims, 2010]
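For concreteness, a tiny helper that accumulates this regret over a sequence of duels. The preference matrix P with P[i][j] = P(f_i ≻ f_j) and the index of f* are illustrative assumptions of this sketch; in practice they are of course unknown to the algorithm.

```python
def dueling_regret(P, best, duels):
    """P[i][j] = P(f_i beats f_j); best: index of f*; duels: list of (i, j) pairs chosen over time."""
    return sum(P[best][i] + P[best][j] - 1 for (i, j) in duels)
```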

First Thought: Tournament
[Figure: single-elimination tournament tree over X_1 … X_8, with X_1 emerging as the winner.]
Noisy sorting/max algorithms:
– [Feige et al.]: Triangle Tournament Heap, O((n/ε²) log(1/δ)) comparisons with probability 1-δ
– [Adler et al., Karp & Kleinberg]: optimal under weaker assumptions

Algorithm: Interleaved Filter 2
Algorithm InterleavedFilter2(T, W = {f_1 … f_K}):
  Pick a random f' from W
  δ = 1/(TK²)
  WHILE |W| > 1:
    – FOR each f ∈ W DO
        duel(f', f)
        update P_f (empirical probability that f beats f')
    – t = t + 1
    – c_t = (log(1/δ) / t)^0.5
    – Remove all f from W with P_f < 0.5 - c_t   [worse than f' with prob. 1-δ]
    – IF there exists f'' with P_f'' > 0.5 + c_t  [better than f' with prob. 1-δ]
        Remove f' from W
        Remove all f from W that are empirically inferior to f'
        f' = f''; t = 0
  UNTIL T: duel(f', f')
[Figure: example run with W = {f_1,…,f_5} and f' = f_3, showing win counts and eliminated arms.]
Related algorithms: [Hofmann, Whiteson, Rijke, 2011] [Yue, Joachims, 2009] [Yue, Joachims, 2011]
[Yue et al., 2009]
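Below is a hedged Python sketch of Interleaved Filter 2 as outlined above. The function duel(f, g), returning True when f wins a single comparison (e.g. one interleaving experiment), is an assumed black box, and bookkeeping details such as when the horizon T is checked are simplifications rather than the exact algorithm from [Yue et al., 2009].

```python
import math
import random

def interleaved_filter2(T, W, duel):
    """T: horizon, W: list of candidate retrieval functions, duel(f, g) -> True iff f wins."""
    W = list(W)
    champ = random.choice(W)                     # random initial candidate f'
    delta = 1.0 / (T * len(W) ** 2)
    wins = {f: 0 for f in W}                     # wins of each f against the champion
    t, steps = 0, 0
    while len(W) > 1 and steps < T:
        for f in list(W):                        # one round-robin pass of duels
            if f is not champ:
                wins[f] += duel(f, champ)
                steps += 1
        t += 1
        c = math.sqrt(math.log(1.0 / delta) / t) # confidence radius c_t
        p = {f: wins[f] / t for f in W if f is not champ}
        # eliminate arms that are worse than the champion with probability 1 - delta
        W = [f for f in W if f is champ or p[f] >= 0.5 - c]
        better = [f for f in W if f is not champ and p[f] > 0.5 + c]
        if better:
            new_champ = better[0]
            # prune arms empirically inferior to the old champion, then replace it
            W = [f for f in W if f is not champ and p[f] >= 0.5]
            champ, wins, t = new_champ, {f: 0 for f in W}, 0
    while steps < T:                             # exploit the single remaining arm
        duel(champ, champ)
        steps += 1
    return champ
```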

Assumptions
Preference relation: f_i ≻ f_j ⟺ P(f_i ≻ f_j) = 0.5 + ε_{i,j} > 0.5
Weak stochastic transitivity: f_i ≻ f_j and f_j ≻ f_k ⟹ f_i ≻ f_k
  e.g. f_1 ≻ f_2 ≻ f_3 ≻ f_4 ≻ f_5 ≻ f_6 ≻ … ≻ f_K
Strong stochastic transitivity: ε_{i,k} ≥ max{ ε_{i,j}, ε_{j,k} }
  e.g. P(f_1 ≻ f_4) ≥ P(f_2 ≻ f_4) ≥ P(f_3 ≻ f_4) ≥ 0.5 ≥ P(f_5 ≻ f_4) ≥ P(f_6 ≻ f_4) ≥ … ≥ P(f_K ≻ f_4)
Stochastic triangle inequality: f_i ≻ f_j ≻ f_k ⟹ ε_{i,k} ≤ ε_{i,j} + ε_{j,k}
  e.g. ε_{1,2} = 0.01 and ε_{2,3} = 0.01 ⟹ ε_{1,3} ≤ 0.02
ε-winner exists: ε = max_i { P(f_1 ≻ f_i) - 0.5 } = ε_{1,2} > 0
Theorem: IF2 incurs expected average regret bounded by O( K log(T) / (ε_{1,2} T) ), i.e. total regret O( (K/ε_{1,2}) log T ) (cf. the proof sketch below).

Proof Sketch
1. Number of duels per match between f_i and f_j:
   Correct winner after O( (1/ε_{i,j}²) log(TK) ) comparisons with probability 1 - 1/(TK²)
   → Hoeffding bound
   → Union bound ensures the overall winner is correct with probability 1 - 1/T
2. Overall regret:
   E[R] ≤ (1 - 1/T) E[R(IF2)] + (1/T) O(T) = O(E[R(IF2)])
3. Regret per match between f_i and f_j:
   Upper bounded via the gap between f_1 and f_2: O( (1/ε_{1,2}) log(T) )
   → Strong stochastic transitivity
   → Stochastic triangle inequality
4. Number of matches:
   When IF2 is mistake free, O(K) matches in expectation.
   → Random walk argument over the sequence of candidates f'

Lower Bound
Theorem: any algorithm for the dueling bandits problem has regret Ω( (K/ε) log T ).
Proof: [Karp, Kleinberg, 2007] [Kleinberg et al., 2007]
Intuition:
– Magically guess the best bandit, then just verify the guess
– Worst case: ∀ f_i ≻ f_j : P(f_i ≻ f_j) = 0.5 + ε
– Need O( (1/ε²) log T ) duels to reach 1 - 1/T confidence.

Convex Dueling Bandits Problem
Given:
– An infinite set H of functions f_w parameterized by w ∈ F ⊂ R^d
– A convex value function v(w) describing the utility of f_w
– P(f_w ≻ f_w') = σ(v(w) - v(w')) = 0.5 + ε(w,w')
– P is L-Lipschitz
Regret:
– R(A) = Σ_{t=1..T} [ P(f* ≻ f_t) + P(f* ≻ f_t') - 1 ]
– f*: best retrieval function in hindsight
– (f_t, f_t'): retrieval functions tested at time t
Approach:
– Stochastic estimated gradient method [Zinkevich 03] [Flaxman et al. 04]
– Regret sublinear in T
[with Yisong Yue]

Dueling Bandit Gradient Descent
[Figure/animation: the current point is perturbed by an explore step of size δ to obtain a candidate; a comparison determines the winning and losing candidate, and the current point takes an exploit step of size γ toward the winner.]
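A minimal sketch of Dueling Bandit Gradient Descent matching the animation above. The comparison oracle duel(w, candidate), which returns True when the perturbed candidate wins (e.g. via an interleaving comparison of the two induced retrieval functions), is an assumption of this sketch, and the projection back onto the feasible set F is omitted.

```python
import numpy as np

def dbgd(w0, duel, T, delta=1.0, gamma=0.01):
    """delta: explore step size, gamma: exploit step size."""
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        u = np.random.randn(len(w))
        u /= np.linalg.norm(u)          # random direction on the unit sphere
        candidate = w + delta * u       # explore step
        if duel(w, candidate):          # the candidate wins the comparison
            w = w + gamma * u           # exploit step toward the winner
    return w
```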


Leveraged a web search dataset: 1000 queries, 367 dimensions (features). Simulated "users" issuing queries.

Interactive Learning System
[Figure: interaction loop between Algorithm and User. y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).]
Observed Data ≠ Training Data
Decisions → Feedback → Learning Algorithm
– Model! Model the user's decision process to extract feedback
  → Pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) )
– Design! Design a learning algorithm for this type of feedback
  → Dueling Bandits problem and algorithms (e.g. IF2)

Who does the exploring? Example 1

Who does the exploring? Example 2

Who does the exploring? Example 3

Coactive Feedback Model
[Figure: the set of all y for context x, the algorithm's prediction y_t, the region the user explored, and the user's improved prediction ȳ_t.]
Interaction: given context x, the algorithm presents a prediction.
Feedback:
– Improved prediction ȳ_t with U(ȳ_t|x_t) > U(y_t|x_t)
– In contrast, supervised learning assumes the optimal prediction y_t* = argmax_y U(y|x_t)

User Study: Search Engine
Study: 16 undergrads, familiar with Google.
Task: find answers to 10 given questions.
Data: queries, clicks, logs, manual judgments.
Results → Preferences from clicks are highly accurate.

Viewing vs. Clicking [Joachims et al. 2005, 2007]

Presentation Bias
Hypothesis: the order of presentation influences where users look, but not where they click!
[Figure: relevance of clicked results. Normal: Google's order of results. Swapped: order of top 2 results swapped.]
→ Users appear to trust Google's ability to rank the most relevant result first.
[Joachims et al. 2005, 2007]

Quality-of-Context Bias
Hypothesis: clicking depends only on the result itself, but not on the other results.
Rank of clicked link as sorted by relevance judges:
– Normal + Swapped: 2.67
– Reversed: 3.27
(Reversed: top 10 results in reversed order.)
→ Users click on less relevant results if they are embedded between irrelevant results.
[Joachims et al. 2005, 2007]

Which Results are Viewed Before Click?
[Figure: ranks viewed relative to the clicked link.]
→ Users typically do not look at lower results before they click (except maybe the next result).

Machine Translation
x_t (source sentence): We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user.
y_t (presented translation): Wir schlagen vor, koaktive Learning als ein Modell der Wechselwirkung zwischen einem Lernsystem und menschlichen Benutzer, wobei sowohl die gemeinsame Ziel, die Ergebnisse der maximalen Nutzen für den Benutzer.
ȳ_t (user-corrected translation, preferred): Wir schlagen vor, koaktive Learning als ein Modell des Dialogs zwischen einem Lernsystem und menschlichen Benutzer, wobei beide das gemeinsame Ziel haben, die Ergebnisse der maximalen Nutzen für den Benutzer zu liefern.

Coactive Learning Model
Unknown utility function U(y|x); boundedly rational user.
Algorithm/User interaction:
LOOP FOREVER
– Observe context x (e.g. query)
– Learning algorithm presents y (e.g. ranking)
– User returns ȳ with U(ȳ|x) > U(y|x)
– Regret = Regret + [ U(y*|x) - U(y|x) ], where y* = argmax_y U(y|x) is the optimal prediction (never revealed; no cardinal feedback)
Relationship to other online learning models:
– Expert setting: receive U(y|x) for all y
– Bandit setting: receive U(y|x) only for the selected y
– Dueling bandits: for selected y and ȳ, receive U(ȳ|x) > U(y|x)
– Coactive setting: for selected y, receive ȳ with U(ȳ|x) > U(y|x)

Preference Perceptron: Algorithm
Model: linear model of user utility, U(y|x) = w^T φ(x,y)
Algorithm:
  Set w_1 = 0
  FOR t = 1 TO T DO
    – Observe x_t
    – Present y_t = argmax_y { w_t^T φ(x_t,y) }
    – Obtain feedback ȳ_t
    – Update w_{t+1} = w_t + φ(x_t,ȳ_t) - φ(x_t,y_t)
This may look similar to a multi-class Perceptron, but
– the feedback ȳ_t is different (we do not get the correct class label)
– the regret is different (misclassifications vs. utility difference)
[Shivaswamy, Joachims, 2012]
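A minimal Python sketch of the Preference Perceptron update above. The joint feature map phi(x, y), the argmax oracle, and the feedback source are passed in as black boxes, since they depend on the application (for rankings, the argmax is just sorting documents by w^T phi); these interfaces are assumptions of the sketch.

```python
import numpy as np

def preference_perceptron(contexts, phi, argmax, get_feedback, dim):
    """contexts: stream of x_t; phi(x, y) -> feature vector of length dim;
    argmax(w, x) -> y maximizing w^T phi(x, y); get_feedback(x, y) -> improved y_bar."""
    w = np.zeros(dim)
    for x in contexts:
        y = argmax(w, x)                      # present the utility-maximizing prediction
        y_bar = get_feedback(x, y)            # user returns an improved prediction
        w = w + phi(x, y_bar) - phi(x, y)     # perceptron-style update on the preference
    return w
```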

α-Informative Feedback
Definition (strictly α-informative feedback): the user's feedback improves on the presented prediction by at least an α-fraction of the maximum possible improvement,
  U(ȳ_t|x_t) - U(y_t|x_t) ≥ α ( U(y_t*|x_t) - U(y_t|x_t) ),
i.e. Feedback ≥ Presented + α (Best - Presented).
Definition (α-informative feedback): the same condition up to slack variables ξ_t, which may be positive or negative.
[Shivaswamy, Joachims, 2012]

Preference Perceptron: Regret Bound
Assumption: U(y|x) = w^T φ(x,y), but w is unknown.
Theorem: for user feedback ȳ that is α-informative, the average regret of the Preference Perceptron is bounded by O( ||w|| / (α √T) ), plus an α-scaled average of the slacks ξ_t.
Other algorithms and results:
– Feedback that is α-informative only in expectation
– General convex loss functions of U(y*|x) - U(ŷ|x)
– Regret that scales as log(T)/T instead of T^{-0.5} for strongly convex losses (as the noise goes to zero)
[Shivaswamy, Joachims, 2012]

Proof: Regret Bound
(Derivation uses the definition of α-informative feedback; details not reproduced here.)
[Shivaswamy, Joachims, 2012]

Expected α-Informative Feedback
Definition (expected α-informative feedback): the α-informative condition holds in expectation over the user's feedback.
Theorem: the Coactive Preference Perceptron achieves the analogous regret bound in expectation.
[Shivaswamy, Joachims, 2012]

Lower Bound
Theorem: for any coactive learning algorithm A with linear utility, there exist contexts x_t, object sets Y, and a w such that REG_T of A in T steps is Ω(1/T^{0.5}).
[Shivaswamy, Joachims, 2012]

Experiments
Web-search ranking (structured prediction):
– present a ranked list, feedback is a ranked list
– Yahoo! Learning to Rank dataset (30k queries)
– joint feature map
Movie recommendation (item prediction):
– present a movie, feedback is another movie
– MovieLens dataset (1M ratings)
– joint feature map based on "SVD features"
– ground truth: best least-squares fit to the ratings
[Shivaswamy, Joachims, 2012]

Strong vs. Weak Feedback
Web: click on the highest-utility docs until the feedback is strictly α-informative.
Movie: the lowest-utility movie that is strictly α-informative.

Noisy Feedback
Web: click on the 5 most relevant docs in the top 10 results.
Movie: some movie with a rating that is higher by one star.

Preference Perceptron: Experiment
Experiment: automatically optimize Arxiv.org fulltext search.
Model: utility of ranking y for query x: U_t(y|x) = Σ_i γ_i w_t^T φ(x, y^(i))  [~1000 features; the position discount γ_i is analogous to DCG]
→ Computing the argmax ranking: sort documents by w_t^T φ(x, y^(i)).
Feedback: construct ȳ_t from y_t by moving clicked links one position higher.
Baseline: hand-tuned w_base for U_base(y|x).
Evaluation: interleaving of the rankings from U_t(y|x) and U_base(y|x).
[Figure: cumulative win ratio of Coactive Learning over the baseline as a function of the number of feedback interactions.]
[Raman et al., 2013]
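A small sketch of the feedback construction described above: the preferred ranking ȳ_t is obtained from the presented ranking y_t by moving every clicked result up by one position. How adjacent clicked results are handled is an assumption of this sketch.

```python
def construct_feedback(ranking, clicked):
    """ranking: list of document ids top to bottom; clicked: set of clicked ids."""
    y_bar = list(ranking)
    for i in range(1, len(y_bar)):
        if y_bar[i] in clicked and y_bar[i - 1] not in clicked:
            # move the clicked result one position up
            y_bar[i - 1], y_bar[i] = y_bar[i], y_bar[i - 1]
    return y_bar
```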

Related Models
– Ordinal regression (Crammer & Singer, 2001): examples (x_i, r_i), where r_i is a numeric rank.
– Pairwise preference learning (Herbrich et al., 1999; Freund et al., 2003): examples are pairwise preferences; i.i.d. assumption, batch.
– Ranking (Joachims, 2002; Liu, 2009): examples (x_i, y_i*), where y_i* is the optimal ranking; structured prediction, list-wise ranking.
– Expert model: cardinal feedback for all arms / the optimal y_i*.
– Bandit model: cardinal feedback only for the chosen arm.
– Dueling bandit model (Yue et al., 2009; Yue, Joachims, 2009): preference feedback between two arms chosen by the algorithm.

Summary and Conclusions
[Figure: interaction loop between Algorithm and User. y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t); callouts: Model!, Design!]
Observed Data ≠ Training Data
Decisions → Feedback → Learning Algorithm
– Dueling Bandits
  → Model: pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) )
  → Algorithm: Interleaved Filter 2, O(|Y| log(T)) regret
– Coactive Learning
  → Model: for a given y, the user provides ȳ with U(ȳ|x) > U(y|x)
  → Algorithm: Preference Perceptron, O(||w|| T^{0.5}) regret
Setting | Utility model | Decision model | Actions y_t / Experiment | Feedback | Exploration | Regret
Dueling Bandits | Ordinal | Noisy rational | Comparison pairs | Noisy comparison | Algorithm | Lost comparisons
Coactive Learning | Linear | Bounded rational | Structured object | α-informative ȳ | User | Cardinal utility
⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞

Research in Interactive Learning
[Figure: cycle connecting Behavioral Studies, Decision Model, and Learning Algorithm.]
Related fields: Micro Economics, Decision Theory, Econometrics, Psychology, Communications, Cognitive Science.
Design space: Decision Model, Utility Model, Interaction Experiments, Feedback Type, Regret, Applications.