
1 Learning with Humans in the Loop Josef Broder, Olivier Chapelle, Geri Gay, Arpita Ghosh, Laura Granka, Thorsten Joachims, Bobby Kleinberg, Madhu Kurup, Filip Radlinski, Karthik Raman, Tobias Schnabel, Pannaga Shivaswamy, Yisong Yue Department of Computer Science Cornell University

2 User-Facing Machine Learning
Examples
– Search engines
– Netflix
– Smart home
– Robot assistant
Learning
– Gathering and maintenance of knowledge
– Measure and optimize performance
– Personalization

3 Interactive Learning System
Algorithm ↔ User loop: y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t). The user is knowledge constrained and computation constrained.
Observed Data ≠ Training Data
– Observed data is the user's decisions
– Even explicit feedback reflects the user's decision process
Decisions → Feedback → Learning: requires both modeling the user and designing the algorithm.

4 Decide between two Ranking Functions
Distribution P(u,q) of users u and queries q. Retrieval function 1: f1(u,q) → r1. Retrieval function 2: f2(u,q) → r2. Which one is better?
Example for (u=tj, q="SVM"):
r1: 1. Kernel Machines http://svm.first.gmd.de/ 2. SVM-Light Support Vector Machine http://svmlight.joachims.org/ 3. School of Veterinary Medicine at UPenn http://www.vet.upenn.edu/ 4. An Introduction to Support Vector Machines http://www.support-vector.net/ 5. Service Master Company http://www.servicemaster.com/ ⁞
r2: 1. School of Veterinary Medicine at UPenn http://www.vet.upenn.edu/ 2. Service Master Company http://www.servicemaster.com/ 3. Support Vector Machine http://jbolivar.freeservers.com/ 4. Archives of SUPPORT-VECTOR-MACHINES http://www.jiscmail.ac.uk/lists/SUPPORT... 5. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/ ⁞
Compare U(tj,"SVM",r1) against U(tj,"SVM",r2).

5 Measuring Utility
Each metric with its description, its aggregation, and the hypothesized change when result quality decreases; (*) marks metrics that only count queries with at least one click.
– Abandonment Rate: % of queries with no click (aggregation: N/A; expected change: increase)
– Reformulation Rate: % of queries that are followed by a reformulation (N/A; increase)
– Queries per Session: session = no interruption of more than 30 minutes (mean; increase)
– Clicks per Query: number of clicks (mean; decrease)
– Click@1: % of queries with a click at position 1 (N/A; decrease)
– Max Reciprocal Rank*: 1/rank of the highest click (mean; decrease)
– Mean Reciprocal Rank*: mean of 1/rank over all clicks (mean; decrease)
– Time to First Click*: seconds before the first click (median; increase)
– Time to Last Click*: seconds before the final click (median; decrease)
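The absolute metrics above are straightforward to compute from a click log. Below is a minimal sketch; the log format (one record per query with a list of clicks, each carrying a rank and a time offset in seconds) is a hypothetical example for illustration, not the format used in these studies.

```python
# Minimal sketch of a few absolute usage metrics from the table above.
# The log format is a hypothetical assumption for this example.
from statistics import mean, median

def absolute_metrics(query_log):
    clicked = [q for q in query_log if q["clicks"]]
    return {
        "abandonment_rate": 1.0 - len(clicked) / len(query_log),
        "clicks_per_query": mean(len(q["clicks"]) for q in query_log),
        "click@1": mean(any(c["rank"] == 1 for c in q["clicks"]) for q in query_log),
        # (*) metrics only count queries with at least one click
        "max_reciprocal_rank": mean(1.0 / min(c["rank"] for c in q["clicks"]) for q in clicked),
        "mean_reciprocal_rank": mean(mean(1.0 / c["rank"] for c in q["clicks"]) for q in clicked),
        "time_to_first_click": median(min(c["time"] for c in q["clicks"]) for q in clicked),
        "time_to_last_click": median(max(c["time"] for c in q["clicks"]) for q in clicked),
    }

log = [
    {"query": "svm", "clicks": [{"rank": 1, "time": 4.2}, {"rank": 3, "time": 31.0}]},
    {"query": "kernel machines", "clicks": []},
]
print(absolute_metrics(log))
```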

6 ArXiv.org: User Study
User study in ArXiv.org: natural user and query population; users in their natural context, not in a lab; live, operational search engine; ground truth by construction.
Condition 1: ORIG ≻ SWAP2 ≻ SWAP4. ORIG: hand-tuned fielded ranking; SWAP2: ORIG with 2 pairs swapped; SWAP4: ORIG with 4 pairs swapped.
Condition 2: ORIG ≻ FLAT ≻ RAND. ORIG: hand-tuned fielded ranking; FLAT: no field weights; RAND: top 10 of FLAT shuffled.
[Radlinski et al., 2008]

7 ArXiv.org: Experiment Setup
Experiment setup
– Phase I (36 days): users randomly receive a ranking from Orig, Flat, or Rand
– Phase II (30 days): users randomly receive a ranking from Orig, Swap2, or Swap4
– Users are permanently assigned to one experimental condition based on IP address and browser
Basic statistics: ~700 queries per day / ~300 distinct users per day
Quality control and data cleaning
– Test run for 32 days
– Heuristics to identify bots and spammers
– All evaluation code was written twice and cross-validated

8 Arxiv.org: Results
Conclusions: none of the absolute metrics reflects the expected order; most differences are not significant after one month of data; absolute metrics are not suitable for ArXiv-sized search engines. [Radlinski et al., 2008]

9 Yahoo! Search: Results Retrieval Functions – 4 variants of production retrieval function Data – 10M – 70M queries for each retrieval function – Expert relevance judgments Results – Still not always significant even after more than 10M queries per function – Only Click@1 consistent with DCG@5. [Chapelle et al., 2012]

10 How about Explicit Feedback? [Sipos et al., to appear]

11 Polarity of Answer [Sipos et al., to appear]

12 Participation [Sipos et al., to appear]

13 Interactive Learning System
Algorithm ↔ User loop: y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).
Observed Data ≠ Training Data
– Observed data is the user's decisions
– Even explicit feedback reflects the user's decision process
Decisions → Feedback → Learning Algorithm
– Model the user's decision process to extract feedback
– Design a learning algorithm for this type of feedback

14 Decide between two Ranking Functions
Distribution P(u,q) of users u and queries q. Retrieval function 1: f1(u,q) → r1. Retrieval function 2: f2(u,q) → r2. Which one is better?
Example for (u=tj, q="SVM"):
r1: 1. Kernel Machines http://svm.first.gmd.de/ 2. SVM-Light Support Vector Machine http://svmlight.joachims.org/ 3. School of Veterinary Medicine at UPenn http://www.vet.upenn.edu/ 4. An Introduction to Support Vector Machines http://www.support-vector.net/ 5. Service Master Company http://www.servicemaster.com/ ⁞
r2: 1. School of Veterinary Medicine at UPenn http://www.vet.upenn.edu/ 2. Service Master Company http://www.servicemaster.com/ 3. Support Vector Machine http://jbolivar.freeservers.com/ 4. Archives of SUPPORT-VECTOR-MACHINES http://www.jiscmail.ac.uk/lists/SUPPORT... 5. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/ ⁞
Compare U(tj,"SVM",r1) against U(tj,"SVM",r2).

15 Economic Models of Decision Making
Rational choice: alternatives Y; utility function U(y); decision y* = argmax_{y ∈ Y} U(y).
Bounded rationality: time constraints; computation constraints; approximate U(y).
Behavioral economics: framing; fairness; loss aversion; handling uncertainty.

16 A Model of how Users Click in Search
Model of clicking:
– Users explore the ranking down to position k
– Users click on the most relevant-looking links in the top k
– Users stop clicking when their time budget is used up or another action looks more promising (e.g. reformulation)
– Empirically supported by [Granka et al., 2004]

17 Balanced Interleaving
f1(u,q) → r1: 1. Kernel Machines http://svm.first.gmd.de/ 2. Support Vector Machine http://jbolivar.freeservers.com/ 3. An Introduction to Support Vector Machines http://www.support-vector.net/ 4. Archives of SUPPORT-VECTOR-MACHINES... http://www.jiscmail.ac.uk/lists/SUPPORT... 5. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/
f2(u,q) → r2: 1. Kernel Machines http://svm.first.gmd.de/ 2. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/ 3. Support Vector Machine and Kernel... References http://svm.research.bell-labs.com/SVMrefs.html 4. Lucent Technologies: SVM demo applet http://svm.research.bell-labs.com/SVT/SVMsvt.html 5. Royal Holloway Support Vector Machine http://svm.dcs.rhbnc.ac.uk
Interleaving(r1, r2) for (u=tj, q="svm"), with each document's rank in its source ranking in parentheses: 1. Kernel Machines (rank 1 in both) 2. Support Vector Machine (r1, rank 2) 3. SVM-Light Support Vector Machine (r2, rank 2) 4. An Introduction to Support Vector Machines (r1, rank 3) 5. Support Vector Machine and Kernel... References (r2, rank 3) 6. Archives of SUPPORT-VECTOR-MACHINES... (r1, rank 4) 7. Lucent Technologies: SVM demo applet (r2, rank 4)
Invariant: for all k, the top k of the balanced interleaving is the union of the top k1 of r1 and the top k2 of r2, with k1 = k2 ± 1. [Joachims, 2001] [Radlinski et al., 2008]
Model of the user: a better retrieval function is more likely to get more clicks.
Interpretation: ( r1 ≻ r2 ) ↔ clicks(top_k(r1)) > clicks(top_k(r2)). See also [Radlinski, Craswell, 2012] [Hofmann, 2012].
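To make the construction and evaluation concrete, here is a small sketch of balanced interleaving: the interleaved list is built so every prefix satisfies the invariant above, and clicks are credited to the ranking that placed the clicked document higher. Function and variable names are mine; this illustrates the idea in [Joachims, 2001; Radlinski et al., 2008], not their production code.

```python
# Sketch of balanced interleaving and click-based evaluation.
import random

def balanced_interleave(r1, r2):
    combined, i, j = [], 0, 0
    r1_first = random.random() < 0.5          # coin flip decides which ranking leads
    while i < len(r1) and j < len(r2):        # (remaining tail of the longer list omitted here)
        lead_r1 = (i < j) or (i == j and r1_first)
        if lead_r1:
            if r1[i] not in combined:
                combined.append(r1[i])
            i += 1
        else:
            if r2[j] not in combined:
                combined.append(r2[j])
            j += 1
    return combined, r1_first

def evaluate_clicks(combined, clicks, r1, r2):
    # smallest k such that every clicked document is in the top k of r1 or r2
    k = max(min(r.index(c) for r in (r1, r2) if c in r) for c in clicks) + 1
    wins1 = sum(1 for c in clicks if c in r1[:k])
    wins2 = sum(1 for c in clicks if c in r2[:k])
    return "r1" if wins1 > wins2 else "r2" if wins2 > wins1 else "tie"

r1 = ["a", "b", "c", "d"]
r2 = ["b", "e", "a", "f"]
combined, _ = balanced_interleave(r1, r2)
print(combined)                                              # e.g. ['a', 'b', 'e', 'c', 'd']
print(evaluate_clicks(combined, clicks={"e"}, r1=r1, r2=r2)) # 'r2'
```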

18 Arxiv.org: Interleaving Experiment Experiment Setup – Phase I: 36 days Balanced Interleaving of (Orig,Flat) (Flat,Rand) (Orig,Rand) – Phase II: 30 days Balanced Interleaving of (Orig,Swap2) (Swap2,Swap4) (Orig,Swap4) Quality Control and Data Cleaning – Same as for absolute metrics

19 Arxiv.org: Interleaving Results
[Figure: percent of interleaving wins, e.g. % wins ORIG vs. % wins RAND, for each pair.]
Conclusions: all interleaving experiments reflect the expected order; all differences are significant after one month of data; the same results hold for alternative data preprocessing.

20 Yahoo and Bing: Interleaving Results
Yahoo Web Search [Chapelle et al., 2012]: four retrieval functions (i.e. 6 paired comparisons), balanced interleaving ⇒ all paired comparisons consistent with the ordering by NDCG.
Bing Web Search [Radlinski & Craswell, 2010]: five retrieval function pairs, team-game interleaving ⇒ consistent with the ordering by NDCG whenever NDCG is significant.

21 Efficiency: Interleaving vs. Explicit
Bing Web Search: 4 retrieval function pairs; ~12k manually judged queries; ~200k interleaved queries.
Experiment: p = probability that the NDCG preference is correct on a subsample of y judged queries; x = number of interleaved queries needed to reach the same p-value.
⇒ Ten interleaved queries are equivalent to one manually judged query. [Radlinski & Craswell, 2010]

22 Interactive Learning System
Algorithm ↔ User loop: y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).
Observed Data ≠ Training Data
Decisions → Feedback → Learning Algorithm
– Model the user's decision process to extract feedback ⇒ pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) )
– Design a learning algorithm for this type of feedback

23 Learning on Operational System
Example: 4 retrieval functions with A > B >> C > D give 10 possible pairs for an interactive experiment: (A,B) → low cost to the user; (A,C) → medium cost; (C,D) → high cost; (A,A) → zero cost; …
Minimizing regret: the algorithm gets to decide on the sequence of pairwise tests; don't present "bad" pairs more often than necessary; trade off (long-term) informativeness against (short-term) cost ⇒ the Dueling Bandits Problem. [Yue, Broder, Kleinberg, Joachims, 2010]

24 Regret for Dueling Bandits
Given: a finite set H of candidate retrieval functions f_1 … f_K, and a pairwise comparison test f ≻ f' on H with win probability P(f ≻ f').
Regret: R(A) = Σ_{t=1..T} [ P(f* ≻ f_t) + P(f* ≻ f_t') − 1 ], where f* is the best retrieval function in hindsight (assume a single f* exists) and (f_t, f_t') are the retrieval functions tested at time t.
Example: at step t1 the comparison (f_9, f_12) → f_9 incurs regret P(f* ≻ f_9) + P(f* ≻ f_12) − 1 = 0.90 + 0.95 − 1 = 0.85; at t2 the comparison (f_5, f_9) → f_5 incurs 0.78; …; at step T the comparison (f_1, f_3) → f_3 incurs 0.01.
[Yue, Broder, Kleinberg, Joachims, 2010]
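A tiny sketch of this regret definition, using a hypothetical preference matrix P with P[i][j] = P(f_i ≻ f_j):

```python
# Regret of a sequence of presented pairs: each step pays
# P(f* > f) + P(f* > f') - 1, which is zero only when both presented
# functions are as good as the best one. P is a made-up example matrix.
def dueling_regret(P, best, presented_pairs):
    return sum(P[best][f] + P[best][f2] - 1.0 for f, f2 in presented_pairs)

P = {
    0: {0: 0.50, 1: 0.78, 2: 0.90},
    1: {0: 0.22, 1: 0.50, 2: 0.85},
    2: {0: 0.10, 1: 0.15, 2: 0.50},
}
print(dueling_regret(P, best=0, presented_pairs=[(1, 2), (0, 1), (0, 0)]))
# (0.78+0.90-1) + (0.50+0.78-1) + (0.50+0.50-1) = 0.96
```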

25 First Thought: Tournament
Noisy sorting/max algorithms:
– [Feige et al.]: Triangle Tournament Heap, O( (n/ε²) log(1/δ) ) comparisons with probability 1 − δ
– [Adler et al., Karp & Kleinberg]: optimal under weaker assumptions
[Figure: single-elimination tournament bracket over X1 … X8.]

26 Algorithm: Interleaved Filter 2
Algorithm InterleavedFilter2(T, W = {f_1 … f_K})
– Pick a random f' from W; set δ = 1/(TK²)
– WHILE |W| > 1:
  – FOR each f ∈ W: duel(f', f), update the empirical win rate P_f, t = t + 1
  – c_t = ( log(1/δ) / t )^0.5
  – Remove all f from W with P_f < 0.5 − c_t  [worse than f' with probability 1 − δ]
  – IF there exists f'' with P_f'' > 0.5 + c_t  [better than f' with probability 1 − δ]:
    – Remove f' from W, and remove all f from W that are empirically inferior to f'
    – f' = f''; t = 0
– UNTIL the horizon T: keep playing duel(f', f')
[Figure: example run with candidates f_1 … f_5 and running win counts against the incumbent.]
[Yue et al., 2009]
Related algorithms: [Hofmann, Whiteson, Rijke, 2011] [Yue, Joachims, 2009] [Yue, Joachims, 2011]
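A runnable sketch of the strategy above. duel() simulates one noisy pairwise comparison (in a real system this would be one interleaving experiment); the constants, the stopping handling, and the bookkeeping are simplifying assumptions rather than the exact published implementation.

```python
# Sketch of Interleaved Filter 2 with a simulated duel().
import math, random

def duel(prob, a, b):
    return random.random() < prob[a][b]          # True if f_a beats f_b once

def interleaved_filter2(prob, K, T):
    remaining = set(range(K))
    incumbent = remaining.pop()                  # arbitrary initial incumbent f'
    delta = 1.0 / (T * K * K)
    wins = {f: 0 for f in remaining}             # duels f has won against the incumbent
    plays = {f: 0 for f in remaining}
    steps = 0
    while remaining and steps < T:
        for f in list(remaining):                # duel the incumbent against every candidate
            wins[f] += duel(prob, f, incumbent)
            plays[f] += 1
            steps += 1
        t = plays[next(iter(remaining))]
        c_t = math.sqrt(math.log(1.0 / delta) / t)
        p = {f: wins[f] / plays[f] for f in remaining}
        # prune candidates that lose to the incumbent with high confidence
        remaining = {f for f in remaining if p[f] >= 0.5 - c_t}
        # promote a candidate that beats the incumbent with high confidence
        better = [f for f in remaining if p[f] > 0.5 + c_t]
        if better:
            new_incumbent = better[0]
            # drop everything that is empirically inferior to the old incumbent
            remaining = {f for f in remaining if f != new_incumbent and p[f] >= 0.5}
            incumbent = new_incumbent
            wins = {f: 0 for f in remaining}
            plays = {f: 0 for f in remaining}
    return incumbent                             # play duel(f', f') for the rest of the horizon

# Toy example with f_0 as the best function.
prob = {0: {0: .5, 1: .7, 2: .8}, 1: {0: .3, 1: .5, 2: .6}, 2: {0: .2, 1: .4, 2: .5}}
print(interleaved_filter2(prob, K=3, T=10000))
```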

27 Assumptions
– Preference relation: f_i ≻ f_j ⇔ P(f_i ≻ f_j) = 0.5 + ε_{i,j} > 0.5
– Weak stochastic transitivity: f_i ≻ f_j and f_j ≻ f_k ⇒ f_i ≻ f_k, so the functions can be ordered f_1 ≻ f_2 ≻ f_3 ≻ f_4 ≻ f_5 ≻ f_6 ≻ … ≻ f_K
– Strong stochastic transitivity: ε_{i,k} ≥ max{ ε_{i,j}, ε_{j,k} }, e.g. P(f_1 ≻ f_4) ≥ P(f_2 ≻ f_4) ≥ P(f_3 ≻ f_4) ≥ 0.5 ≥ P(f_5 ≻ f_4) ≥ P(f_6 ≻ f_4) ≥ … ≥ P(f_K ≻ f_4)
– Stochastic triangle inequality: f_i ≻ f_j ≻ f_k ⇒ ε_{i,k} ≤ ε_{i,j} + ε_{j,k}, e.g. ε_{1,2} = 0.01 and ε_{2,3} = 0.01 ⇒ ε_{1,3} ≤ 0.02
– A distinguishable winner exists: ε = P(f_1 ≻ f_2) − 0.5 = ε_{1,2} > 0
Theorem: IF2 incurs expected cumulative regret bounded by O( (K / ε_{1,2}) log T ).

28 Proof Sketch
1. Number of duels per match between f_i and f_j: the correct winner is identified after O( log(TK) / ε_{i,j}² ) comparisons with probability 1 − 1/(TK²) (Hoeffding bound); a union bound then ensures the overall winner is correct with probability 1 − 1/T.
2. Overall regret: E[R] ≤ (1 − 1/T) E[R(IF2)] + (1/T) O(T) = O( E[R(IF2)] ).
3. Regret per match between f_i and f_j: upper bounded in terms of the gap between f_1 and f_2 as O( log(T) / ε_{1,2} ), using strong stochastic transitivity and the stochastic triangle inequality.
4. Number of matches: when IF2 makes no mistakes, it plays O(K) matches in expectation (random-walk argument over the sequence of incumbents f').

29 Lower Bound
Theorem: Any algorithm for the dueling bandits problem has regret Ω( (K/ε) log T ).
Proof idea [Karp, Kleinberg, 2007] [Kleinberg et al., 2007]: even an algorithm that magically guesses the best bandit still has to verify its guess; in the worst case ∀ f_i ≻ f_j: P(f_i ≻ f_j) = 0.5 + ε, and O( (1/ε²) log T ) duels are needed to reach confidence 1 − 1/T.

30 Convex Dueling Bandits Problem
Given:
– an infinite set H of functions f_w parameterized by w ∈ F ⊆ ℝ^d
– a convex value function v(w) describing the utility of f_w
– P(f_w ≻ f_w') = σ( v(w) − v(w') ) = 0.5 + ε(w, w'), where P is L-Lipschitz
Regret: R(A) = Σ_{t=1..T} [ P(f* ≻ f_t) + P(f* ≻ f_t') − 1 ], where f* is the best retrieval function in hindsight and (f_t, f_t') are the functions tested at time t.
Approach: stochastic estimated-gradient method [Zinkevich 03] [Flaxman et al. 04]; regret sublinear in T (on the order of T^{3/4}). [with Yisong Yue]

31–39 Dueling Bandit Gradient Descent
[Animation across slides 31–39: starting from the current point, a candidate is drawn at explore step size δ; if the candidate wins the duel, the current point moves toward it with exploit step size γ; losing candidates are discarded.]
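The animation corresponds to the following update rule. A minimal sketch, assuming comparisons come from a stand-in compare() function (in practice an interleaving experiment) and a toy quadratic utility; the step sizes δ and γ match the legend above.

```python
# Sketch of Dueling Bandit Gradient Descent: probe a random direction at
# distance delta, and step by gamma toward the probe if it wins the duel.
import numpy as np

def dbgd(compare, w0, T, delta=0.5, gamma=0.1):
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        u = np.random.randn(len(w))
        u /= np.linalg.norm(u)              # random unit direction
        candidate = w + delta * u           # explore step
        if compare(candidate, w):           # candidate wins the duel
            w = w + gamma * u               # exploit step toward the winner
    return w

# Toy example: utility is highest at w = (1, 2); comparisons are noisy.
def noisy_compare(w1, w2, target=np.array([1.0, 2.0])):
    v1 = -np.sum((w1 - target) ** 2)
    v2 = -np.sum((w2 - target) ** 2)
    p_w1_wins = 1.0 / (1.0 + np.exp(-(v1 - v2)))   # logistic link
    return np.random.rand() < p_w1_wins

print(dbgd(noisy_compare, w0=[0.0, 0.0], T=5000))
```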

40 Experiments leveraged a web search dataset with 1,000 queries and 367 dimensions (features); "users" issuing queries were simulated.

41 Interactive Learning System
Algorithm ↔ User loop: y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t).
Observed Data ≠ Training Data
Decisions → Feedback → Learning Algorithm
– Model the user's decision process to extract feedback ⇒ pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) )
– Design a learning algorithm for this type of feedback ⇒ the Dueling Bandits problem and algorithms (e.g. IF2)

42 Who does the exploring? Example 1

43 Who does the exploring? Example 2

44 Who does the exploring? Example 3

45 Coactive Feedback Model
Interaction: given a context x, the set of all y for x contains the algorithm's prediction, the part the user actually explored, and an improved prediction within it.
Feedback: an improved prediction ȳ_t with U(ȳ_t|x_t) > U(y_t|x_t).
Supervised learning, in contrast, would require the optimal prediction y_t* = argmax_y U(y|x_t).

46 User Study: Search Engine
Study: 16 undergrads, familiar with Google.
Task: find answers to 10 given questions.
Data: queries, clicks, logs, manual judgments.
Result: preferences derived from clicks are highly accurate.

47 Viewing vs. Clicking [Joachims et al. 2005, 2007]

48 Presentation Bias
Hypothesis: the order of presentation influences where users look, but not where they click.
Conditions: Normal = Google's order of results; Swapped = order of the top 2 results swapped.
⇒ Users appear to have trust in Google's ability to rank the most relevant result first. [Joachims et al. 2005, 2007]

49 Quality-of-Context Bias
Hypothesis: clicking depends only on the result itself, but not on the other results.
Average rank of the clicked link, as sorted by relevance judges: Normal + Swapped: 2.67; Reversed: 3.27 (Reversed = top 10 results in reversed order).
⇒ Users click on less relevant results if they are embedded between irrelevant results. [Joachims et al. 2005, 2007]

50 Which Results are Viewed Before Click?
[Figure: which ranks are viewed relative to the clicked link.]
⇒ Users typically do not look at lower results before they click (except maybe the next result).

51 Machine Translation
Source x_t: We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user.
System translation y_t: Wir schlagen vor, koaktive Learning als ein Modell der Wechselwirkung zwischen einem Lernsystem und menschlichen Benutzer, wobei sowohl die gemeinsame Ziel, die Ergebnisse der maximalen Nutzen für den Benutzer.
User-improved translation ȳ_t (shown on the slide with the user's edits): Wir schlagen vor, koaktive Learning als ein Modell der Wechselwirkung des Dialogs zwischen einem Lernsystem und menschlichen Benutzer, wobei sowohl die beide das gemeinsame Ziel haben, die Ergebnisse der maximalen Nutzen für den Benutzer zu liefern.

52 Coactive Learning Model
Unknown utility function U(y|x); boundedly rational user.
Algorithm/user interaction: LOOP FOREVER
– Observe context x (e.g. query)
– Learning algorithm presents y (e.g. ranking)
– User returns ȳ with U(ȳ|x) > U(y|x)
– Regret = Regret + [ U(y*|x) − U(y|x) ], where y* = argmax_y U(y|x) is the optimal prediction
Neither cardinal feedback nor the optimal y* is ever revealed.
Relationship to other online learning models:
– Expert setting: receive U(y|x) for all y
– Bandit setting: receive U(y|x) only for the selected y
– Dueling bandits: for selected y and ȳ, receive the comparison U(ȳ|x) > U(y|x)
– Coactive setting: for the selected y, receive some ȳ with U(ȳ|x) > U(y|x)

53 Preference Perceptron: Algorithm
Model: linear model of user utility, U(y|x) = w^T φ(x,y).
Algorithm:
– Set w_1 = 0
– FOR t = 1 TO T DO
  – Observe x_t
  – Present y_t = argmax_y { w_t^T φ(x_t, y) }
  – Obtain feedback ȳ_t
  – Update w_{t+1} = w_t + φ(x_t, ȳ_t) − φ(x_t, y_t)
This may look similar to a multi-class Perceptron, but the feedback ȳ_t is different (the algorithm never gets the correct class label) and the regret is different (misclassifications vs. utility difference). [Shivaswamy, Joachims, 2012]
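A minimal sketch of the algorithm for a simple item-prediction setting; the joint feature map, the candidate pool, and the simulated user are toy assumptions used only to make the loop runnable.

```python
# Preference Perceptron sketch: present the argmax object, receive an
# improved object from the (simulated) user, update w with the feature difference.
import numpy as np

def preference_perceptron(phi, candidates, contexts, user_feedback):
    w = np.zeros(len(phi(contexts[0], candidates[0])))
    for x in contexts:
        y = max(candidates, key=lambda c: w @ phi(x, c))    # present argmax prediction
        y_bar = user_feedback(x, y)                         # user returns improved object
        w = w + phi(x, y_bar) - phi(x, y)                   # perceptron-style update
    return w

rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])                         # hidden user utility
phi = lambda x, c: x * c                                    # hypothetical joint feature map
candidates = [rng.standard_normal(3) for _ in range(20)]
contexts = [rng.standard_normal(3) for _ in range(200)]

def user_feedback(x, y):
    # return the *least* improved candidate that still beats the presented one
    better = [c for c in candidates if w_star @ phi(x, c) > w_star @ phi(x, y)]
    return min(better, key=lambda c: w_star @ phi(x, c)) if better else y

w_hat = preference_perceptron(phi, candidates, contexts, user_feedback)
print(w_hat / np.linalg.norm(w_hat), w_star / np.linalg.norm(w_star))
```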

54 α-Informative Feedback
Definition (strictly α-informative feedback): U(ȳ_t|x_t) − U(y_t|x_t) ≥ α ( U(y_t*|x_t) − U(y_t|x_t) ), i.e. the feedback improves on the presented prediction by at least an α-fraction of the gap to the optimal prediction.
Definition (α-informative feedback): the same condition up to a slack term ξ_t that may be positive or negative: U(ȳ_t|x_t) − U(y_t|x_t) ≥ α ( U(y_t*|x_t) − U(y_t|x_t) ) − ξ_t.
[Shivaswamy, Joachims, 2012]
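A small helper that makes the definition concrete: it computes the slack ξ needed for a given feedback to satisfy the α-informative condition. The utilities in the example are hypothetical numbers.

```python
# Slack xi needed so that: u_feedback - u_presented >= alpha*(u_best - u_presented) - xi
def alpha_informative_slack(u_presented, u_feedback, u_best, alpha):
    return max(0.0, alpha * (u_best - u_presented) - (u_feedback - u_presented))

# Presented utility 2, feedback utility 5, best possible 10, alpha = 0.5:
# improvement is 3 but 0.5 * 8 = 4 is required, so a slack of 1 is needed.
print(alpha_informative_slack(2.0, 5.0, 10.0, 0.5))   # 1.0
```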

55 Preference Perceptron: Regret Bound
Assumption: U(y|x) = w^T φ(x,y), but w is unknown.
Theorem: For user feedback ȳ that is α-informative with slacks ξ_t, and features bounded by ‖φ(x,y)‖ ≤ R, the average regret of the Preference Perceptron is bounded by REG_T ≤ (1/(αT)) Σ_t ξ_t + 2R‖w‖ / (α √T).
Other algorithms and results:
– Feedback that is α-informative only in expectation
– General convex loss functions of U(y*|x) − U(ŷ|x)
– Regret that scales as log(T)/T instead of T^{-0.5} for strongly convex losses as the noise goes to zero
[Shivaswamy, Joachims, 2012]

56 Proof: Regret Bound
Follows from the definition of α-informative feedback combined with a standard Perceptron-style norm bound; a reconstruction of the argument is sketched below. [Shivaswamy, Joachims, 2012]
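A reconstruction of the standard Perceptron-style argument (a sketch under the stated assumptions, not copied from the slide), assuming ‖φ(x,y)‖ ≤ R, α-informative feedback with slacks ξ_t, and U(y|x) = w*ᵀ φ(x,y):

```latex
% Utility gain of the updates, using alpha-informative feedback:
w_{T+1}^\top w^* = \sum_{t=1}^{T}\big(\phi(x_t,\bar y_t)-\phi(x_t,y_t)\big)^\top w^*
  \;\ge\; \alpha\sum_{t=1}^{T}\big(U(y_t^*|x_t)-U(y_t|x_t)\big)-\sum_{t=1}^{T}\xi_t .

% Norm growth: since y_t maximizes w_t^\top\phi(x_t,\cdot), the cross term is non-positive,
% so each update increases the squared norm by at most (2R)^2:
\|w_{t+1}\|^2 \le \|w_t\|^2 + \|\phi(x_t,\bar y_t)-\phi(x_t,y_t)\|^2
  \quad\Rightarrow\quad \|w_{T+1}\| \le 2R\sqrt{T} .

% Combine via Cauchy-Schwarz, w_{T+1}^\top w^* \le \|w_{T+1}\|\,\|w^*\|, and divide by \alpha T:
\mathrm{REG}_T=\frac{1}{T}\sum_{t=1}^{T}\big(U(y_t^*|x_t)-U(y_t|x_t)\big)
  \;\le\; \frac{1}{\alpha T}\sum_{t=1}^{T}\xi_t+\frac{2R\|w^*\|}{\alpha\sqrt{T}} .
```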

57 Expected α-Informative Feedback
Definition (expected α-informative feedback): the α-informative condition only needs to hold in expectation over the user's feedback: E[ U(ȳ_t|x_t) ] − U(y_t|x_t) ≥ α ( U(y_t*|x_t) − U(y_t|x_t) ) − ξ_t.
Theorem: under expected α-informative feedback, the Coactive Preference Perceptron achieves the analogous bound on the expected average regret. [Shivaswamy, Joachims, 2012]

58 Lower Bound
Theorem: For any coactive learning algorithm A with linear utility, there exist contexts x_t, object sets Y, and a w such that the regret REG_T of A in T steps is Ω(1/T^0.5). [Shivaswamy, Joachims, 2012]

59 Experiments Web-search ranking: structured prediction – present a ranked list, feedback is ranked list – Yahoo! Learning to Rank dataset (30k queries) – Joint feature map Movie recommendation: item prediction – present a movie, feedback is another movie – MovieLens dataset (1M ratings) – Joint feature map based on “SVD features” Ground truth: best least-squares fit to ratings [Shivaswamy, Joachims, 2012]

60 Strong vs Weak Feedback
Web: click on the highest-utility docs until the feedback is strictly α-informative.
Movie: the lowest-utility movie that is strictly α-informative.

61 Noisy Feedback Web: Click on the 5 most relevant docs in top 10 results Movie: Some movie with rating that is higher by one star

62 Preference Perceptron: Experiment
Experiment: automatically optimize Arxiv.org fulltext search.
Model: utility of ranking y for query x is U_t(y|x) = Σ_i γ_i · w_t^T φ(x, y^(i)) with ~1000 features (a position-discounted sum analogous to DCG); the argmax ranking is computed by sorting documents by w_t^T φ(x, y^(i)).
Feedback: construct ȳ_t from y_t by moving clicked links one position higher (see the sketch below).
Baseline: hand-tuned w_base for U_base(y|x).
Evaluation: interleaving of the rankings from U_t(y|x) and U_base(y|x).
[Figure: cumulative win ratio over the number of feedback interactions; coactive learning overtakes the baseline.]
[Raman et al., 2013]
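The feedback construction ("move clicked links one position higher") can be sketched in a few lines; the document ids are placeholders.

```python
# Build the improved ranking y_bar from the presented ranking by swapping each
# clicked document one position up.
def coactive_feedback(presented, clicked):
    y_bar = list(presented)
    for pos in range(1, len(y_bar)):
        if y_bar[pos] in clicked:
            y_bar[pos - 1], y_bar[pos] = y_bar[pos], y_bar[pos - 1]
    return y_bar

print(coactive_feedback(["d1", "d2", "d3", "d4"], clicked={"d3"}))
# ['d1', 'd3', 'd2', 'd4']
```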

63 Related Models
– Ordinal regression (Crammer & Singer 2001): examples (x_i, r_i), where r_i is a numeric rank
– Pairwise preference learning (Herbrich et al. 1999; Freund et al. 2003): examples are preference pairs; i.i.d. assumption, batch setting
– Ranking (Joachims 2002; Liu 2009): examples (x_i, y_i*), where y_i* is the optimal ranking; structured prediction, list-wise ranking
– Expert model: cardinal feedback for all arms / the optimal y_i*
– Bandit model: cardinal feedback only for the chosen arm
– Dueling bandit model (Yue et al. 2009; Yue, Joachims 2009): preference feedback between two arms chosen by the algorithm

64 Summary and Conclusions
Algorithm ↔ User loop: y_t depends on x_t (e.g. ranking for query); x_{t+1} depends on y_t (e.g. click given ranking, new query); utility U(y_t). Observed Data ≠ Training Data; Decisions → Feedback → Learning Algorithm (model the user, design the algorithm).
– Dueling Bandits. Model: pairwise comparison test P( y_i ≻ y_j | U(y_i) > U(y_j) ). Algorithm: Interleaved Filter 2, O(|Y| log T) regret.
– Coactive Learning. Model: for a given y, the user provides ȳ with U(ȳ|x) > U(y|x). Algorithm: Preference Perceptron, O(‖w‖ T^0.5) regret.
Comparison:
– Dueling Bandits: ordinal utility model; noisy-rational decision model; actions/experiments are comparison pairs; feedback is a noisy comparison; exploration is done by the algorithm; regret counts lost comparisons.
– Coactive Learning: linear utility model; boundedly rational decision model; actions are structured objects; feedback is an α-informative ȳ; exploration is done by the user; regret measures cardinal utility.

65 Research in Interactive Learning
Cycle: behavioral studies ↔ decision model ↔ learning algorithm.
Related fields: microeconomics, decision theory, econometrics, psychology, communications, cognitive science.
Design space: decision model, utility model, interaction experiments, feedback type, regret, applications.

