Fan Guo Chao Liu Carnegie Mellon University Microsoft Research-Redmond.

1 Fan Guo (Carnegie Mellon University), Chao Liu (Microsoft Research-Redmond)

2 Search Results for CIKM. [Figure: # of clicks received] 1/29/2014, CIKM'09 Tutorial, Hong Kong, China

3 Adapt ranking to user clicks? [Figure: # of clicks received]

4 Tools needed for non-trivial cases. [Figure: # of clicks received]

5 One of the most extensive (yet indirect) surveys of user experience. For researchers: help understand human interaction with IR results; design and calibrate novel models and hypotheses. For practitioners: measure, monitor and improve search engine performance; attract more page views and clicks, boost profit.

6 Introduce problems and applications in web search click modeling. Present the latest developments of click models in web search. Provide examples and discuss trade-offs for model design, implementation and evaluation.

7 Ph.D. Student (exp. 2011), Computer Science Department, Carnegie Mellon University. Advisor: Christos Faloutsos. Dissertation topic: graph mining for large bioinformatics image databases. 2008, M.S., CMU. 2005, B.E., Tsinghua University, Beijing, China.

8 Researcher, Internet Services Research Center (ISRC), MSR-Redmond. Research focus: large-scale search/browsing log analysis for effective Web information access. 2007, Ph.D., UIUC. 2005, M.S., UIUC. Advisor: Jiawei Han. Dissertation on statistical debugging and automated failure analysis. 2003, B.S., Peking University, China.

9 Introduction; Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

10 Introduction (Web search click logs; Interpret clicks as relevance feedback; Building statistical models for clicks; Applications of click models); Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

11 Click-through; Browser action; Dwelling time; Explicit judgment; Other page elements.

12 Auto-generated data keeping important information about search activity. Query: cikm. Session ID: f851c5af178384d12f3d. (? = value not recovered)
Position | URL | Click
1 | cikm2008.org | 1
2 | www.cikm.org | 0
3 | www.cikm.org/ | ?
4 | www.fc.ul.pt/cikm | ?
5 | www.comp.polyu.edu.hk/conference/cikm | ?
6 | cikmconference.org | 0
7 | Ir.iit.edu/cikm | ?
8 | www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html | 0
9 | www.tzi.de/CIKM | ?
10 | www.cikm.com | 0

13 A real world example.

14 How large is the click log? Search logs: 10+ TB/day. In existing publications: [Craswell+08]: 108k sessions; [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions); [Guo+09a]: 8.8M sessions from 110k unique queries; [Guo+09b]: 8.8M sessions from 110k unique queries; [Chapelle+09]: 58M sessions from 682k unique queries; [Liu+09a]: 0.26PB data from 103M unique queries.

15 How large is one ?

16 Introduction (Web search click logs; Interpret clicks as relevance feedback; Building statistical models for clicks; Applications of click models); Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

17 Clicks are good… Are these two clicks equally good? Non-clicks may have excuses: not relevant; not examined.

18 [Figure only]

19 Higher positions receive more user attention (eye fixation) and clicks than lower positions. This is true even in the extreme setting where the order of positions is reversed. Clicks are informative but biased. [Figure from [Joachims+07]: percentage by position, normal vs. reversed impression]

20 Clicked > Skipped Above [Joachims02]. Preference pairs: #5>#2, #5>#3, #5>#4. Use Rank SVM to optimize the retrieval function. Limitations: confidence of judgments; little implication for user modeling.
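
The "Clicked > Skipped Above" rule can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `skip_above_pairs` and the example click vector (clicks at positions 1 and 5) are assumptions chosen so that the output matches the preference pairs listed on the slide.

```python
from typing import List, Tuple


def skip_above_pairs(clicks: List[int]) -> List[Tuple[int, int]]:
    """Return (preferred, less_preferred) pairs of 1-based positions.

    Every clicked position is preferred over each unclicked position
    ranked above it.
    """
    pairs = []
    for i, clicked in enumerate(clicks, start=1):
        if not clicked:
            continue
        for j in range(1, i):
            if not clicks[j - 1]:
                pairs.append((i, j))
    return pairs


# Hypothetical session: clicks at positions 1 and 5.
print(skip_above_pairs([1, 0, 0, 0, 1]))
```

The resulting pairs would then serve as training constraints for a pairwise ranker such as Rank SVM.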

21 Introduction (Web search click logs; Interpret clicks as relevance feedback; Building statistical models for clicks; Applications of click models); Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

22 Given a set of web search click logs, predict clicks: output the probability of click vectors given a new order of URLs. [Figure annotation: "possibilities!"]

23 Given a set of web search click logs, estimate relevance: measure how good a URL is with regard to the information need of the query/user. [Figure annotation: relevance score = 0.5]

24 The probability of a click if the document appears at the top position. Relevance score = 0.5 indicates that on average, the document will be clicked once per 2 sessions. Bayesian click models characterize relevance using a probability distribution. [Figure: density function over the relevance score]

25 Effective: aware of the position-bias and address it properly. Scalable: linear complexity for both time and space, easy to parallelize. Incremental: flexible for model update based on new data.

26 Introduction (Web search click logs; Interpret clicks as relevance feedback; Building statistical models for clicks; Applications of click models); Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

27 Optimizing the retrieval function: ranking alternation based on clicks [Liu+09b].

28 Optimizing the retrieval function: ranking alternation based on clicks; as a feature to a learning-to-rank system (e.g., RankNet [Burges+05]).

29 Online advertising: user model for sponsored search auctions.

30 Online advertising: user model for sponsored search auctions; click-through rate (CTR) prediction [Zhu+10].

31 Search engine evaluation: Pskip [Wang+09]: click-through rate above last clicks; dwelling time features could also be incorporated.

32 Search engine evaluation: Pskip [Wang+09]: click-through rate above last clicks; search relevance score [Guo+09c]: average relevance score weighted by chance of examination.

33 User behavior analysis: a preliminary work showing different user behavior patterns for navigational and informational queries [Guo+09c].

34 Introduction; Designing click models (Basic user hypotheses; Modeling the first click; Extending to multiple clicks; Summary of model design); Bayesian click models; Selected topics on click models; Conclusion.

35 A document must be examined before a click. The (conditional) probability of click upon examination depends on document relevance.

36 The click probability could be decomposed. Global component: the examination probability, which reflects the position-bias. Local component: depends on the (query, URL) pair only. The building block for every existing model!

37 The first document is always examined. First-order Markov property: examination at position (i+1) depends on examination and click at position i only. Examination follows a strict linear order. [Diagram: Position i, Position (i+1)]

39 Limitation: examination/click rate monotonically decreases with rank, which is not always true. Some models do not follow this hypothesis (e.g., UBM). [Figures: web search data in [Guo+09a]; ads click data in [Zhu+10]]

40 Introduction; Designing click models (Basic user hypotheses; Modeling the first click; Extending to multiple clicks; Summary of model design); Bayesian click models; Selected topics on click models; Conclusion.

41 Put together two hypotheses: Cascade Model [Craswell+08] = examination hypothesis + cascade hypothesis + modeling a single click. Formal model specification: $P(C_i = 1 \mid E_i = 0) = 0$, $P(C_i = 1 \mid E_i = 1) = r_{u_i}$, $P(E_1 = 1) = 1$, $P(E_{i+1} = 1 \mid E_i = 0) = 0$, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 0) = 1$.
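
The specification above implies that, under the cascade model, position i is clicked only if every earlier document was examined and skipped. A minimal sketch of the resulting click probabilities follows. It assumes, as "modeling a single click" suggests, that the session stops after the first click; the function name and the relevance values are hypothetical.

```python
from typing import List


def cascade_click_probs(relevance: List[float]) -> List[float]:
    """P(C_i = 1) for each position under the cascade model.

    relevance[i] plays the role of r_{u_i} for the document at position i+1.
    """
    probs = []
    p_examine = 1.0  # P(E_1 = 1) = 1
    for r in relevance:
        probs.append(p_examine * r)  # examined and clicked
        p_examine *= 1.0 - r         # examined, not clicked, so move on
    return probs


print(cascade_click_probs([0.5, 0.5, 0.5]))
```

With equal relevance at every position, the click probability halves at each rank, which is the position-bias the model is meant to explain.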

42 The user behavior chart. [Flowchart nodes: Examine the URL; Click? (Yes/No); See Next URL? (Yes); Done. i is the index for the URL at position i]

43 First click in Click Chain Model [Guo+09b] as well as Dynamic Bayesian Network model [Chapelle+09]: the chance that the user may immediately abandon examination without a click. [Flowchart nodes: Examine the URL; Click? (Yes/No); See Next URL? (Yes/No); Done]

44 First click in User Browsing Model [Dupret+08]: position-dependent parameters. [Flowchart nodes: Examine the URL; Click? (Yes/No); See Next URL? (Yes/No); Done; positions i, i+1]

45 Introduction; Designing click models (Basic user hypotheses; Modeling the first click; Extending to multiple clicks; Summary of model design); Bayesian click models; Selected topics on click models; Conclusion.

46 Generalize the cascade model to 1+ clicks: $P(C_i = 1 \mid E_i = 0) = 0$, $P(C_i = 1 \mid E_i = 1) = r_{u_i}$, $P(E_1 = 1) = 1$, $P(E_{i+1} = 1 \mid E_i = 0) = 0$, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 0) = 1$, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 1) = \lambda_i$. λ: global parameters characterizing user browsing behavior.

47 Generalize the cascade model to 1+ clicks:

48 DCM algorithms. Input: for each query session, the query term, with a (URL, clicked) tuple for all top-10 positions. Output: relevance for each (query, URL) pair; global parameters for user behavior. Method: approximate* maximum-likelihood estimation. *Footnote: the algorithm maximizes a lower bound of the log-likelihood function.

49 Query: cikm. Session ID: f851c5af178384d12f3d. [Annotation: last clicked position] (? = value not recovered)
Position | URL | Click
1 | cikm2008.org | 1
2 | www.cikm.org | 0
3 | www.cikm.org/ | ?
4 | www.fc.ul.pt/cikm | ?
5 | www.comp.polyu.edu.hk/... | 1
6 | cikmconference.org | 0
7 | Ir.iit.edu/cikm | ?
8 | www.informatik.uni-trier.de... | 0
9 | www.tzi.de/CIKM | ?
10 | www.cikm.com | 0

50 Query: cikm. Session ID: ab8dee4c4dd21e6aaf03. [Annotation: last clicked position] (? = value not recovered)
Position | URL | Click
1 | cikm2008.org | 0
2 | www.cikm.org | 1
3 | www.cikm.org/ | ?
4 | www.fc.ul.pt/cikm | ?
5 | cikmconference.org | 0
6 | www.comp.polyu.edu.hk/... | 1
7 | Ir.iit.edu/cikm | ?
8 | www.informatik.uni-trier.de... | 0
9 | www.tzi.de/CIKM | ?
10 | www.cikm.com | 0

51 The estimation formula for relevance: empirical CTR measured before the last clicked position. The estimation formula for global (user behavior) parameters: empirical probability of clicked-but-not-last.
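
The two estimation formulas can be sketched in a single pass over the sessions of one query. This is an illustrative reading of the slide, not the authors' implementation: the function name, the session layout, and the treatment of sessions without clicks (every position counted as examined) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = List[Tuple[str, int]]  # (url, clicked) per position, top to bottom


def estimate_dcm(sessions: List[Session]) -> Tuple[Dict[str, float], Dict[int, float]]:
    """Return (relevance per URL, lambda per 1-based position) for one query."""
    clicks = defaultdict(int)      # clicks on each URL
    exams = defaultdict(int)       # impressions at or before the last click
    clicked_at = defaultdict(int)  # sessions with a click at position i
    last_at = defaultdict(int)     # sessions whose last click is at position i
    for session in sessions:
        clicked_positions = [i for i, (_, c) in enumerate(session) if c]
        # Assumption: with no click, all positions count as examined.
        last = clicked_positions[-1] if clicked_positions else len(session) - 1
        for url, c in session[: last + 1]:
            exams[url] += 1
            clicks[url] += c
        for i in clicked_positions:
            clicked_at[i] += 1
        if clicked_positions:
            last_at[last] += 1
    relevance = {u: clicks[u] / exams[u] for u in exams}
    # Empirical probability of "clicked but not the last click".
    lam = {i + 1: 1.0 - last_at[i] / clicked_at[i] for i in clicked_at}
    return relevance, lam


# Hypothetical sessions for a single query.
s1 = [("a", 1), ("b", 0), ("c", 1), ("d", 0)]
s2 = [("a", 0), ("b", 1), ("c", 0), ("d", 0)]
relevance, lam = estimate_dcm([s1, s2])
print(relevance, lam)
```

Because only counters are updated, the same loop can be continued on new log data without revisiting old sessions.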

52 Details: keep 3 counts for each (query, URL) pair. Then [equation not recovered].

53 The examine-next probability depends on the relevance of the URL clicked: "Not what I want, go to examine the next." "Aha, this is the right one, and I'm done!"

54 The examine-next probability depends on the relevance of the URL clicked: $P(E_{i+1} = 1 \mid E_i = 1, C_i = 1) = \alpha_2 (1 - r_{u_i}) + \alpha_3 r_{u_i}$, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 0) = \alpha_1$, where $0 < \alpha_1 \le 1$, $0 \le \alpha_3 < \alpha_2 \le 1$.
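
The examine-next rule on this slide interpolates between two global parameters according to the relevance of the clicked URL. A small sketch, with a hypothetical function name and made-up parameter values:

```python
def ccm_examine_next(clicked: bool, relevance: float,
                     alpha1: float, alpha2: float, alpha3: float) -> float:
    """P(E_{i+1} = 1 | E_i = 1, C_i) for the Click Chain Model."""
    if not clicked:
        return alpha1
    # After a click: low relevance pushes towards alpha2, high towards alpha3.
    return alpha2 * (1.0 - relevance) + alpha3 * relevance


# Hypothetical parameter values, for illustration only.
print(ccm_examine_next(True, 0.0, alpha1=1.0, alpha2=0.8, alpha3=0.2))
print(ccm_examine_next(True, 1.0, alpha1=1.0, alpha2=0.8, alpha3=0.2))
```

Since $\alpha_3 < \alpha_2$, a click on a highly relevant URL makes the user less likely to keep examining, which matches the two user reactions quoted on the previous slide.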

55 The full picture:

56 There is a subtle difference between the relevance of the URL snippet and the landing page. "Hmmm…, this looks pretty nice." "Errr…, it's way out of date." Conclusion: attractive, but not satisfactory.

57 The examine-next probability depends on the satisfaction score: $P(E_{i+1} = 1 \mid E_i = 1, C_i = 1) = \gamma (1 - s_{u_i}) + 0 \cdot s_{u_i}$, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 0) = \gamma$, where $0 < \gamma \le 1$. The click probability is associated with the attractiveness score: $P(C_i = 1 \mid E_i = 1) = a_{u_i}$.

58 The full picture:

59 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Example: r = 0, d = 1. [Table, first rows (Position, URL, Click): (1, cikm2008.org, 0); (2, www.cikm.org, 1); (3, www.cikm.org/, ?); (4, www.fc.ul.pt/cikm, ?); (5, cikmconference.org, 0); (6, www.comp.polyu.edu.hk/..., 1); …]

60 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Example: r = 0, d = 2. [Table: the same query session]

61 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Example: r = 2, d = 1. [Table: the same query session]

62 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Example: r = 2, d = 2. [Table: the same query session]

63 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Example: r = 2, d = 3. [Table: the same query session]

64 The examine-next probability depends on both the preceding clicked position r and the distance d to this position. Users would lose patience when they browse through without issuing a click: the probability monotonically drops as d increases and r remains the same.

65 The examine-next probability depends on both the preceding clicked position r and the distance d to this position: $P(E_i = 1 \mid C_{1:i-1}) = \beta_{r_i, d_i}$. 55 parameters are needed for top-10 positions (0r
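
A short sketch of how r and d can be located for every position of a session, and of why 55 parameters arise for the top-10 positions. The function names are hypothetical; r = 0 denotes "no click so far", as in the examples on the preceding slides.

```python
from typing import List, Tuple


def ubm_rd(clicks: List[int]) -> List[Tuple[int, int]]:
    """Return (r_i, d_i) for each 1-based position i of a click vector."""
    out = []
    r = 0  # preceding clicked position; 0 means no click so far
    for i, clicked in enumerate(clicks, start=1):
        out.append((r, i - r))
        if clicked:
            r = i
    return out


def ubm_param_keys(m: int = 10) -> List[Tuple[int, int]]:
    """All (r, d) combinations with 0 <= r < r + d <= m."""
    return [(r, d) for r in range(m) for d in range(1, m - r + 1)]


# Session with clicks at positions 2 and 6, as in the running example.
print(ubm_rd([0, 1, 0, 0, 0, 1]))
print(len(ubm_param_keys(10)))
```

The count is 10 + 9 + … + 1 = 55, one parameter per reachable (r, d) pair.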

66 The full picture:

67 Introduction; Designing click models (Basic user hypotheses; Modeling the first click; Extending to multiple clicks; Summary of model design); Bayesian click models; Selected topics on click models; Conclusion.

68 Probability of examining the first URL, $P(E_1)$, by model: Cascade: 1; DCM: 1; CCM: 1*; DBN: 1*; UBM: $\beta_{0,1}$. *Footnote: it is flexible to add another parameter to specify this probability.

69 Probability of click upon examination, $P(C_i = 1 \mid E_i = 1)$, by model: Cascade: $r_{d_i}$; DCM: $r_{d_i}$; CCM: $r_{d_i}$*; DBN: $a_{d_i}$; UBM: $r_{d_i}$. *Footnote: the mean of the relevance distribution, detailed in the next part.

70 Probability of examine-next w/o a click, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 0)$, by model: Cascade: 1; DCM: 1; CCM: $\alpha_1$; DBN: $\gamma$; UBM: $\beta_{r_{i+1}, d_{i+1}}$*. *Footnote: the probability does not depend on $E_i$.

71 Probability of examine-next after a click, $P(E_{i+1} = 1 \mid E_i = 1, C_i = 1)$, by model: Cascade: --; DCM: $\alpha_i$; CCM: $\alpha_2 (1 - r_{d_i}) + \alpha_3 r_{d_i}$; DBN: $\gamma (1 - s_{d_i})$; UBM: $\beta_{i,1}$.

73 Size of parameter sets (# of global params by model): Cascade: 0; DCM: 9; CCM: 3; DBN: 1; UBM: 55.

74 Inference and estimation algorithms, by model: DCM: maximizing a lower bound of LL, fastest; CCM: no iteration needed, thanks to the Bayesian framework; DBN: EM-based, iterative algorithms; UBM: EM-based, usually takes ~30 iterations to converge. [Marks in the "Single-Pass" column not recovered]

76 Introduction; Designing click models; Bayesian click models (Bayesian framework and the rationale; Bayesian Browsing Model: a case study; Click Chain Model in a nutshell); Selected topics on click models; Conclusion.

77 Frequentist: p(H) = 0.8. Bayesian: a distribution over p(H) on [0, 1], updated from prior to posterior. [Figure: probability of p(H) plotted against p(H)]

78 From prior to posterior, the density function (not normalized): $x$, $x^2$, $x^3$, $x^3 (1-x)$, $x^4 (1-x)$.

79 From prior to posterior, the density function (not normalized): $x^1 (1-x)^0$, $x^2 (1-x)^0$, $x^3 (1-x)^0$, $x^3 (1-x)^1$, $x^4 (1-x)^1$.
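
The exponent bookkeeping behind the coin-toss posterior can be sketched as below. The toss sequence H, H, H, T, H is inferred from the sequence of exponents on the slide, and a uniform prior is assumed; with that prior, the normalized form of $x^a (1-x)^b$ is a Beta(a+1, b+1) density, whose mean is (a+1)/(a+b+2). Function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def coin_posterior_exponents(tosses: str) -> List[Tuple[int, int]]:
    """Exponents (a, b) of x^a (1-x)^b after each toss ('H' or 'T')."""
    heads = tails = 0
    history = []
    for toss in tosses:
        if toss == "H":
            heads += 1
        else:
            tails += 1
        history.append((heads, tails))
    return history


def posterior_mean(heads: int, tails: int) -> Fraction:
    """Mean of Beta(heads + 1, tails + 1), i.e. uniform prior assumed."""
    return Fraction(heads + 1, heads + tails + 2)


print(coin_posterior_exponents("HHHTH"))
print(posterior_mean(4, 1))
```

Each observation only increments one exponent, which is the property the Bayesian click models reuse for relevance.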

80 The graphical model for coin-toss. [Graphical model nodes: X; C_1, C_2, C_3, C_4, C_5]

82 From the prior onward, the density function (not normalized):
$x^1 (1-x)^0 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 \cdots$
$x^1 (1-x)^1 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 \cdots$
$x^2 (1-x)^1 (1-0.6x)^0 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 \cdots$
$x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 \cdots$
$x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0 \cdots$

83 Representation of relevance: a probability distribution on [0,1] for each (query, URL) pair. The density function is in a polynomial form over a small set of linear factors. The coefficients of such linear factors are shared between different (query, URL) pairs. Example: $x^3 (1-1x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0 \cdots$

84 Inference: go over each query session once, update the exponents for the corresponding (query, URL) pair impressed.* Analytical or numerical integration may be needed to compute the normalization constant. *Footnote: by virtue of the Bayes theorem and conditional independence relationship/assumption.
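
Numerical integration over such a product of linear factors can be sketched with a simple midpoint rule. The representation (a dictionary from coefficient b to the exponent of the factor (1 - b·x), plus a separate exponent for x), the function name, and the midpoint rule itself are assumptions made for illustration; the source only states that analytical or numerical integration may be needed.

```python
from typing import Dict


def relevance_posterior_mean(factor_exps: Dict[float, int], x_exp: int,
                             bins: int = 100) -> float:
    """Posterior mean of x^x_exp * prod_b (1 - b*x)^e on [0, 1].

    factor_exps maps a coefficient b to the exponent e of (1 - b*x);
    a factor such as (1 + 0.3x) is stored with b = -0.3.
    """
    total = 0.0
    weighted = 0.0
    for k in range(bins):
        x = (k + 0.5) / bins  # midpoint of bin k
        density = x ** x_exp
        for b, e in factor_exps.items():
            density *= (1.0 - b * x) ** e
        total += density          # accumulates the normalization constant
        weighted += x * density   # accumulates the first moment
    return weighted / total


# The example density on the slide, truncated to the factors shown.
slide_example = {1.0: 1, 0.6: 1, -0.3: 2, 0.5: 1, 0.2: 0}
print(relevance_posterior_mean(slide_example, x_exp=3))
```

The bin width cancels in the ratio, so only the two running sums are needed; the cost is proportional to the number of bins times the number of distinct factors.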

85 Key problems: Which is the right factor to update? How to estimate all the coefficients?

86 Modeling benefits: confidence for the URL relevance estimate; relative judgments: probability that URL i is more relevant to the query than URL j; easy to interpret: coefficients in linear factors reflect position-bias and user browsing patterns. Computational benefits: single-pass, linear algorithms, no iterations; a parallel version is easy to implement.

87 Introduction; Designing click models; Bayesian click models (Bayesian framework and the rationale; Bayesian Browsing Model: a case study; Click Chain Model in a nutshell); Selected topics on click models; Conclusion.

88 For a specific query session, let [variable definitions not recovered], where $1 \le i \le M = 10$. [Graphical model nodes: S_1, S_2, S_3, …, S_M; E_1, E_2, E_3, …, E_M; C_1, C_2, C_3, …, C_M]

89 [Graphical model nodes: S_1, …, S_M (relevance); E_1, …, E_M (examination); C_1, …, C_M (click)]

90 Details: compute the posterior distribution, using the conditional independence relationship induced from the graphical model. [Equation not recovered; annotated terms: how many times URL j was clicked; how many times URL j was not clicked when it is at position (r + d) with the preceding click at position r]

91 Only the top M=3 positions are shown, with 3 query sessions and 4 distinct URLs. [Table: URL at each position for query sessions 1 to 3; values not recovered]

92 Initialize M(M+1)/2+1 counts for each URL. [Table columns: URL; Clicks; r=0, d=1; r=0, d=2; r=0, d=3; r=1, d=1; r=1, d=2; r=2, d=1]

93 Update counts for URL 4: if not impressed, do nothing; if clicked, increment clicks by 1; otherwise, locate the right r and d to increment. [Table columns: URL; Clicks; r=0, d=1; r=0, d=2; r=0, d=3; r=1, d=1; r=1, d=2; r=2, d=1]
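
The count-update rule can be sketched as a single pass over the sessions of one query. The session contents below are made up (the values in the slide's tables were not recovered), and the function name and data layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = List[Tuple[str, int]]  # (url, clicked) for positions 1..M


def build_counts(sessions: List[Session]) -> Dict[str, Dict[object, int]]:
    """Per-URL counts: 'clicks', plus one count per (r, d) for non-clicks."""
    counts: Dict[str, Dict[object, int]] = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        r = 0  # preceding clicked position; 0 means no click so far
        for i, (url, clicked) in enumerate(session, start=1):
            if clicked:
                counts[url]["clicks"] += 1
                r = i
            else:
                counts[url][(r, i - r)] += 1
        # URLs not impressed in this session are simply never touched.
    return counts


# Hypothetical sessions with M = 3 positions and 4 distinct URLs.
sessions = [
    [("u1", 1), ("u2", 0), ("u4", 0)],
    [("u1", 0), ("u4", 1), ("u3", 0)],
]
for url, c in sorted(build_counts(sessions).items()):
    print(url, dict(c))
```

Each position of each session triggers exactly one increment, so the pass is linear in the size of the click log.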

96 The posterior for URL 4. Interpretation: the larger the probability of examination, the stronger the penalty for a non-click. [Table columns: URL; Clicks; r=0, d=1; r=0, d=2; r=0, d=3; r=1, d=1; r=1, d=2; r=2, d=1]

97 Keep 2 counts for each parameter (one for click, and the other one for non-click). Counts (click, non-click): $\beta_{0,1}$: (0, 0); $\beta_{0,2}$: (0, 0); $\beta_{0,3}$: (0, 0); $\beta_{1,1}$: (0, 0); $\beta_{1,2}$: (0, 0); $\beta_{2,1}$: (0, 0).

98 For each position in a query session, locate the right r and d to increment. Counts (click, non-click): $\beta_{0,1}$: (1, 0); $\beta_{0,2}$: (0, 0); $\beta_{0,3}$: (0, 0); $\beta_{1,1}$: (0, 1); $\beta_{1,2}$: (0, 1); $\beta_{2,1}$: (0, 0).

99 For each position in a query session, locate the right r and d to increment. Counts (click, non-click): $\beta_{0,1}$: (1, 1); $\beta_{0,2}$: (1, 0); $\beta_{0,3}$: (0, 0); $\beta_{1,1}$: (0, 1); $\beta_{1,2}$: (0, 1); $\beta_{2,1}$: (0, 1).

100 For each position in a query session, locate the right r and d to increment. Counts (click, non-click): $\beta_{0,1}$: (1, 2); $\beta_{0,2}$: (1, 0); $\beta_{0,3}$: (0, 0); $\beta_{1,1}$: (1, 1); $\beta_{1,2}$: (0, 1); $\beta_{2,1}$: (1, 1).

101 Maximum-likelihood estimate: [equation not recovered]. Counts (click, non-click): $\beta_{0,1}$: (1, 2); $\beta_{0,2}$: (1, 0); $\beta_{0,3}$: (0, 0); $\beta_{1,1}$: (1, 1); $\beta_{1,2}$: (0, 1); $\beta_{2,1}$: (1, 1).

102 Details: let [definitions not recovered]. Initializing and updating the counts: time [expression not recovered], linear in the size of the click log; space [expression not recovered], almost constant storage required.

103 Details: let [definitions not recovered]. Initializing and updating the counts: time and space [expressions not recovered]. Computing relevance scores using numerical integration with B bins: time and space [expressions not recovered].

104 Step 1: initialize counting statistics. Step 2: scan through the click log once and update the counts for both inference and estimation. Step 3: compute parameter values. Step 4: use numerical integration to obtain relevance scores. Step 2 also applies for (linear) incremental computation!

105 Introduction; Designing click models; Bayesian click models (Bayesian framework and the rationale; Bayesian Browsing Model: a case study; Click Chain Model in a nutshell); Selected topics on click models; Conclusion.

106 The user behavior model:

107 Graphical model: [nodes S_1, …, S_M (relevance); E_1, …, E_M (examination); C_1, …, C_M (click)]

108 Details. [Content not recovered]

109 Number of user behavior parameters: CCM 3, UBM 55. Number of distinct factors for (query, URL): CCM 22, UBM 56. Number of counts needed for parameters: CCM 5, UBM 110.

110 Introduction; Designing click models; Bayesian click models; Selected topics on click models (Scaling click models for Petabyte-scale data; Click model evaluation; Tailoring user goals to click models); Conclusion.

111 Data collected in 8 weeks. Job k includes data between week 1 and k. Both time and space costs are prohibitive for a single node.

112 A simple task: counting # impressions for each (query, URL) pair.

113 [Diagram: stages Extent, GetPairs, Map, Sort on each of Machine #1 to Machine #4; then Count and Output]

114 Map puts all of the same Pairs onto one machine. This allows you to group by various fields in subsequent processes. [Diagram: stages Extent, GetPairs, Map, Sort on each of Machine #1 to Machine #4; then Count and Output]

115 A simple task: counting # impressions for each (query, URL) pair. Map = Bucket: the intermediate key is the (query, URL) pair.

116 Count carries out standard increment-by-1 over each distinct Pair. Count REDUCES the amount of data since each Pair has only one output value. [Diagram: stages Extent, GetPairs, Map, Sort on each of Machine #1 to Machine #4; then Count and Output]

117 A simple task: counting # impressions for each (query, URL) pair. Map = Bucket: the intermediate key is the (query, URL) pair. Reduce = Count: it accepts a list of (key, value) tuples and outputs the final result for each distinct key.
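
The impression-counting task can be imitated on a single machine with three small functions standing in for the map, shuffle (bucketing by key) and reduce stages. This is only a sketch of the data flow: the function names and the log layout are assumptions, and nothing here reproduces the SCOPE system itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (query, url)


def map_phase(log: Iterable[Tuple[str, List[str]]]) -> Iterator[Tuple[Key, int]]:
    """Emit ((query, url), 1) for every impressed URL."""
    for query, urls in log:
        for url in urls:
            yield (query, url), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Bucket values by intermediate key, as the Map = Bucket step does."""
    buckets: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_phase(buckets: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reduce = Count: one output value per distinct key."""
    return {key: sum(values) for key, values in buckets.items()}


def count_impressions(log: Iterable[Tuple[str, List[str]]]) -> Dict[Key, int]:
    return reduce_phase(shuffle(map_phase(log)))


# Hypothetical two-session log for the query "cikm".
log = [("cikm", ["cikm2008.org", "www.cikm.org"]), ("cikm", ["www.cikm.org"])]
print(count_impressions(log))
```

In a distributed run, each bucket would live on one machine, so the reduce step needs no communication between keys.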

118 [Diagram: stages Extent, GetPairs, Map, Sort on each of Machine #1 to Machine #4, labeled MAP; then Count and Output, labeled REDUCE]

119 [...] for clicks

120 Map: scan the click log. Intermediate key: (query, URL). Value: the index of linear factors (0~55 for top-10 positions). Reduce: scan the list of (key, value). The key indicates which exponent vector to update; the value indicates the index of the element in the exponent vector to increment.
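
A sketch of this map/reduce pair follows. The particular numbering of the 56 factors (index 0 for a click, then the (r, d) pairs row by row) is an assumption; the source only gives the index range 0~55. Function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

M = 10  # top-10 positions
Key = Tuple[str, str]  # (query, url)


def factor_index(r: int, d: int, m: int = M) -> int:
    """Assumed layout: 0 is the click factor; (r, d) pairs follow row by row."""
    return 1 + sum(m - k for k in range(r)) + (d - 1)


def map_session(query: str, session: List[Tuple[str, int]]) -> Iterator[Tuple[Key, int]]:
    """Emit ((query, url), factor index) for each position of one session."""
    r = 0
    for i, (url, clicked) in enumerate(session, start=1):
        if clicked:
            yield (query, url), 0
            r = i
        else:
            yield (query, url), factor_index(r, i - r)


def reduce_pairs(pairs: Iterable[Tuple[Key, int]], m: int = M) -> Dict[Key, List[int]]:
    """Increment the indexed element of each key's exponent vector."""
    size = m * (m + 1) // 2 + 1
    exponents: Dict[Key, List[int]] = {}
    for key, index in pairs:
        exponents.setdefault(key, [0] * size)[index] += 1
    return exponents


# Hypothetical three-position session for query "q".
pairs = list(map_session("q", [("a", 0), ("b", 1), ("c", 0)]))
print(pairs)
print(len(reduce_pairs(pairs)[("q", "c")]))
```

Because increments commute, partial exponent vectors computed on different machines can be summed element-wise.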

121 Linearly increasing computation load; near-constant elapsed time. [Figure: single-machine computation load and elapsed time on SCOPE] 3 hours; 265 TB log data; 1.15 billion (query, URL) pairs.

122 Introduction; Designing click models; Bayesian click models; Selected topics on click models (Scaling click models for Petabyte-scale data; Click model evaluation; Tailoring user goals to click models); Conclusion.

123 [Diagram: Impression Data; Click Data]

124 [Diagram: Impression Data; Click Data; Relevance Scores; Global Parameters; M=10]

125 [Diagram: Relevance; Global params; New Impression Vector from an Existing Query; Predicted Examination; Predicted Clicks]

126 Data are collected from a commercial search engine after query term normalization and spam removal. For each query term, query sessions are split evenly into training and test sets according to the timestamp. The most frequent and most infrequent query terms are removed.

127 Most popular metrics: Average test data log-likelihood (LL): probability of accurately predicting the click vector, 2^10 possibilities [Guo+09a, Guo+09b, Liu+09a, Zhu+10]. Perplexity of prediction for each position: 2^{average entropy} of click/no-click binary prediction for each position independently [Dupret+08, Guo+09a, Guo+09b, Zhu+10].
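
Both metrics can be sketched in a few lines. The formulas used here are the common definitions and are stated as assumptions: perplexity at a position is taken as 2 raised to the negative mean base-2 log-probability of the observed click/no-click outcome, and the log-likelihood uses the natural logarithm (the source does not state the base). Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from typing import List


def position_perplexity(clicks: List[int], probs: List[float]) -> float:
    """Perplexity of binary click predictions at one position.

    clicks[n] is the observed click (0/1) in session n;
    probs[n] is the predicted click probability, with 0 < probs[n] < 1.
    """
    total = 0.0
    for c, q in zip(clicks, probs):
        total += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-total / len(clicks))


def average_log_likelihood(session_probs: List[float]) -> float:
    """Mean natural-log probability assigned to each observed click vector."""
    return sum(math.log(p) for p in session_probs) / len(session_probs)


# A random guess assigns 0.5 to every binary outcome.
print(position_perplexity([1, 0, 1, 0], [0.5] * 4))
print(average_log_likelihood([2.0 ** -10] * 3))
```

Under these definitions a random guess gives perplexity 2 and log-likelihood log(2^-10), and perfect prediction approaches perplexity 1 and log-likelihood 0.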

128 Other metrics: click-through rate (CTR) prediction (Especially for predicting [Chapelle+09, Zhu+10]; predicting first/last clicked positions [Guo+09a, Guo+09b]; position-bias sanity check (plot the click rate curve for top-10 positions vs. the ground truth) [Guo+09a, Guo+09b].

129 Average log-likelihood. Random guess: $\log(2^{-10})$ = [value not recovered]. Optimal value: [value not recovered]. [Table: LL and improvement ratio for CCM, UBM, DCM; recovered improvement ratios: 9.7%, 14%]

130 [Figure; axis annotations: Better, Worse]

131 [Figure; axis annotations: Better, Worse]

132 Average perplexity over top 10 positions. Random guess: 2. Optimal value: [value not recovered]. [Table: perplexity and improvement ratio for CCM, UBM, DCM; recovered improvement ratios: 7.5%, 8.3%]

133 [Figure; axis annotations: Worse, Better]

134 [Figure only]

135 For 1M query sessions, the estimated time in seconds. [Table: DCM, CCM*, BBM*, UBM**; values not recovered] *Time for CCM and BBM includes computing posterior mean and variance using numerical integration w/ 100 bins. **UBM converges in 34 iterations.

136 Introduction; Designing click models; Bayesian click models; Selected topics on click models (Scaling click models for Petabyte-scale data; Click model evaluation; Tailoring user goals to click models); Conclusion.

137 Queries could be categorized into 2 sets. Navigational: to find the link to an existing website, e.g., bing. Informational: more exploration, multiple clicks may arise, e.g., iron man.

138 Different user goals result in different browsing and click patterns. The straightforward mixture-modeling approach is not practical [Dupret+08]. Solution: classify query terms a priori based on user goals; fit and learn 2 sets of model parameters for navigational and informational queries.

139 Two-way classification for query terms based on click data using: median position of click distribution; mean position of click distribution; average # clicks per query session; … Pick the one which has the best click prediction. If a position receives 50% of the clicks, then navigational, else informational.
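
The final rule on this slide can be sketched as a threshold on the share of clicks received by the most-clicked position. Reading "receives 50% of the clicks" as "at least 50%", and labeling a query with no clicks as informational, are both assumptions; the function name is hypothetical.

```python
from collections import Counter
from typing import List


def classify_query(clicked_positions: List[int], threshold: float = 0.5) -> str:
    """Classify a query term from the positions of all its logged clicks."""
    if not clicked_positions:
        return "informational"  # assumption: no evidence of a dominant target
    top_count = Counter(clicked_positions).most_common(1)[0][1]
    share = top_count / len(clicked_positions)
    return "navigational" if share >= threshold else "informational"


# Hypothetical click positions aggregated over the sessions of two queries.
print(classify_query([1, 1, 1, 2]))
print(classify_query([1, 2, 3, 4, 5]))
```

Each class would then get its own set of global model parameters, fitted on the sessions of the queries assigned to it.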

140 Improvement of click prediction for DCM: log-likelihood: 4.0%; perplexity: 1.3%. Examination/click position-bias:

141 Introduction; Designing click models; Bayesian click models; Selected topics on click models; Conclusion.

142 Click models: a statistical tool to leverage valuable implicit user feedback in terabyte/petabyte search logs. Provide click prediction as well as relevance estimates. Application domains include learning to rank, measuring search performance, online advertising, user behavior analysis…

143 Click models: different model designs reflect various assumptions about user behavior to explain the position-bias. The modeling choice may depend on the application scenario.

144 Click models: efficient, single-pass, parallelizable algorithms are desired in real-world applications. The Bayesian framework could be applied to click models for both modeling benefits and computational benefits. Click Chain Model and Bayesian Browsing Model represent state-of-the-art examples.

145 Bigger context; query reformulations; personalization; richer inputs; universal search; diverse user feedback; click model vs. human judgments.

146 [Burges+05]: C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. ICML'05.
[Chapelle+09]: O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW'09.
[Craswell+08]: N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. WSDM'08.
[Dean+04]: J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI'04.
[Dupret+08]: G. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. SIGIR'08.

147 [Guo+09a]: F. Guo, C. Liu, and Y.-M. Wang. Efficient multiple-click models in web search. WSDM'09.
[Guo+09b]: F. Guo, C. Liu, A. Kannan, T. Minka, M. Taylor, Y.-M. Wang, and C. Faloutsos. Click chain model in web search. WWW'09.
[Guo+09c]: F. Guo, L. Li, and C. Faloutsos. Tailoring click models to user goals. WSCD'09.
[Joachims02]: T. Joachims. Optimizing search engines using clickthrough data. KDD'02.
[Joachims+07]: T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. ACM TOIS, 25(2), 2007.

148 [Lee+05]: U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in web search. WWW'05.
[Liu+09a]: C. Liu, F. Guo, and C. Faloutsos. BBM: Deriving click models from petabyte-scale data. KDD'09.
[Liu+09b]: C. Liu, M. Li, and Y.-M. Wang. Post-rank reordering: resolving preference misalignments between search engines and end users. CIKM'09.
[Richardson+07]: M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. WWW'07.
[Zhu+10]: Z. Zhu, W. Chen, T. Minka, C. Zhu and Z. Chen. A novel click model and its applications to online advertising. To appear in WSDM'10.

149 Anitha Kannan (MSR, Search Lab); Tom Minka (MSR, Cambridge); Christos Faloutsos (Carnegie Mellon University); Li-Wei He (MSR, ISRC-Redmond); Nick Craswell (MSR, Cambridge).

150 Yi-Min Wang (MSR, ISRC-Redmond); Mike Taylor (MSR, Cambridge); Ethan Tu (MSR, ISRC-Redmond).
