
Published by Alejandro Hamilton. Modified over 2 years ago.

1
Click Chain Model in Web Search
Fan Guo, Carnegie Mellon University
1/29/2014, WWW'09, Madrid, Spain

2
Joint Work With…
– Chao Liu (MSR, ISRC-Redmond)
– Yi-Min Wang (MSR, ISRC-Redmond)
– Mike Taylor (MSR, Cambridge)
– Anitha Kannan (MSR, Search Lab)
– Tom Minka (MSR, Cambridge)
– Christos Faloutsos (Carnegie Mellon University)

3

4

5
Click Logs
Auto-generated data keeping important information about search activity.

Query: www 2009
Time: 21 Apr 2009, 9:01:02

Rank/Position  URL of Document                            Click
1              www.metalwayfestival.com                   0
2              www.maquitec.com                           0
3              www.construmat.com                         0
4              www.hispack.com                            0
5              www.themarket.com                          0
6              www.cursabombers.com                       0
7              www.setegibernau.com                       0
8              www2009.org                                1
9              www.solardecathlon.upe.es                  0
10             www.nxtbook.com/nxtbooks/suny/2009spring   0

6
Problem Definition
Given a click log data set, for each query-document pair, compute user-perceived relevance.
[Figure: impression data and click data (columns Rank/Position, Document Idx, Click) for the query "www 2009", Session Index 103 …, next to a table of Document Idx and Relevance in which documents 1 to 9 … are all marked "?".]

7
Relevance Representation
[Figure: relevance on a scale from 0 to 1, with the labels Excellent, Good, Fair, Bad, Human Judge, Previous Click Models, 0.75, Click Chain Model, and Integration.]

8
Applications
– Automated Ranking Alterations
– Search Engine Performance Metric
– Calibrate Human Judgment
– Related Application in Sponsored Search

9
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

10

11
Eye-Tracking User Study
[Figure: Fixation Heat Map]

12
Overall: fixation is biased towards higher ranks, and so are the clicks.
For each position: fixation and clicks are context dependent.
[Figure panels: Normal Impression, Reversed Impression]

13
Problem Definition (Recap)
Given a click log data set, for each query-document pair, compute user-perceived relevance. The solution should be:
– Aware of the position bias and context dependency
– Scalable to terabyte data
– Incremental to stay updated

14
Examination Hypothesis
User behavior abstraction:
– Fixation: binary examination variable
– Click: binary click variable
A document must be examined before being clicked.

15
Examination Hypothesis
For each position:
P(Click=1) = P(Examination=1) * Relevance
Relevance = P(Click=1 | Examination=1)
The position bias is reflected in the derivation of P(Examination).
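A minimal sketch of the factorization above. The function name and the sample probabilities are illustrative assumptions, not values from the slides; the point is only that equal relevance yields fewer clicks at a position that is examined less often.

```python
def click_probability(p_examination: float, relevance: float) -> float:
    """P(Click=1) = P(Examination=1) * P(Click=1 | Examination=1)."""
    if not (0.0 <= p_examination <= 1.0 and 0.0 <= relevance <= 1.0):
        raise ValueError("both arguments must be probabilities in [0, 1]")
    return p_examination * relevance


# Same relevance at two positions with hypothetical examination probabilities:
# the less-examined position receives fewer clicks.
top = click_probability(p_examination=0.9, relevance=0.5)
low = click_probability(p_examination=0.3, relevance=0.5)
print(top, low)
```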

16
Cascade Hypothesis
The user scans through documents and makes decisions in strict linear order.
The decision process: E_1, C_1, E_2, C_2, …
Essential part of a click model:
– What is the probability of See Next Doc?
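To make the role of "See Next Doc" concrete, here is a sketch of the simplest possible answer, the classic cascade model, in which the user moves on exactly when the current document is not clicked. This is not CCM itself; the function name and the example relevances are assumptions for illustration.

```python
from typing import List


def cascade_examination_probs(relevances: List[float]) -> List[float]:
    """P(E_i = 1) when the user continues iff document i was examined and not clicked."""
    probs = []
    p_exam = 1.0  # the first document is always examined
    for r in relevances:
        probs.append(p_exam)
        p_exam *= 1.0 - r  # reach the next position only after a non-click
    return probs


print(cascade_examination_probs([0.5, 0.5, 0.5]))  # [1.0, 0.5, 0.25]
```

Combined with the examination hypothesis, the click probability at position i is then the examination probability times the relevance at that position.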

17
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

18
The Context
– Top-10 organic search results only.
– Query sessions are independent.
– Semantic information is not used.
[Figure labels: Suggestions, Ads, Other Elements]

19
User Behavior Description
[Flowchart: Examine the Document, then Click? (Yes/No); each branch leads to its own See Next Doc? decision (Yes/No), where No ends at Done.]
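The flowchart can be sketched as a small session simulator. The continuation probabilities are not given in this transcript; the parameters `alpha1`, `alpha2`, `alpha3` and the way they combine with relevance after a click are assumptions based on the usual CCM parameterization, and all names are hypothetical.

```python
import random
from typing import List, Optional


def simulate_session(relevances: List[float], alpha1: float, alpha2: float,
                     alpha3: float, rng: Optional[random.Random] = None) -> List[int]:
    """Simulate one query session and return the 0/1 click vector."""
    rng = rng or random.Random()
    clicks = [0] * len(relevances)
    for i, r in enumerate(relevances):
        # Examine the document; click with probability equal to its relevance.
        clicked = rng.random() < r
        clicks[i] = int(clicked)
        # See Next Doc? The probability depends on whether a click happened
        # (assumed form: after a click it interpolates between alpha2 and alpha3).
        p_continue = alpha2 * (1.0 - r) + alpha3 * r if clicked else alpha1
        if rng.random() >= p_continue:
            break  # Done
    return clicks


print(simulate_session([0.6, 0.2, 0.4, 0.1], alpha1=0.9, alpha2=0.8, alpha3=0.3,
                       rng=random.Random(0)))
```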

20
Click Chain Model
[Figure: graphical model of the Click Chain Model, with relevance nodes R_1 to R_5, examination nodes E_1 to E_5 and click nodes C_1 to C_5, continuing beyond position 5.]

21
Why Bayesian?
Modeling benefit:
– A principled way of smoothing the relevance estimates;
– Offers more flexibility, such as computing P(R_i > R_j).
Computational benefit:
– Avoids the iterative optimization procedure of maximum-likelihood estimation.
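As an illustration of the flexibility point, the sketch below computes P(R_i > R_j) from two posteriors evaluated on a shared grid over [0, 1], treating the two relevances as independent. The example posteriors are hypothetical polynomial shapes, not the ones derived in the talk.

```python
from typing import List


def prob_greater(weights_i: List[float], weights_j: List[float]) -> float:
    """P(R_i > R_j) for independent R_i, R_j given unnormalized weights on one grid."""
    if len(weights_i) != len(weights_j):
        raise ValueError("both posteriors must use the same grid")
    total_i, total_j = sum(weights_i), sum(weights_j)
    prob = 0.0
    mass_below = 0.0  # normalized mass of R_j strictly below the current grid point
    for w_i, w_j in zip(weights_i, weights_j):
        p_i, p_j = w_i / total_i, w_j / total_j
        prob += p_i * (mass_below + 0.5 * p_j)  # ties on a grid point are split evenly
        mass_below += p_j
    return prob


grid = [k / 1000 for k in range(1001)]
post_a = [r ** 8 * (1 - r) ** 2 for r in grid]  # hypothetical: mass near high relevance
post_b = [r ** 2 * (1 - r) ** 8 for r in grid]  # hypothetical: mass near low relevance
print(prob_greater(post_a, post_b))
```

A point estimate alone cannot answer this kind of pairwise question; a full posterior can.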

22
Relevance Inference
Given a query and all its click data, compute the relevance posterior for each possible j.
Then focus on the click probability for a particular session, and look at the different cases.

23
Click Chain Model
[Figure: graphical model of the Click Chain Model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated with the Examination Hypothesis and the Cascade Hypothesis.]

24
[Figure: Click Chain Model graphical model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated "0101".]

25
[Figure: Click Chain Model graphical model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated "0101".]

26
[Figure: Click Chain Model graphical model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated "0101".]

27
[Figure: Click Chain Model graphical model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated "0101".]

28
[Figure: Click Chain Model graphical model (nodes R_1 to R_5, E_1 to E_5, C_1 to C_5), annotated "0101".]

29
Putting them together

30
Summary of the Algorithm
– Initialize (2*10+2) counts for each query-document pair;
– Go through the click log once and update the counts;
– Compute parameter values and get β values;
– Ready to output results (using numerical integration if necessary).
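The first two steps can be sketched as a single-pass count update. The transcript does not say what the 2*10+2 counts are; the bucketing in `case_index` is an assumption chosen because it yields exactly 3 + 9 + 10 = 22 buckets for ten positions, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

M = 10                  # top-10 results
NUM_COUNTS = 2 * M + 2  # 22 counts per query-document pair

Counts = Dict[Tuple[str, str], List[int]]


def case_index(position: int, clicks: List[int]) -> int:
    """Assumed bucketing of a (0-based) position within one session."""
    clicked = [i for i, c in enumerate(clicks) if c]
    if not clicked:
        return 3 + (M - 1) + position      # no-click session: one bucket per position
    last = clicked[-1]
    if position > last:
        return 3 + (position - last - 1)   # after the last click: bucket by distance
    if position == last:
        return 2                           # the last clicked document
    return 1 if clicks[position] else 0    # before the last click: clicked / skipped


def update_counts(counts: Counts, query: str, docs: List[str], clicks: List[int]) -> None:
    """Fold one session into the running counts; no earlier session is revisited."""
    for pos, doc in enumerate(docs):
        counts[(query, doc)][case_index(pos, clicks)] += 1


counts: Counts = defaultdict(lambda: [0] * NUM_COUNTS)
docs = [f"d{i}" for i in range(M)]
update_counts(counts, "www 2009", docs, [0, 1, 0, 1, 0, 0, 0, 0, 0, 0])
print(counts[("www 2009", "d3")])
```

Because each session only increments counters, new log data can be folded in later without reprocessing old sessions.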

31
Sanity Check
The algorithm should be:
– Aware of the position bias and context dependency
– Scalable to terabyte data: single pass, linear
– Incremental to stay updated: update counts

32
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

33
Data Set
Collected over 2 weeks in July.
Preprocessing:
– Discard no-click sessions for fair comparison.
– 178 most frequent queries removed.
Split into training/test sets according to time stamps.

34
Data Set
After preprocessing:
– 110,630 distinct queries;
– 4.8M/4.0M query sessions in the training/test set.

35
Metric
Efficiency:
– Computational time
Effectiveness (resort to indirect measure):
– With known document identities in the test set,
– using the relevance and parameters learned on the training set,
– do click prediction.

36
Competitors
UBM: User Browsing Model (Dupret et al., SIGIR08)
– More parameters
– Iterative, more expensive algorithm
DCM: Dependent Click Model (WSDM09)
– Modeling 1+ clicks per session

37
Results - Time
Environment: Unix server, 2.8GHz cores, MATLAB R2008b.

CCM      UBM      DCM
9.8 min  333 min  5.4 min

38
Results – Perplexity
Perplexity: quality of click prediction for each position individually.
– Random Guess (p_H = 0.5): 2.00
– Best Guess (p_H = 0.8): 1.65
– Ground Truth (Cheating): 1.00
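The reference values on this slide can be reproduced with a short sketch. It assumes the standard definition of click perplexity (2 raised to the negative average log2-likelihood of the binary click events) and reads p_H as the heads probability of a coin whose true bias is also the predicted value; both readings are assumptions, since the formula is not in the transcript.

```python
import math
from typing import Sequence


def click_perplexity(clicks: Sequence[int], predicted: Sequence[float]) -> float:
    """2 ** (-(1/N) * sum of log2-likelihoods of the observed binary clicks)."""
    log_sum = 0.0
    for c, q in zip(clicks, predicted):
        log_sum += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-log_sum / len(clicks))


def coin_perplexity(p_h: float) -> float:
    """Expected perplexity when events occur at rate p_h and p_h is predicted."""
    entropy = -(p_h * math.log2(p_h) + (1.0 - p_h) * math.log2(1.0 - p_h))
    return 2.0 ** entropy


print(round(coin_perplexity(0.5), 2))  # 2.0
print(round(coin_perplexity(0.8), 2))  # 1.65
print(click_perplexity([1, 0, 0], [1.0, 0.0, 0.0]))  # 1.0, predicting the truth
```

Lower is better: a perfect predictor reaches 1, and an uninformed coin flip sits at 2.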

39
Results – Perplexity
[Figure: perplexity plot, with the axis direction marked Worse / Better]

40
Results – Perplexity
Average perplexity over the top 10 positions.
[Table: rows Perplexity, Equiv. p_H and Improv. for the models CCM, UBM and DCM; only the improvement values 7.5% and 8.3% are recoverable.]

41
Results – Log Likelihood
Log-likelihood: log of the chance to recover the entire click vector out of 2^10 possibilities.
[Table: rows LL and Likelihood Improv. for the models CCM, UBM and DCM; only the improvement values 9.7% and 14% are recoverable.]
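For scale, a sketch of two related quantities: the log-likelihood of a uniform guess over all 2^10 click vectors, and one way to turn a log-likelihood gap into a relative likelihood improvement. The conversion exp(l1 - l2) - 1 is an assumption about how the improvement row is defined, and the sample log-likelihoods are hypothetical, not results from the talk.

```python
import math


def uniform_log_likelihood(num_positions: int = 10) -> float:
    """Natural-log likelihood of guessing one of 2**num_positions click vectors uniformly."""
    return -num_positions * math.log(2.0)


def likelihood_improvement(ll_a: float, ll_b: float) -> float:
    """Relative likelihood gain of model A over model B from average log-likelihoods."""
    return math.exp(ll_a - ll_b) - 1.0


print(uniform_log_likelihood())             # log(1/1024)
print(likelihood_improvement(-1.30, -1.40))  # hypothetical inputs
```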

42
Results – Log Likelihood
[Figure: log-likelihood plot, with the axis direction marked Better / Worse]

43
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

44
Related Work
User behavior study and hypothesis
– Eye-tracking Study (Joachims et al., KDD05, ACM TOIS)
– Examination Hypothesis (Richardson et al., WWW07)
– Cascade Hypothesis (Craswell et al., WSDM08)
Other click models
– Logistic Regression (Dupret et al., SIGIR08)
– Dynamic Bayesian Network (Chapelle et al., WWW09)
– Bayesian Browsing Model (KDD09, To appear)

45
Conclusion
Click Chain Model
– A probabilistic approach to interpret clicks.
– A Bayesian approach to model relevance.
– Both scalable and incremental.
Future Directions
– Validation/Bucket Test.
– Pairwise comparison
– More on context dependency

46
Thank you :-)

47
Abstract/Document Relevance
Relevance of Abstract:
– Conditional probability of click as defined by the examination hypothesis
Relevance of Document:
– Determines the probability of See Next Doc
– A binary random variable (integrated out under CCM)

48
Alt. User Behavior Description
[Flowchart: Examine the Document, Click?, Relevant? (Yes/No), See Next Doc? (Yes)]

49
Results – Perplexity (by Freq)
[Figure: perplexity by query frequency, with the axis direction marked Worse / Better]

50
Examination/Click Distribution

51
Predicting First/Last Clicks
Root-mean-square error in predicting the first/last clicked position for the test data.
Two approaches (bias/variance tradeoff):
– EXPectation: using the expected value (bias)
– SIMulation: drawing a sample from the model (variance)
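The two approaches can be sketched as follows, assuming the model has already produced a probability distribution over the first (or last) clicked position. The function names and the example distribution are hypothetical.

```python
import math
import random
from typing import Sequence


def predict_exp(dist: Sequence[float]) -> float:
    """EXP: the expected clicked position (positions are 1-based)."""
    return sum(pos * p for pos, p in enumerate(dist, start=1))


def predict_sim(dist: Sequence[float], rng: random.Random) -> int:
    """SIM: one position drawn from the predicted distribution."""
    return rng.choices(range(1, len(dist) + 1), weights=dist, k=1)[0]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed positions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


dist = [0.5, 0.3, 0.2]   # hypothetical distribution over positions 1 to 3
observed = [1, 1, 2, 3]  # hypothetical observed first-click positions
rng = random.Random(0)
print(rmse([predict_exp(dist)] * len(observed), observed))
print(rmse([predict_sim(dist, rng) for _ in observed], observed))
```

EXP always returns the same in-between value, while SIM returns realistic positions at the cost of sampling noise.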

52
First Clicked Position

53
Last Clicked Position

54
A Quick Example
Here we are interested in R_3.

55
A Quick Example
Here we are interested in R_3.
[Figure: one row of click nodes C_1 to C_4]

56
A Quick Example
Here we are interested in R_3.
[Figure: two rows of click nodes C_1 to C_4]

57
A Quick Example
Here we are interested in R_3.
[Figure: three rows of click nodes C_1 to C_4]

58
A Quick Example
Here we are interested in R_3.
Mean(R_3) = 0.52
Std(R_3) = 0.22
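Summary statistics such as Mean(R_3) and Std(R_3) can be obtained by numerical integration of an unnormalized posterior over [0, 1]. The sketch below uses the midpoint rule; the posterior for R_3 is not recoverable from this transcript, so the example density is a hypothetical stand-in and will not reproduce the slide's numbers.

```python
import math
from typing import Callable, Tuple


def posterior_mean_std(unnormalized: Callable[[float], float],
                       steps: int = 10_000) -> Tuple[float, float]:
    """Mean and standard deviation of an unnormalized density on [0, 1] (midpoint rule)."""
    h = 1.0 / steps
    z = m1 = m2 = 0.0
    for k in range(steps):
        r = (k + 0.5) * h  # midpoint of the k-th sub-interval
        w = unnormalized(r)
        z += w             # normalizing constant (up to the factor h)
        m1 += w * r        # first moment
        m2 += w * r * r    # second moment
    mean = m1 / z
    variance = m2 / z - mean * mean
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical posterior shape, proportional to r * (1 - r).
mean, std = posterior_mean_std(lambda r: r * (1.0 - r))
print(round(mean, 3), round(std, 3))
```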
