Click Chain Model in Web Search. Fan Guo, Carnegie Mellon University. 1/29/2014, WWW'09, Madrid, Spain.


1 Click Chain Model in Web Search
Fan Guo, Carnegie Mellon University
1/29/2014, WWW'09, Madrid, Spain

2 Joint Work With…
– Chao Liu (MSR, ISRC-Redmond)
– Yi-Min Wang (MSR, ISRC-Redmond)
– Mike Taylor (MSR, Cambridge)
– Anitha Kannan (MSR, Search Lab)
– Tom Minka (MSR, Cambridge)
– Christos Faloutsos (Carnegie Mellon University)


5 Click Logs
Auto-generated data keeping important information about search activity.
Query: www 2009
Time: 21 Apr 2009, 9:01:02
Rank/Position  URL of Document                              Click
1              www.metalwayfestival.com                     0
2              www.maquitec.com                             0
3              www.construmat.com                           0
4              www.hispack.com                              0
5              www.themarket.com                            0
6              www.cursabombers.com                         0
7              www.setegibernau.com                         0
8              www2009.org                                  1
9              www.solardecathlon.upe.es                    0
10             www.nxtbook.com/nxtbooks/suny/2009spring     0
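One session of such a log can be held in a small record type. The sketch below is a hypothetical layout (the class name `QuerySession` and its fields are assumptions, not the log's actual schema); it encodes the example impression for the query "www 2009".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySession:
    """One impression of a result page plus its clicks (hypothetical schema)."""
    query: str
    time: str
    urls: List[str]    # documents in rank order, position 1 first
    clicks: List[int]  # clicks[i] is 1 if the document at position i+1 was clicked

    def clicked_positions(self) -> List[int]:
        """Return the 1-based ranks that received a click."""
        return [i + 1 for i, c in enumerate(self.clicks) if c == 1]


session = QuerySession(
    query="www 2009",
    time="21 Apr 2009, 9:01:02",
    urls=[
        "www.metalwayfestival.com", "www.maquitec.com", "www.construmat.com",
        "www.hispack.com", "www.themarket.com", "www.cursabombers.com",
        "www.setegibernau.com", "www2009.org", "www.solardecathlon.upe.es",
        "www.nxtbook.com/nxtbooks/suny/2009spring",
    ],
    clicks=[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
)
print(session.clicked_positions())
```

Storing clicks as a 0/1 vector aligned with the ranked URLs keeps both the impression data and the click data in one record.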

6 Problem Definition
Given a click log data set, for each query-document pair, compute user-perceived relevance.
Input (impression data and click data): for each session of a query (here, query "www 2009", session index 103, …), the Rank/Position, Document Idx, and Click of every result.
Output: a Relevance value for each Document Idx (1, 2, 3, …, 9, …), all unknown ("?") at the start.

7 Relevance Representation
[Figure labels: Human Judge (Excellent, Good, Fair, Bad); Previous Click Models (a value on a 0 to 1 scale, e.g. 0.75); Click Chain Model; Integration]

8 Applications
– Automated Ranking Alterations
– Search Engine Performance Metric
– Calibrate Human Judgment
– Related Application in Sponsored Search

9 Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion


11 Eye-Tracking User Study
[Figure: fixation heat map]

12 Overall: fixation is biased towards higher ranks, and so are the clicks.
For each position: fixation and clicks are context dependent.
[Figure panels: Normal Impression, Reversed Impression]

13 Problem Definition (Recap)
Given a click log data set, for each query-document pair, compute user-perceived relevance. The solution should be
– Aware of the position bias and context dependency
– Scalable to terabyte data
– Incremental to stay updated

14 Examination Hypothesis
User behavior abstraction:
– Fixation → binary examination variable
– Click → binary click variable
A document must be examined before being clicked.

15 Examination Hypothesis
For each position,
P(Click=1) = P(Examination=1) * Relevance
Relevance = P(Click=1 | Examination=1)
The position bias is reflected in the derivation of P(Examination).
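As a minimal sketch of the examination hypothesis, the function below multiplies an examination probability by a relevance value. The function name and the numbers in the example are illustrative assumptions, not values from the talk.

```python
def click_probability(p_examine: float, relevance: float) -> float:
    """P(Click=1) = P(Examination=1) * Relevance under the examination hypothesis."""
    if not (0.0 <= p_examine <= 1.0 and 0.0 <= relevance <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_examine * relevance


# The same document (relevance 0.6) shown at two positions with
# made-up examination probabilities: position bias alone changes the click rate.
top = click_probability(1.0, 0.6)
low = click_probability(0.25, 0.6)
print(top, low)
```

The relevance term is the same in both calls; only the examination probability, which carries the position bias, differs.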

16 Cascade Hypothesis
The user scans through documents and makes decisions in strict linear order.
The decision process: E_1, C_1, E_2, C_2, …
Essential part of a click model:
– What is the probability of See Next Doc?
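To see how the "See Next Doc" probability shapes position bias, the sketch below propagates examination probabilities down the ranking in linear order. The two continuation parameters are assumptions introduced for illustration; with the defaults (always continue after a skip, always stop after a click) the sketch behaves like a simple cascade in which the user stops at the first click.

```python
from typing import List


def examination_probs(relevance: List[float],
                      cont_after_skip: float = 1.0,
                      cont_after_click: float = 0.0) -> List[float]:
    """Return P(E_i = 1) for each position under a strict top-down scan.

    relevance[i] is P(click | examined) for the document at position i+1.
    cont_after_skip / cont_after_click are hypothetical "See Next Doc"
    probabilities after a non-click and after a click, respectively.
    """
    probs = []
    p_exam = 1.0  # the first position is always examined
    for r in relevance:
        probs.append(p_exam)
        # Probability of moving on, averaged over click / no click at this position.
        p_next = r * cont_after_click + (1.0 - r) * cont_after_skip
        p_exam *= p_next
    return probs


probs = examination_probs([0.5, 0.5, 0.5])
print(probs)
```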

17 Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

18 The Context
– Top-10 organic search results only.
– Query sessions are independent.
– Semantic information is not used.
[Figure labels: Suggestions, Ads, Other Elements]

19 User Behavior Description
[Flowchart: Examine the Document → Click? After either answer (No or Yes) the user reaches a See Next Doc? decision: Yes leads to examining the next document, No leads to Done.]
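The flowchart can be read as a generative process and simulated. The sketch below is one such reading; the continuation parameters (`a_skip`, `a_click_low`, `a_click_high`) and the way the after-click probability interpolates with relevance are assumptions made for illustration, since the slide shows only the decision structure.

```python
import random
from typing import List, Optional


def simulate_session(relevance: List[float],
                     a_skip: float = 0.8,
                     a_click_low: float = 0.6,
                     a_click_high: float = 0.3,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Simulate one click vector by walking the flowchart top-down.

    a_skip: assumed P(See Next Doc) after a non-click.
    a_click_low / a_click_high: assumed P(See Next Doc) after clicking a
    document of relevance 0 / 1; intermediate relevance interpolates linearly.
    """
    rng = rng or random.Random()
    clicks = [0] * len(relevance)
    for i, r in enumerate(relevance):
        # The document at position i is examined at this point.
        clicked = rng.random() < r
        clicks[i] = int(clicked)
        if clicked:
            p_next = a_click_low * (1.0 - r) + a_click_high * r
        else:
            p_next = a_skip
        if rng.random() >= p_next:
            break  # "Done": lower positions stay unexamined and unclicked
    return clicks


print(simulate_session([0.1, 0.3, 0.7, 0.2, 0.1], rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible; averaging many simulated sessions gives empirical click rates per position.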

20 Click Chain Model
[Figure: graphical model with relevance variables R_1 … R_5, examination variables E_1 … E_5, and click variables C_1 … C_5, continuing down the ranking]

21 Why Bayesian?
Modeling benefit:
– A principled way of smoothing the relevance estimates;
– Offers more flexibility, such as computing P(R_i > R_j).
Computational benefit:
– Avoids the iterative optimization procedure in maximum-likelihood estimation.
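With a full posterior over each relevance value, a quantity such as P(R_i > R_j) can be computed numerically. The sketch below does this on a midpoint grid over [0, 1] for two independent, possibly unnormalized densities; the function names and the example densities are assumptions for illustration and are not the model's actual posteriors.

```python
from typing import Callable, List


def grid_weights(density: Callable[[float], float], n: int = 1000) -> List[float]:
    """Normalized probability mass of an unnormalized density on n midpoint cells of [0, 1]."""
    mass = [density((k + 0.5) / n) for k in range(n)]
    total = sum(mass)
    return [m / total for m in mass]


def prob_greater(density_i: Callable[[float], float],
                 density_j: Callable[[float], float],
                 n: int = 1000) -> float:
    """Approximate P(R_i > R_j) for independent R_i, R_j with the given densities."""
    w_i = grid_weights(density_i, n)
    w_j = grid_weights(density_j, n)
    below = 0.0   # mass of R_j in cells strictly below the current cell
    result = 0.0
    for k in range(n):
        # Ties within the same cell are split evenly.
        result += w_i[k] * (below + 0.5 * w_j[k])
        below += w_j[k]
    return result


# Example: R_i has density proportional to r, R_j is uniform on [0, 1].
p = prob_greater(lambda r: r, lambda r: 1.0)
print(round(p, 3))
```

A point estimate alone cannot answer this kind of pairwise question, which is one reason a posterior is more flexible.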

22 Relevance Inference
Given a query and all its click data, compute the posterior of the relevance R_j for each possible j.
Then focus on the click probability for a particular session, and look at the different cases.

23 Click Chain Model
[Figure: graphical model with R_1 … R_5, E_1 … E_5, and C_1 … C_5, annotated with Examination Hypothesis and Cascade Hypothesis]

24-28 [Figure, repeated on five slides: Click Chain Model graphical model with nodes R_1 … R_5, E_1 … E_5, and C_1 … C_5, each with a plot over the 0 to 1 range]

29 Putting them together

30 Summary of the Algorithm
– Initialize (2*10+2) counts for each query-document pair;
– Go through the click log once and update the counts;
– Compute parameter values and get β values;
– Ready to output results (using numerical integration if necessary).
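The count-then-integrate pattern in this summary can be sketched as follows. This is not the CCM derivation: the three factor shapes in `FACTORS` are hypothetical stand-ins (the model derives its own cases and β values), and the names `accumulate` and `posterior_mean_std` are assumptions. The sketch only shows a single pass that tallies counts per query-document pair, followed by numerical integration of an unnormalized posterior built from those counts under a uniform prior.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def accumulate(events: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, Counter]:
    """Single pass over the log: tally how often each factor type occurs per pair."""
    counts: Dict[Hashable, Counter] = {}
    for pair_key, factor in events:
        counts.setdefault(pair_key, Counter())[factor] += 1
    return counts


# Hypothetical factor shapes, each a function of the relevance r.
FACTORS = {
    "clicked": lambda r: r,
    "skipped": lambda r: 1.0 - r,
    "after_last_click": lambda r: 1.0 - 0.5 * r,  # a (1 + beta * r) factor with beta = -0.5
}


def posterior_mean_std(counter: Counter, n: int = 1000) -> Tuple[float, float]:
    """Mean and standard deviation of the posterior prod_f factor_f(r) ** count_f on [0, 1]."""
    grid = [(k + 0.5) / n for k in range(n)]  # midpoints avoid log(0) at the ends
    log_p = [sum(c * math.log(FACTORS[f](r)) for f, c in counter.items()) for r in grid]
    peak = max(log_p)
    w = [math.exp(lp - peak) for lp in log_p]  # rescale before exponentiating
    z = sum(w)
    mean = sum(r * wi for r, wi in zip(grid, w)) / z
    var = sum((r - mean) ** 2 * wi for r, wi in zip(grid, w)) / z
    return mean, math.sqrt(var)


events = [(("www 2009", 8), "clicked"), (("www 2009", 8), "skipped")]
counts = accumulate(events)
mean, std = posterior_mean_std(counts[("www 2009", 8)])
print(round(mean, 3), round(std, 3))
```

Because the posterior depends on the log only through the counts, new sessions can be folded in by incrementing counts, without revisiting old data.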

31 Sanity Check
The algorithm should be
– Aware of the position bias and context dependency
– Scalable to terabyte data: single pass, linear
– Incremental to stay updated: update counts

32 Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

33 Data Set
Collected in 2 weeks in July.
Preprocessing:
– Discard no-click sessions for fair comparison.
– 178 most frequent queries removed.
Split into training/test sets according to time stamps.

34 Data Set
After preprocessing:
– 110,630 distinct queries;
– 4.8M/4.0M query sessions in the training/test set.

35 Metric
Efficiency:
– Computational time
Effectiveness (resort to an indirect measure):
– With known document identities in the test set,
– using the relevance and parameters learned on the training set,
– do click prediction.

36 Competitors
UBM: User Browsing Model (Dupret et al., SIGIR08)
– More parameters
– Iterative, more expensive algorithm
DCM: Dependent Click Model (WSDM09)
– Modeling 1+ clicks per session

37 Results – Time
Environment: Unix server, 2.8GHz cores, MATLAB R2008b.
Model  Time
CCM    9.8 min
UBM    333 min
DCM    5.4 min

38 Results – Perplexity
Perplexity: quality of click prediction for each position individually.
– Random Guess (p_H = 0.5): 2.00
– Best Guess (p_H = 0.8): 1.65
– Ground Truth (Cheating): 1.00
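The three reference values can be reproduced with the usual definition of perplexity for binary events, 2 raised to the negative average log2-probability assigned to the observed outcomes. The sketch below assumes that definition and uses a made-up sample of ten coin flips with eight heads.

```python
import math
from typing import Sequence


def perplexity(clicks: Sequence[int], predicted: Sequence[float]) -> float:
    """2 ** (-(1/N) * sum of log2-probability assigned to each observed binary click)."""
    total = 0.0
    for c, q in zip(clicks, predicted):
        total += math.log2(q) if c == 1 else math.log2(1.0 - q)
    return 2.0 ** (-total / len(clicks))


coin = [1] * 8 + [0] * 2  # a coin that lands heads 80% of the time
random_guess = perplexity(coin, [0.5] * 10)
best_guess = perplexity(coin, [0.8] * 10)
cheating = perplexity(coin, [float(c) for c in coin])
print(round(random_guess, 2), round(best_guess, 2), round(cheating, 2))
```

Lower is better: a model that always predicts the true outcome reaches 1, and one that knows nothing about a binary event sits at 2.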

39 Results – Perplexity
[Figure: perplexity comparison plot; direction labels: Worse, Better]

40 Results – Perplexity
Average perplexity over the top 10 positions.
Table columns: Model (CCM, UBM, DCM). Rows: Perplexity; Equiv. p_H; Improv. (7.5%, 8.3%).

41 Results – Log Likelihood
Log-likelihood: log of the chance to recover the entire click vector out of 2^10 possibilities.
Table columns: Model (CCM, UBM, DCM). Rows: LL; Likelihood Improv. (9.7%, 14%).
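A session's log-likelihood can be computed with the chain rule over positions. In the sketch below, the per-position conditional click probabilities are assumed to be supplied by whichever click model is being evaluated, and the improvement formula exp(ll_a − ll_b) − 1 is an assumption about how a relative likelihood improvement could be derived from two average log-likelihoods.

```python
import math
from typing import Sequence


def session_log_likelihood(clicks: Sequence[int],
                           cond_click_probs: Sequence[float]) -> float:
    """ln P(click vector), given P(C_i = 1 | C_1 .. C_{i-1}) for each position."""
    ll = 0.0
    for c, p in zip(clicks, cond_click_probs):
        ll += math.log(p if c == 1 else 1.0 - p)
    return ll


def likelihood_improvement(ll_a: float, ll_b: float) -> float:
    """Relative likelihood gain of model A over model B (assumed formula)."""
    return math.exp(ll_a - ll_b) - 1.0


clicks = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
# A model with no information assigns 1 / 2**10 to every click vector.
uniform_ll = session_log_likelihood(clicks, [0.5] * 10)
print(round(uniform_ll, 3))
```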

42 Results – Log Likelihood
[Figure: log-likelihood comparison plot; direction labels: Better, Worse]

43 Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion

44 Related Work
User behavior study and hypothesis
– Eye-tracking Study (Joachims et al., KDD05, ACM TOIS)
– Examination Hypothesis (Richardson et al., WWW07)
– Cascade Hypothesis (Craswell et al., WSDM08)
Other click models
– Logistic Regression (Dupret et al., SIGIR08)
– Dynamic Bayesian Network (Chapelle et al., WWW09)
– Bayesian Browsing Model (KDD09, to appear)

45 Conclusion
Click Chain Model
– A probabilistic approach to interpret clicks.
– A Bayesian approach to model relevance.
– Both scalable and incremental.
Future Directions
– Validation/Bucket Test.
– Pairwise comparison.
– More on context dependency.

46 Thank you :-)

47 Abstract/Document Relevance
Relevance of Abstract:
– Conditional probability of click as defined by the examination hypothesis
Relevance of Document:
– Determines the probability of See Next Doc
– A binary random variable (integrated out under CCM)

48 Alt. User Behavior Description
[Flowchart nodes: Examine the Document; Click?; Relevant?; See Next Doc?; branches labeled Yes/No]

49 Results – Perplexity (by Freq)
[Figure: perplexity by query frequency; direction labels: Worse, Better]

50 Examination/Click Distribution

51 Predicting First/Last Clicks
Root-mean-square error in predicting the first/last clicked position for the test data.
Two approaches (bias/variance tradeoff):
– EXPectation: using the expected value (bias)
– SIMulation: drawing a sample from the model (variance)
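The two prediction approaches can be compared on a toy case. In the sketch below, the distribution over the first clicked position and the list of observed positions are made up for illustration; in practice the distribution would come from the fitted click model.

```python
import math
import random
from typing import Dict, Sequence


def expected_position(dist: Dict[int, float]) -> float:
    """EXP: predict the mean of the model's distribution over clicked positions."""
    return sum(pos * p for pos, p in dist.items())


def sampled_position(dist: Dict[int, float], rng: random.Random) -> int:
    """SIM: predict by drawing one position from the model's distribution."""
    positions = list(dist)
    weights = [dist[p] for p in positions]
    return rng.choices(positions, weights=weights, k=1)[0]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed positions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


first_click_dist = {1: 0.5, 2: 0.25, 3: 0.25}  # made-up model output
actual = [1, 1, 2, 3]                          # made-up observed first clicks
rng = random.Random(0)

exp_error = rmse([expected_position(first_click_dist)] * len(actual), actual)
sim_error = rmse([sampled_position(first_click_dist, rng) for _ in actual], actual)
print(round(exp_error, 3), round(sim_error, 3))
```

The expectation is a single deterministic value that may not be a valid position at all, while a sample is always a valid position but varies from draw to draw.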

52 First Clicked Position

53 Last Clicked Position

54-58 A Quick Example
Here we are interested in R_3.
[Figure, built up across slides 55 to 57: click vectors C_1 … C_4, one added per slide]
Mean(R_3) = 0.52
Std(R_3) = 0.22

