1 Click Chain Model in Web Search Fan Guo Carnegie Mellon University PPT Revised and Presented by Xin Xin.


1 1 Click Chain Model in Web Search Fan Guo Carnegie Mellon University PPT Revised and Presented by Xin Xin

2 2 Outline Background and motivation Designing a click model Algorithms Experiments

3 3

4 4 How to utilize users’ feedback to improve search engine results?

5 5 Diverse User Feedback
– Click-through
– Browser action
– Dwelling time
– Explicit judgment
– Other page elements

6 6 Web Search Click Log
Auto-generated data keeping important information about search activity.

Query: cikm
Session ID: f851c5af178384d12f3d

Position  URL                                                        Click
1         cikm2008.org                                               1
2         www.cikm.org                                               0
3         www.cikm.org/2002                                          0
4         www.fc.ul.pt/cikm2007                                      0
5         www.comp.polyu.edu.hk/conference/cikm2009                  1
6         cikmconference.org                                         0
7         Ir.iit.edu/cikm2004                                        0
8         www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html   0
9         www.tzi.de/CIKM2005                                        0
10        www.cikm.com                                               0
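One way to hold such a log record in memory is sketched here. The class and field names (`Impression`, `Session`, `clicked_positions`) are hypothetical and chosen only for illustration; the slide does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Impression:
    position: int  # 1-based rank on the result page
    url: str
    click: int     # 1 if the result was clicked, 0 otherwise


@dataclass
class Session:
    session_id: str
    query: str
    impressions: List[Impression]

    def clicked_positions(self) -> List[int]:
        """Return the ranks that received a click, in page order."""
        return [imp.position for imp in self.impressions if imp.click == 1]


# The session shown on the slide, typed in by hand.
session = Session(
    session_id="f851c5af178384d12f3d",
    query="cikm",
    impressions=[
        Impression(1, "cikm2008.org", 1),
        Impression(2, "www.cikm.org", 0),
        Impression(3, "www.cikm.org/2002", 0),
        Impression(4, "www.fc.ul.pt/cikm2007", 0),
        Impression(5, "www.comp.polyu.edu.hk/conference/cikm2009", 1),
        Impression(6, "cikmconference.org", 0),
        Impression(7, "Ir.iit.edu/cikm2004", 0),
        Impression(8, "www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html", 0),
        Impression(9, "www.tzi.de/CIKM2005", 0),
        Impression(10, "www.cikm.com", 0),
    ],
)

print(session.clicked_positions())
```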

7 7 A real world example

8 8 How large is the click log?
– Search logs: 10+ TB/day
– In existing publications:
  [Craswell+08]: 108k sessions
  [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions)
  [Guo+09a]: 8.8M sessions from 110k unique queries
  [Guo+09b]: 8.8M sessions from 110k unique queries
  [Chapelle+09]: 58M sessions from 682k unique queries
  [Liu+09a]: 0.26PB data from 103M unique queries

9 9 Intuition to Utilize Clicks: adapt ranking to user clicks. [Figure: # of clicks received]

10 10 Position Bias Problem [Figure: # of clicks received]

11 11 Problem Definition
Given a click log data set, compute the user-perceived relevance of each query-document pair. The solution should be:
– Aware of position bias and context dependency
– Scalable to terabyte-scale data
– Incremental, so that estimates stay up to date

12 12 Outline Background and motivation Designing a click model Algorithms Experiments

13 13 Examination Hypothesis A document must be examined before a click. The (conditional) probability of click upon examination depends on document relevance.
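The hypothesis can be written compactly with binary events E_i (examined) and C_i (clicked) at position i, and a relevance R_i in [0, 1]. The symbols follow the model diagram later in the deck. The second line uses the specific form from the published Click Chain Model paper; that is an assumption here, since this slide only says that the probability depends on relevance.

```latex
\begin{align*}
P(C_i = 1 \mid E_i = 0) &= 0 \\
P(C_i = 1 \mid E_i = 1, R_i) &= R_i
\end{align*}
```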

14 14 Cascade Hypothesis
– The first document is always examined.
– First-order Markov property: examination at position (i+1) depends only on examination and click at position i.
– Examination follows a strict linear order: position i, then position (i+1).
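Because of the first-order Markov property, the chance that each position is examined can be computed in a single forward pass. The sketch below is not taken from the slides: it assumes the transition parameters alpha1, alpha2, alpha3 from the published Click Chain Model paper (continue with probability alpha1 after no click, and alpha2(1-R) + alpha3 R after a click), and it simplifies by treating each relevance as a known number rather than a distribution. The function name is hypothetical.

```python
from typing import List


def examination_probabilities(
    relevances: List[float], alpha1: float, alpha2: float, alpha3: float
) -> List[float]:
    """Marginal P(E_i = 1) for each position under fixed relevances."""
    probs = []
    p_exam = 1.0  # cascade hypothesis: the first document is always examined
    for r in relevances:
        probs.append(p_exam)
        # Given examination: click with probability r, then decide to continue.
        p_continue = (1.0 - r) * alpha1 + r * (alpha2 * (1.0 - r) + alpha3 * r)
        # Position i+1 can only be examined if position i was examined.
        p_exam *= p_continue
    return probs


print(examination_probabilities([0.3, 0.7, 0.2], 0.5, 0.5, 0.5))
```

With all three alphas equal, the continuation probability no longer depends on relevance, so the examination probability simply decays geometrically down the page.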

15 15 User Behavior Description [Flowchart: Examine the document, then "Click?". After either answer (Yes or No) the user decides "See next doc?": Yes leads to examining the next document, No leads to Done.]
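The browsing loop on this slide can be simulated in a few lines. This is a minimal sketch, not the presenter's code: the continuation probabilities alpha1, alpha2, alpha3 are taken from the published Click Chain Model paper and are an assumption here, relevances are treated as fixed numbers, and `simulate_session` is a hypothetical name.

```python
import random
from typing import List, Optional


def simulate_session(
    relevances: List[float],
    alpha1: float,
    alpha2: float,
    alpha3: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a 0/1 click vector for one simulated pass down the result list."""
    rng = rng or random.Random()
    clicks = [0] * len(relevances)
    for i, r in enumerate(relevances):
        # "Examine the document", then "Click?" with probability r.
        clicked = rng.random() < r
        clicks[i] = int(clicked)
        # "See next doc?": the chance depends on whether a click happened.
        p_continue = alpha2 * (1.0 - r) + alpha3 * r if clicked else alpha1
        if rng.random() >= p_continue:
            break  # "Done": remaining positions are never examined
    return clicks


print(simulate_session([0.9, 0.2, 0.6, 0.1], 0.6, 0.8, 0.3, random.Random(0)))
```

Positions after the stopping point stay at 0 because an unexamined document cannot be clicked.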

16 16 Click Chain Model [Figure: graphical model over the variables R_i, E_i, and C_i for positions 1 to 5 and beyond, annotated with the Examination Hypothesis and the Cascade Hypothesis]

17 17 Outline Background and motivation Designing a click model Algorithms Experiments

18 18 A Coin-Toss Example for Bayesian Framework
Density function (not normalized), from prior to posterior:
$x^{1}(1-x)^{0}$
$x^{2}(1-x)^{0}$
$x^{3}(1-x)^{0}$
$x^{3}(1-x)^{1}$
$x^{4}(1-x)^{1}$
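The sequence of densities on this slide matches the tosses H, H, H, T, H when starting from a flat prior: each head raises the exponent of x by one and each tail raises the exponent of (1-x) by one. The sketch below tracks only those two exponents. The flat prior and the function names are assumptions made for illustration.

```python
from typing import List, Tuple


def coin_posterior_exponents(tosses: str) -> List[Tuple[int, int]]:
    """Exponents (a, b) of the density x^a (1-x)^b after each toss."""
    heads = tails = 0
    history = []
    for toss in tosses:
        if toss == "H":
            heads += 1
        elif toss == "T":
            tails += 1
        else:
            raise ValueError(f"unexpected toss symbol: {toss!r}")
        history.append((heads, tails))
    return history


def posterior_mean(heads: int, tails: int) -> float:
    """Mean of the normalized density x^heads (1-x)^tails on [0, 1]."""
    # x^a (1-x)^b is a Beta(a + 1, b + 1) density up to a constant.
    return (heads + 1) / (heads + tails + 2)


steps = coin_posterior_exponents("HHHTH")
print(steps)
print(posterior_mean(*steps[-1]))
```

Keeping only the exponents is what makes the update incremental: no past tosses need to be stored.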

19 19 Click Data Example
Density function (not normalized), updated one observation at a time starting from the prior:
$x^{1}(1-x)^{0}(1-0.6x)^{0}(1+0.3x)^{1}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$
$x^{1}(1-x)^{1}(1-0.6x)^{0}(1+0.3x)^{1}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$
$x^{2}(1-x)^{1}(1-0.6x)^{0}(1+0.3x)^{2}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$
$x^{3}(1-x)^{1}(1-0.6x)^{1}(1+0.3x)^{2}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$
$x^{3}(1-x)^{1}(1-0.6x)^{1}(1+0.3x)^{2}(1-0.5x)^{1}(1-0.2x)^{0}\cdots$
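As in the coin-toss case, the relevance posterior here is fully described by a handful of exponents, one per factor of the form (1 + beta x). The sketch below stores those exponents in a dictionary and normalizes numerically with a midpoint rule. The function names, the grid size, and the choice of numerical integration are assumptions for illustration, not the method described in the slides.

```python
from typing import Dict


def unnormalized_density(x: float, x_power: int, factors: Dict[float, int]) -> float:
    """Evaluate x^x_power * prod_beta (1 + beta * x)^n at a single point."""
    value = x ** x_power
    for beta, n in factors.items():
        value *= (1.0 + beta * x) ** n
    return value


def posterior_mean(x_power: int, factors: Dict[float, int], grid_size: int = 10000) -> float:
    """Posterior mean on [0, 1], computed with a midpoint-rule grid."""
    xs = [(k + 0.5) / grid_size for k in range(grid_size)]
    weights = [unnormalized_density(x, x_power, factors) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


# Exponents of the last density on the slide (beta -> exponent).
final_factors = {-1.0: 1, -0.6: 1, 0.3: 2, -0.5: 1, -0.2: 0}
print(posterior_mean(3, final_factors))
```

Each new session only increments a few counters, which is what allows the estimate to be kept up to date without revisiting old log data.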

20 20 Estimating P(C|Ri)

21–25 [Figure, repeated across five slides: the Click Chain Model graph over R_i, E_i, and C_i for positions 1 to 5 and beyond, labelled with the values 0, 1, 0, 1]

26 26 Putting them together

27 27 Alpha Estimation

28 28 Outline Background and motivation Designing a click model Algorithms Experiments

29 29 Data Set
Collected over 2 weeks in July 2008.
Preprocessing:
– Discard no-click sessions for fair comparison.
– Remove the 178 most frequent queries.
Split into training/test sets according to time stamps.

30 30 Data Set After preprocessing: –110,630 distinct queries; –4.8M/4.0M query sessions in the training/test set.
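The preprocessing steps can be sketched as follows. The record layout (a dict with `query`, `timestamp`, and `clicks` keys), the function name, and the order in which the two filters are applied are all assumptions; the slides do not specify them.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record: {"query": str, "timestamp": float, "clicks": List[int]}
Session = Dict[str, object]


def preprocess(
    sessions: List[Session], n_frequent_removed: int, split_time: float
) -> Tuple[List[Session], List[Session]]:
    """Filter sessions and split them into (training, test) by time stamp."""
    # Discard sessions without any click.
    clicked = [s for s in sessions if any(s["clicks"])]
    # Remove the most frequent queries (178 of them in the slides).
    counts = Counter(s["query"] for s in clicked)
    frequent = {q for q, _ in counts.most_common(n_frequent_removed)}
    kept = [s for s in clicked if s["query"] not in frequent]
    # Earlier sessions train the model, later ones evaluate it.
    train = [s for s in kept if s["timestamp"] < split_time]
    test = [s for s in kept if s["timestamp"] >= split_time]
    return train, test
```

Splitting by time rather than at random keeps the evaluation closer to deployment, where a model trained on past logs is applied to future traffic.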

31 31 Metrics
Efficiency:
– Computational time
Effectiveness:
– Perplexity
– Log-likelihood
– Click prediction
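The slides do not define the effectiveness metrics, so the sketch below uses the definitions common in the click-model literature, which is an assumption: perplexity at a fixed position is 2 raised to the negative mean log2-probability of the observed click outcomes, and log-likelihood is averaged over test sessions. The function names and the clamping constant are hypothetical.

```python
import math
from typing import List

EPS = 1e-12  # clamp to avoid log(0) when a model predicts exactly 0 or 1


def position_perplexity(clicks: List[int], predicted: List[float]) -> float:
    """Perplexity at one position; 1.0 is perfect, 2.0 matches a fair coin."""
    total = 0.0
    for c, q in zip(clicks, predicted):
        q = min(max(q, EPS), 1.0 - EPS)
        total += c * math.log2(q) + (1 - c) * math.log2(1.0 - q)
    return 2.0 ** (-total / len(clicks))


def mean_log_likelihood(session_probs: List[float]) -> float:
    """Average log-probability the model assigns to each observed session."""
    return sum(math.log(max(p, EPS)) for p in session_probs) / len(session_probs)


print(position_perplexity([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]))
```

Lower perplexity and higher log-likelihood both indicate a model that predicts the held-out clicks better.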

32 32 Competitors UBM: User Browsing Model (Dupret et al., SIGIR’08) DCM: Dependent Click Model (WSDM’09)

33 33 Results - Time
Environment: Unix Server, 2.8GHz cores, MATLAB R2008b.

               CCM       UBM       DCM
Time           9.8 min   333 min   5.4 min
Ratio to CCM   1.0       34        0.55

34 34 Results – Perplexity [Figure: perplexity plot, annotated "Worse" and "Better"]

35 35 Results – Log Likelihood [Figure: log-likelihood plot, annotated "Better" and "Worse"]

36 36 First Clicked Position

37 37 Last Clicked Position

38 38 The End

