Statistical Models for Web/Sponsored Search Click Log Analysis. The Chinese University of Hong Kong. Some slides are revised from Fan Guo's tutorial at CIKM 2009.


Index
Background.
A Simple Click Model.
– Dependent Click Model [WSDM09].
Advanced Design.
– Five extension directions.
Advanced Estimation.
– Bayesian framework and the rationale.
– Bayesian browsing model (BBM) [Liu09].
– Click chain model (CCM) [Guo09].
Course Project.

Scenario: Web Search

User Click Log

Eye-tracking User Study
Users are biased toward examining the top results.

Position-bias Identification
Higher positions receive more user attention (eye fixation) and clicks than lower positions. This holds even in the extreme setting where the order of positions is reversed. "Clicks are informative but biased." [Joachims07]
[Figure: click percentage by position under the normal and reversed impression orders.]

Answer to Previous Example
Result 5 is more relevant than Result 1, because Result 5 has less opportunity to be examined.

Click Model Motivation
Model the user's click behavior in an interpretable manner and estimate the pure relevance of a query-document/ad pair, free of bias.
– Position bias is the main problem.
– Other kinds of bias: influence among documents/ads, attractiveness bias, search intent bias, ...
Intuition for the pure relevance of a query-document/ad pair:
– When the query is submitted to the search engine and only a single document/ad is shown, what is the click-through rate of this query-document/ad pair?

Examination Hypothesis [Richardson07]
A document must be examined before it is clicked. The probability of a click conditioned on examination depends on the pure relevance of the query-document/ad pair. The click probability can therefore be decomposed into:
– A global component: the examination probability, which reflects the position bias.
– A local component (pure relevance): the click probability of the (query, URL) pair conditioned on examination.
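The decomposition above can be sketched in a few lines. This is an illustrative sketch, not code from the tutorial; the names examine_prob and relevance are my own.

```python
def click_prob(examine_prob: float, relevance: float) -> float:
    """Examination hypothesis: P(C=1) = P(E=1) * P(C=1 | E=1),
    i.e., examination probability times pure relevance."""
    return examine_prob * relevance

# A mediocre result at a well-examined top position can out-click a
# highly relevant result at a poorly examined position:
top = click_prob(examine_prob=0.9, relevance=0.3)  # ~0.27
low = click_prob(examine_prob=0.2, relevance=0.8)  # ~0.16
```

This is exactly why raw click-through rate overestimates the relevance of top-ranked results.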

Click Models
Key tasks.
– How to model the user's examination behavior?
– How to estimate the relevance of a query-doc/ad pair?
Desired properties.
– Effective: aware of position bias and other biases, and addresses them properly.
– Scalable: linear complexity in both time and space; easy to parallelize.
– Incremental: flexible model updates on new data.
From this slide on, "relevance" means "pure relevance".

Importance of Understanding Logs
Better matching of queries to documents/ads benefits all participants.
– Users: better relevance.
– Search engines: more revenue from advertisers and more users.
– Advertisers: more return on investment (ROI).
[Figure: advertiser, user, and publisher linked by a better match.]

Growth of Web Users

Growth of Web Revenue

Index
Background.
A Simple Click Model.
– Dependent Click Model [WSDM09].
Advanced Design.
Advanced Estimation.
Projects.

Notations
– E_i: binary r.v. for the examination event at position i.
– C_i: binary r.v. for the click event at position i.
– r_i = p(C_i = 1 | E_i = 1): relevance of the query-document pair at position i.

Click Model Design
Dependent Click Model (DCM) [WSDM09]

Parameters in DCM
r = p(C=1 | E=1) is the local parameter.
– It models the relevance of a query-document/ad pair.
– The position bias has been modeled by p(E=1).
λ is the global parameter.
– It models p(E_{i+1} = 1 | C_i = 1, E_i = 1).
Parameter estimation: maximum log-likelihood method.

Estimation of r: Step 1
Define l as the last click position. When there is no click, l is taken to be the last position.

Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     1
6    Ir.iit.edu/cikm2004  0

Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0
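The definition of l can be made concrete with a small helper; the 0/1-list session format is an assumption for illustration.

```python
def last_click_position(clicks):
    """Return l, the 1-based position of the last click in a session;
    when there is no click, l defaults to the last position."""
    l = len(clicks)                 # no-click case: l = last position
    for pos, c in enumerate(clicks, start=1):
        if c:
            l = pos
    return l

# First table above (clicks at positions 2 and 5): l = 5.
# Second table above (no clicks): l = 6.
```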

Estimation of r: Step 2
Log-likelihood of a query session.
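The log-likelihood formula on this slide was an image and did not survive the transcript. As a hedged reconstruction, following the DCM definition in [WSDM09] and the notation of the earlier slides (c_i is the observed click at position i, r_i the relevance at position i, λ_i the global parameter, l the last click position, M the number of positions), the per-session log-likelihood would be:

```latex
\log P(C_{1:M}) =
  \sum_{i=1}^{l}\bigl(c_i\log r_i + (1-c_i)\log(1-r_i)\bigr)
  + \sum_{i<l} c_i\log\lambda_i
  + \log\Bigl(1-\lambda_l+\lambda_l\prod_{j=l+1}^{M}(1-r_j)\Bigr)
```

The lower bound maximized on the following slides replaces the last term with \log(1-\lambda_l), i.e., it assumes the user stops right after the last click.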

Estimation of r: Step 3
By maximizing the lower bound of the log-likelihood, we obtain a closed form. Suppose the current query-document pair has occurred in different sessions: in M sessions it occurs before/on position l and is clicked; in N sessions it occurs before/on position l and is not clicked. The estimate is then r = M / (M + N).
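The closed form can be sketched as a counting pass over sessions. The session format (a list of (url, click) pairs per impression) is an assumption; the M/N counting follows the definitions on this slide.

```python
def estimate_relevance(sessions, url):
    """r_hat = M / (M + N), where over all sessions:
    M = times url appears at a position <= l and is clicked,
    N = times url appears at a position <= l and is not clicked."""
    M = N = 0
    for session in sessions:
        l = len(session)                      # no click: l = last position
        for pos, (_, c) in enumerate(session, start=1):
            if c:
                l = pos                       # l ends as the last click position
        for pos, (u, c) in enumerate(session, start=1):
            if u == url and pos <= l:
                if c:
                    M += 1
                else:
                    N += 1
    return M / (M + N) if M + N else 0.0

sessions = [
    [("cikm2008.org", 0), ("www.cikm.org", 1)],
    [("www.cikm.org", 1), ("cikm2008.org", 0)],
]
# www.cikm.org is clicked whenever shown before/on l  -> r_hat = 1.0
# cikm2008.org is shown before/on l once, unclicked   -> r_hat = 0.0
```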

Estimation of λ
For a specific position i, by maximizing the lower bound of the log-likelihood, we obtain a closed form. Suppose there are A sessions in total. In B sessions, the last click position l is larger than i and a click happens at position i. In C sessions, l is exactly equal to i. The remaining A − B − C sessions cover the other cases. The estimate is then λ = B / (B + C).
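A counting sketch of the same estimate, under my hedged reading that the closed form is B / (B + C); sessions are plain 0/1 click vectors here.

```python
def estimate_lambda(sessions, i):
    """For position i (1-based): B = sessions with a click at i that is not
    the last click (l > i); C = sessions whose last click is exactly at i.
    Sessions with no click at i contribute to neither count."""
    B = C = 0
    for clicks in sessions:
        clicked = [p for p, c in enumerate(clicks, start=1) if c]
        if i in clicked:
            if max(clicked) > i:
                B += 1        # the user clicked again later, so examination continued
            else:
                C += 1        # i was the last click
    return B / (B + C) if B + C else 0.0
```

For example, with sessions [0,1,0,0,1,0], [0,1,0,0,0,0], and [1,0,0,0,0,0], position 2 is clicked twice, once followed by a later click, giving an estimate of 0.5.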

Property Verification
Effective. Scalable and Incremental.

Evaluation Criteria for DCM
Log-likelihood.
– Given the document impressions in the test set.
– Compute the chance of recovering the entire click vector.
– Averaged over query sessions.
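The criterion above can be sketched as follows, assuming (as an illustration) that the model supplies a click probability per position and treating positions as independent; a real click model computes the joint probability of the click vector directly.

```python
import math

def avg_log_likelihood(test_sessions):
    """test_sessions: list of (probs, clicks), where probs[i] is the model's
    predicted P(C_i = 1) for that impression. Returns the average per-session
    log-probability of recovering the observed click vector."""
    total = 0.0
    for probs, clicks in test_sessions:
        total += sum(math.log(p if c else 1.0 - p)
                     for p, c in zip(probs, clicks))
    return total / len(test_sessions)
```

A model predicting 0.5 everywhere scores -log(2) per position, a useful sanity baseline.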

Experimental Result for DCM

Some Other Evaluations
– Log-likelihood.
– Perplexity.
– Root mean square error (RMSE).
– Area under the ROC curve.

Index
Background.
A Simple Click Model.
Advanced Design.
– Five extension directions.
Advanced Estimation.
Project.

1 Dependency on Previous Docs/Ads
For position 4 in the following two cases, does it have the same chance of being examined? Intuitively, the left one has less chance, since the user may find the URL he/she wants at position 2 and stop the session.

Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         1
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Solution: Click Chain Model [Guo09]
The chance of being examined depends on the relevance of the previous documents/ads. Other similar work includes [Dupret08] and [Liu09].

2 Perceived vs. Actual Relevance
After clicking a doc/ad, the actual relevance, judged from the landing page, might differ from the user's perceived relevance.
[Figure: a "pizza" query with Ad1 and Ad2, before and after examination.]

Solution: Dynamic Bayesian Network [Chapelle09]
For each ad, two kinds of relevance are defined: perceived relevance r and actual relevance s. s influences the examination probability of the later docs/ads.

3 Aggregate vs. Instance Relevance
Users might have different intents for the same query; the click events can indicate the intent.
– Aggregate search, e.g., learn the parameters.
– Instance search, e.g., buy a camera.
[Figure: three impressions of the query "Canon", each showing Ad1 and Ad2.]

Solution: Joint Relevance Examination Model [Srikant10]
Add a correction factor determined by the click events of the other docs/ads. Other similar work includes [Hu11].

4 Competing Influence among Docs/Ads
When co-occurring with a highly relevant doc/ad, the perceived relevance of the current doc/ad decreases.

Solution: Temporal Click Model [Xu10]
The docs/ads compete to win the priority of being examined.

5 Incorporating Features
Feature example: dwell time.

Solution: Post-Clicked Click Model [Zhong10]
Incorporate features to determine the relevance. Other similar work includes [Zhu10].

Index
Background.
A Simple Click Model.
Advanced Design.
Advanced Estimation.
– Bayesian framework and the rationale.
– Bayesian browsing model.
– Click chain model.
Project.

Limitation of Maximum Log-likelihood
It does not fit the scalable and incremental properties.
– It is hard to obtain a closed-form formula when the model is complex.
– Even in DCM, as shown earlier, we need to approximate a lower bound of the log-likelihood for easy calculation.
No prior information can be utilized in such a sparse-data environment.

A Coin-Toss Example for the Bayesian Framework
Scenario: estimate the probability of tossing a head from the following five training samples. The probability is a variable X = x. Each training sample is denoted by C_i, e.g., C_1 = 1, C_4 = 0. According to Bayes' rule, we have p(x | C_1, ..., C_5) ∝ p(C_1, ..., C_5 | x) p(x).

Bayesian Estimation of Coin-Tossing
[Graphical model: X generating independent samples C_1, ..., C_5.]
The derivation proceeds by Bayes' rule with a uniform prior and independent sampling, yielding the posterior distribution and the estimate.

Density Function Update of Coin-Tossing

Sample   Density function (not normalized)
C_1 = 1  x^1 (1-x)^0
C_2 = 1  x^2 (1-x)^0
C_3 = 1  x^3 (1-x)^0
C_4 = 0  x^3 (1-x)^1
C_5 = 1  x^4 (1-x)^1
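Reading the samples off the exponents in the table above (x^4 (1-x)^1 corresponds to four heads and one tail, i.e., samples 1, 1, 1, 0, 1), the posterior mean has a closed form under the uniform prior. A sketch:

```python
def coin_posterior_mean(samples):
    """With a uniform prior (Beta(1,1)) on X = P(head), n Bernoulli samples
    containing h heads give the posterior Beta(h+1, n-h+1),
    whose mean is (h+1)/(n+2)."""
    h, n = sum(samples), len(samples)
    return (h + 1) / (n + 2)

# Samples from the table: posterior proportional to x^4 (1-x)^1,
# posterior mean (4+1)/(5+2) = 5/7.
```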

Click Data Scenario
[Figure: several sessions of one query, each an ordered list of documents such as a, b, c, d, with clicks.]
The same steps apply as in the coin-toss example: Bayes' rule, a uniform prior, independent sampling, and the resulting posterior distribution.

Factor Trick
If the factors of p(C|X) are arbitrary, a unique factor of p(X) must be stored for each training sample, which is space-consuming. However, if the factors of p(C|X) come from a small discrete set, only the exponents need to be stored.

Updating Example

Density function (not normalized)
Prior: x^1 (1-x)^0 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 …
x^1 (1-x)^1 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 …
x^2 (1-x)^1 (1-0.6x)^0 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 …
x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 …
x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0 …
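The bookkeeping behind this table can be sketched with one integer counter per factor; the factor labels mirror the table and are illustrative only.

```python
# When every per-session factor of p(C|X) comes from a small fixed set,
# the unnormalized posterior is just that set of factors raised to integer
# exponents, so one counter per factor suffices.

factors = ["x", "(1-x)", "(1-0.6x)", "(1+0.3x)", "(1-0.5x)", "(1-0.2x)"]
exponents = {f: 0 for f in factors}

def observe(factor):
    """Absorb one observation by incrementing the matching exponent."""
    exponents[factor] += 1

# A few updates in the spirit of the table above:
for f in ["x", "(1+0.3x)", "(1-x)", "x", "(1+0.3x)", "(1-0.6x)"]:
    observe(f)
# Posterior proportional to x^2 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 --
# constant storage no matter how many sessions have been absorbed.
```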

How to Realize the Factor Trick?
Setting a global parameter for all cases.
– Bayesian browsing model (BBM) [Liu09].
Assuming all other docs/ads follow the same distribution and integrating them out.
– Click chain model (CCM) [Guo09].
In the following two examples, we only consider the estimation of r under the Bayesian framework. The other parameters are estimated by maximizing the log-likelihood, as shown for DCM. Please refer to the original papers for details.

Index
Background.
A Simple Click Model.
Advanced Design.
Advanced Estimation.
– Bayesian framework and the rationale.
– Bayesian browsing model.
– Click chain model.
Project.

BBM Variable Definition
For a specific query session, let
– r_i: the relevance variable at position i.
– E_i: the binary examination variable at position i.
– C_i: the binary click variable at position i.
– n_i: the last click position before position i.
– d_i: the distance between position i and its preceding clicked position.

Small Discrete Set of Beta
Suppose M = 3 for simplicity of illustration. Then there are only 6 values of beta, indexed by (n, d):
(n=0, d=1), (n=0, d=2), (n=0, d=3), (n=1, d=1), (n=1, d=2), (n=2, d=1).

Estimation Algorithms
[Equation: the posterior for a doc/ad collects how many times it was clicked, and how many times it was not clicked under examination probability beta_{n,d}.]

Toy Example Step 1
Only the top M = 3 positions are shown: 3 query sessions and 4 distinct URLs.
[Figure: the position of each URL in query sessions 1-3, with clicks.]

Toy Example Step 2
Initialize M(M+1)/2 + 1 counts for each URL.
[Table: per-URL counts — Clicks, (n=0,d=1), (n=0,d=2), (n=0,d=3), (n=1,d=1), (n=1,d=2), (n=2,d=1).]

Toy Example Step 3
Update counts for URL 4.
– If not impressed, do nothing.
– If clicked, increment "Clicks" by 1.
– Otherwise, locate the right n and d and increment that count.

Toy Example Step 4
Update counts for URL 4 with the next query session, by the same rule.

Toy Example Step 5
Update counts for URL 4 with the last query session, by the same rule.
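Steps 3-5 can be sketched as a single counting pass per session. This is an illustrative sketch with M = 3; the (url, click) session format is an assumption.

```python
M = 3  # number of top positions considered

def init_counts():
    """One "clicks" count plus one count per (n, d) pair:
    n = last click position before i (0 if none), d = i - n."""
    return {"clicks": 0,
            **{(n, d): 0 for n in range(M) for d in range(1, M - n + 1)}}

def update(counts, session, url):
    """Absorb one session's evidence about `url` into its counts."""
    last_click = 0                       # n: position of the previous click
    for i, (u, c) in enumerate(session, start=1):
        if u == url:
            if c:
                counts["clicks"] += 1    # clicked: increment "clicks"
            else:
                counts[(last_click, i - last_click)] += 1
        if c:
            last_click = i

counts = init_counts()
# URL 4 shown at position 3, unclicked, after a click at position 2:
update(counts, [("url1", 0), ("url2", 1), ("url4", 0)], "url4")
# -> counts[(2, 1)] == 1, counts["clicks"] == 0
```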

Toy Example Step 6
The posterior for URL 4.
Interpretation: the larger the probability of examination, the stronger the penalty for a non-click.
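Reading off the posterior can be sketched numerically. The slides leave the beta values abstract, so the values below are made-up stand-ins; the form p(r) ∝ r^clicks · Π (1 − beta_{n,d}·r)^count follows the counting slides.

```python
import numpy as np

def posterior_mean(c, k, beta, grid=10001):
    """Numerically normalize p(r) ∝ r^c * prod (1 - beta[(n,d)]*r)^k[(n,d)]
    on [0, 1] and return the posterior mean of the relevance r."""
    r = np.linspace(0.0, 1.0, grid)
    dens = r ** c
    for key, count in k.items():
        dens = dens * (1.0 - beta[key] * r) ** count
    return float((r * dens).sum() / dens.sum())

# A non-click under high examination probability (beta near 1) pulls the
# mean down much harder than one under low examination probability,
# matching the interpretation on this slide.
low = posterior_mean(0, {(0, 1): 1}, {(0, 1): 0.95})
high = posterior_mean(0, {(0, 1): 1}, {(0, 1): 0.10})
```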

Algorithm Complexities
Initializing and updating the counts:
– Time: linear in the size of the click log.
– Space: almost constant storage required.

Index
Background.
A Simple Click Model.
Advanced Design.
Advanced Estimation.
– Bayesian framework and the rationale.
– Bayesian browsing model.
– Click chain model.
Project.

User Behavior Description
[Flowchart: the user examines a document and either clicks or not; in either case, the user then decides whether to see the next document or stop.]

Estimation Algorithms
By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be drawn from a small discrete set.

Five Cases
The current doc/ad may occur in five different cases. For each case, there are unique factors for p(C|R_i).

Case 1
The doc/ad must be examined. The other R can be seen as constants.

Case 2

Case 3

All Cases
By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be drawn from a small discrete set.

Index
Background.
A Simple Click Model.
Advanced Design.
Advanced Estimation.
Project.

Description
Fake dataset. Format:
– queryId
– ad1Id, click
– ad2Id, click
– ad3Id, click
Evaluation metric: ROC.
Baseline: Average (Avg).
Current competitive method: Simplified CCM (SCCM).
Task:
– Implement another advanced click model.
– Compare the results with Avg and SCCM.
– Analyze the reasons for any improvement.
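For the ROC metric, a minimal rank-based AUC can be sketched as follows; this is an illustrative implementation, not the project's actual evaluation script.

```python
def auc(scores, labels):
    """AUC = P(a random clicked ad outscores a random unclicked ad),
    with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5               # undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]) -> 0.75
```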

End