Modelling Relevance and User Behaviour in Sponsored Search using Click-Data. Adarsh Prasad, IIT Delhi. Advisors: Dinesh Govindaraj, SVN Vishwanathan*. Group: Revenue and Relevance. (*Visiting Researcher from Purdue)
Overview: Click data seems to be the perfect source of information when deciding which ads to show in answer to a query. It can be thought of as the result of users voting in favour of the documents they find interesting. This information can be fed into the ranker to tune search parameters, or even used as training points for the ranker. The aim of the project is to develop a model that takes in click data and outputs either constraints or updated ranking scores as input to the ranker.
Motivation
Quality of training points is of critical importance for learning a ranking function.
Currently, labeled data is collected using human judges. Human labeling is time-consuming and labor-intensive.
Ads also need "temporal relevance": something relevant today might not be relevant 6 months later, so labeling must be repeated and the labeling process needs to be automated.
Main Difficulty: Presentation Bias
Position: results at lower positions are less likely to be clicked even if they are relevant.
Externalities: clicks depend on the other ads being shown.
Example (Olivier Chapelle et al., A Dynamic Bayesian Click Model for Web Search Ranking): Query: myspace, URL = www.myspace.com, Market = U.K.
Ranking 1: Pos 1: uk.myspace.com: ctr = 0.97; Pos 2: www.myspace.com: ctr = 0.11
Ranking 2: Pos 1: www.myspace.com: ctr = 0.97
Procedure
For learning a web search function, clicks can be used as a target or as a feature.
Use of click data as target: useful for markets with few editorial judgments. Train on pairwise preferences from two sets: P_E, from editorial judgments, and P_C, from click modeling. Minimize:
Target
1. Derive preference relations from the click pattern and feed them as constraints to the ranker (Rocky-Road): position and order-of-click based constraints; aggregate constraints.
Feature
1. Sample clicked ads and label them as relevant.
2. Types of sampling: Random; Position based Weighted: a user clicking an ml-4 ad is a stronger signal of relevance than a user clicking ml-1.
3. Feed them to the binary classifier.
References: Joachims et al., Optimizing Search Engines using Clickthrough Data; Agichtein et al., Improving Web Search Ranking by Incorporating User Behavior Information; Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback.
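One way position-based preference constraints could be derived from a click pattern is the "click > skip above" heuristic described by Joachims et al.: a clicked result is preferred over every unclicked result ranked above it. The sketch below is only an illustration of that heuristic, not necessarily the constraint scheme used in this project; the function name, the ad identifiers and the data layout are assumptions.

```python
from typing import List, Set, Tuple


def skip_above_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs for one impression.

    Each clicked ad is preferred over every unclicked ad shown above it.
    `ranking` lists the ads from top to bottom (hypothetical identifiers).
    """
    pairs = []
    for pos, ad in enumerate(ranking):
        if ad in clicked:
            for above in ranking[:pos]:
                if above not in clicked:
                    pairs.append((ad, above))
    return pairs


# Toy impression: four ads shown, only the third one clicked.
example_pairs = skip_above_preferences(["ad_a", "ad_b", "ad_c", "ad_d"], {"ad_c"})
```

Each pair can then be handed to a pairwise ranker as a constraint of the form score(preferred) > score(less_preferred).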
Results
Columns: EXACTMATCH, BROADMATCH, PHRASEMATCH, SMARTMATCH
Sampling: +0.39%, +1.02%
Position and Order Constraints: +1.22%, +5.93%, +4.15%, +0.38%
Aggregate Constraints: +0.2%, +5.17%, +0.77%, +0.5%
Columns: SAME, SUPERSET, DISJOINT
Sampling: +5.72%, +4.22%
Position and Order Constraints: +3.1%, +2.28%
Aggregate Constraints: +7.4%, +5.28%
-6.28%, -3.9%, -11.3%, -0.06%, -0.5%
Log Loss (Label Based)
Sampling: +0.001%
Position and Order Constraints: +3.07%
Aggregate Constraints: +1.75%
Weighted LL
Background on Click Models: Use CTR (click-through rate) data. Pr(click) = Pr(examination) × Pr(click | examination), where Pr(click | examination) is the relevance. User browsing models are needed to estimate Pr(examination).
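Notation: Φ(i): result at position i. Examination event: E_i = 1 if the user examined the result at position i, 0 otherwise. Click event: C_i = 1 if the user clicked the result at position i, 0 otherwise.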
Examination Hypothesis (Richardson et al., WWW 2007): Pr(C_i = 1) = Pr(E_i = 1) · Pr(C_i = 1 | E_i = 1). α_i = Pr(E_i = 1) is the position bias: it depends solely on position and can be estimated by looking at the CTR of the same result in different positions.
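Under the examination hypothesis, the CTR of a result at position i factors into α_i times the result's relevance, so the ratio of CTRs of the same result at two positions cancels the relevance term. A minimal sketch of that estimate follows, assuming α_1 is normalised to 1 and ratios are averaged over results; the function name and the toy CTR values are made up for illustration.

```python
from collections import defaultdict


def estimate_position_bias(ctr):
    """Estimate alpha_i relative to position 1.

    ctr maps (ad, position) -> observed CTR. For every ad that was also shown
    at position 1, CTR(ad, i) / CTR(ad, 1) estimates alpha_i / alpha_1.
    """
    ratios = defaultdict(list)
    for (ad, pos), rate in ctr.items():
        top = ctr.get((ad, 1))
        if top:  # skip ads never shown at position 1, or with zero CTR there
            ratios[pos].append(rate / top)
    return {pos: sum(v) / len(v) for pos, v in sorted(ratios.items())}


# Toy CTRs (invented numbers, not taken from the experiments).
toy_ctr = {
    ("ad_a", 1): 0.20, ("ad_a", 2): 0.10,
    ("ad_b", 1): 0.08, ("ad_b", 2): 0.04, ("ad_b", 3): 0.02,
}
alpha = estimate_position_bias(toy_ctr)
```

In this toy data each ad's CTR halves from one position to the next, so the estimated bias does too.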
Examination depends on prior clicks
Cascade model
Dependent click model (DCM)
User browsing model (UBM) [Dupret & Piwowarski, SIGIR 2008]: more general and more accurate than Cascade and DCM. Conditions Pr(examination) on the closest prior click.
Bayesian browsing model (BBM) [Liu et al., KDD 2009]: same user behavior model as UBM. Uses the Bayesian paradigm for relevance.
User browsing model (UBM)
Use the position of the closest prior click to predict Pr(examination).
Pr(E_i = 1 | C_{1:i-1}) = α_i β_{i,p(i)}
Pr(C_i = 1 | C_{1:i-1}) = Pr(E_i = 1 | C_{1:i-1}) · Pr(C_i = 1 | E_i = 1)
α_i: position bias; p(i): position of the closest prior click.
Prior clicks don't affect relevance.
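The two UBM equations can be traced through a single session as follows. This is a minimal sketch: the function name, the convention that p(i) = 0 means "no prior click", and all parameter values are assumptions chosen for illustration.

```python
def ubm_click_probs(clicks, alpha, beta, relevance):
    """Per-position click probabilities under UBM for one session.

    clicks[i]    : observed click (0/1) at position i + 1 (index 0 = top slot)
    alpha[i]     : position bias of position i + 1
    beta[i][p]   : factor for closest prior click p (p = 0 means no prior click)
    relevance[i] : Pr(C_i = 1 | E_i = 1) for the result shown at position i + 1
    """
    probs = []
    last_click = 0  # position of the closest prior click, 0 if none yet
    for i, clicked in enumerate(clicks):
        p_exam = alpha[i] * beta[i][last_click]  # Pr(E_i = 1 | C_{1:i-1})
        probs.append(p_exam * relevance[i])      # Pr(C_i = 1 | C_{1:i-1})
        if clicked:
            last_click = i + 1
    return probs


# Toy parameters (invented): three slots, user clicked only the top one.
toy_probs = ubm_click_probs(
    clicks=[1, 0, 0],
    alpha=[1.0, 0.8, 0.6],
    beta=[[1.0], [0.5, 1.0], [0.25, 0.5, 1.0]],
    relevance=[0.5, 0.5, 0.5],
)
```

Relevance enters only through the last factor, which matches the statement that prior clicks do not affect relevance.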
Other Related Work
Examination depends on prior clicks and prior relevance: Click chain model (CCM); General click model (GCM).
Post-click models: Dynamic Bayesian model; Session utility model.
User Browsing in Sponsored Search: Is user browsing in sponsored search similar to browsing in web search? The general assumption in organic search is that users examine and click in a linear, top-to-bottom fashion. We observed that in sponsored search, where few results are returned, a fair share (~30%) of users click out of order. Non-linear user behaviour is a strong signal that may contain important information, so we combine the position and temporal behaviour of the user. The statistic counted, x, is the difference between the positions of temporally consecutive clicks (earlier position minus later position). Example: if the user clicks ml1 and then ml2, x = -1; if ml2 and then ml1, x = 1; and so on.
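The statistic x can be computed from a session's click sequence as sketched below. The sketch assumes that x is taken over temporally consecutive clicks and that positions are encoded as integers (1 for ml1, 2 for ml2, and so on); the function name and the toy sessions are invented for illustration.

```python
from collections import Counter


def click_order_differences(click_positions):
    """Return x for each pair of consecutive clicks in one session.

    click_positions lists ad positions in the order they were clicked.
    x = (position of earlier click) - (position of later click), so a
    positive x means the user moved back up the list.
    """
    return [a - b for a, b in zip(click_positions, click_positions[1:])]


# Toy sessions (invented): ml1 -> ml2, ml2 -> ml1, ml1 -> ml3 -> ml2.
sessions = [[1, 2], [2, 1], [1, 3, 2]]
histogram = Counter(x for s in sessions for x in click_order_differences(s))
out_of_order = sum(n for x, n in histogram.items() if x > 0)
```

A histogram of x over many sessions shows how often users deviate from top-to-bottom clicking.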
A New Model: Allow users to move in a non-linear fashion. Also incorporate the notion of externalities, i.e. perceived relevance changes with other clicks. The parameters can be learned with the EM algorithm: (1) in the E step, we estimate the hidden variables with a forward-backward algorithm; (2) in the M step, closed-form solutions maximize the expected log-likelihood.
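To show the shape of the E and M steps, here is a sketch of EM for the much simpler position-based model from the examination hypothesis, not for the new model itself. In this simpler model the hidden events are independent per impression, so the E step needs no forward-backward pass. The function name, the 0.5 initialisation and the toy impressions are assumptions.

```python
import math
from collections import defaultdict


def pbm_em(impressions, n_iter=20):
    """EM for Pr(click) = exam[position] * rel[ad].

    impressions: list of (ad, position, click) with click in {0, 1}.
    Returns (rel, exam, trace), where trace holds the log-likelihood of the
    parameters at the start of each iteration.
    """
    rel = defaultdict(lambda: 0.5)
    exam = defaultdict(lambda: 0.5)
    trace = []
    for _ in range(n_iter):
        rel_sum, rel_n = defaultdict(float), defaultdict(int)
        exam_sum, exam_n = defaultdict(float), defaultdict(int)
        loglik = 0.0
        for ad, pos, click in impressions:
            r, e = rel[ad], exam[pos]
            if click:
                # A click implies the ad was examined and judged relevant.
                post_exam = post_rel = 1.0
                loglik += math.log(e * r)
            else:
                # E step: posteriors of the hidden events given no click.
                post_exam = e * (1 - r) / (1 - e * r)
                post_rel = r * (1 - e) / (1 - e * r)
                loglik += math.log(1 - e * r)
            exam_sum[pos] += post_exam
            exam_n[pos] += 1
            rel_sum[ad] += post_rel
            rel_n[ad] += 1
        trace.append(loglik)
        # M step: closed-form updates, each an average of posteriors.
        rel = {ad: rel_sum[ad] / rel_n[ad] for ad in rel_n}
        exam = {pos: exam_sum[pos] / exam_n[pos] for pos in exam_n}
    return rel, exam, trace


# Toy impressions (invented): ad "a" is clicked more often than ad "b".
toy = ([("a", 1, 1)] * 6 + [("a", 1, 0)] * 4 + [("a", 2, 1)] * 3 + [("a", 2, 0)] * 7
       + [("b", 1, 1)] * 2 + [("b", 1, 0)] * 8 + [("b", 2, 1)] * 1 + [("b", 2, 0)] * 9)
rel, exam, trace = pbm_em(toy)
```

The log-likelihood trace should never decrease from one iteration to the next, which is a convenient sanity check for any EM implementation, including one for a richer model.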