1 Bayesian Extension to the Language Model for Ad Hoc Information Retrieval
Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping
Microsoft Research Cambridge, U.K. / University of Twente, The Netherlands
ACM SIGIR 2003, Session: Retrieval Models

2 Abstract
Many smoothed estimators (including Laplace and Bayes-smoothing) are approximations to the Bayesian predictive distribution
Derive the full predictive distribution in a form amenable to implementation by classical IR models, and compare it to other estimators
The proposed model outperforms Bayes-smoothing, and its combination with linear interpolation smoothing outperforms all other estimators

3 Introduction (1/2)
Language Model
–Computes the relevance of a document d with respect to a query q by estimating a factorized form of the distribution P(q, d)
Bayesian statistics
–Useful concepts and tools for estimation
–A powerful mathematical framework for data modeling when the data is scarce and/or uncertain

4 Introduction (2/2)
Bayes-smoothing or Dirichlet smoothing
–Among the best smoothing techniques used today in the Language Model
–An approximation to the full Bayesian inference model: in fact, it is the maximum-posterior approximation to the predictive distribution
In this paper
–We derive analytically the predictive distribution of the most commonly used query Language Model

5 The Unigram Query Model (1/4)
Unigram Query Model
–Consider a query q and a document collection of N documents, $C := \{d_l\}_{l=1,\dots,N}$
–$q_i$: number of times the term i appears in the query
–V: the size of the vocabulary

6 The Unigram Query Model (2/4)
–Consider a multinomial generation model for each document, parameterized by the vector $\theta_l := (\theta_{l,1}, \dots, \theta_{l,V})$
–The length of a query ($n_q$) and of a document ($n_l$) is the sum of its components (e.g. $n_q := \sum_{i=1}^{V} q_i$)
–The probability of generating a particular query q with counts $q_i$, and a document $d_l$, under this model (reconstructed below)
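The generation probabilities on this slide were equation images that did not survive; under the multinomial model just stated, their standard forms (a reconstruction, not the slide's own rendering) are:

    P(q \mid \theta_l) = \frac{n_q!}{\prod_{i=1}^{V} q_i!} \prod_{i=1}^{V} \theta_{l,i}^{q_i}, \qquad P(d_l \mid \theta_l) = \frac{n_l!}{\prod_{i=1}^{V} d_{l,i}!} \prod_{i=1}^{V} \theta_{l,i}^{d_{l,i}}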

7 The Unigram Query Model (3/4)
–The unigram query model postulates that the relevance of a document to a query can be measured by the probability that the query is generated by the document
–By this is meant the likelihood of the query $P(q \mid \theta_l)$ when the parameters $\theta_l$ are estimated using $d_l$ as a sample of the underlying distribution
–The central problem of this model is then the estimation of the parameters $\theta_{l,i}$ from the document counts $d_l$, the collection counts $\{cf_i := \sum_l d_{l,i}\}_{i=1,\dots,V}$ and the size of the collection N

8 The Unigram Query Model (4/4)
Given an infinite amount of data
–Empirical estimates (the maximum likelihood estimate, written out below) would suffice
With little data for the estimation of these parameters, the empirical estimator is not good
–Unseen words receive zero probability
Two smoothing techniques address this
–Maximum-posterior estimator
–Linearly-interpolated maximum likelihood estimator
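The maximum likelihood estimate referenced above, in its standard form (a reconstruction from the counts defined on slide 5):

    \hat{\theta}_{l,i} = \frac{d_{l,i}}{n_l}

This assigns zero probability to any term that does not occur in $d_l$, which is why unseen query terms are a problem.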

9 Bayesian Language Model (1/4)
Bayesian techniques
–Rather than finding a single point estimate for the parameter vector $\theta_l$, a distribution over $\theta_l$ (the posterior) is obtained by Bayes' rule
–Predictive distribution (assuming the document and the query are generated by the same distribution); both formulas are written out below
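The two formulas this slide refers to, in standard Bayesian form (a reconstruction):

    P(\theta_l \mid d_l) = \frac{P(d_l \mid \theta_l)\, P(\theta_l)}{P(d_l)}

    P(q \mid d_l) = \int P(q \mid \theta_l)\, P(\theta_l \mid d_l)\, d\theta_l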

10 Bayesian Language Model (2/4)
Prior probability, P(θ)
–It is central to Bayesian inference, especially for small data samples
–In most cases, the only available choice for a prior is the natural conjugate of the generating distribution
–The natural conjugate of a multinomial distribution is the Dirichlet distribution (written out below)
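The Dirichlet density, in its standard form with hyper-parameters $\alpha = (\alpha_1, \dots, \alpha_V)$ and $n_\alpha := \sum_i \alpha_i$ (a reconstruction; the slide's own equation was lost):

    P(\theta \mid \alpha) = \frac{\Gamma(n_\alpha)}{\prod_{i=1}^{V} \Gamma(\alpha_i)} \prod_{i=1}^{V} \theta_i^{\alpha_i - 1}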

11 Bayesian Language Model (3/4)
Under this prior
–The posterior distribution is Dirichlet as well
–The predictive distribution can be computed in closed form (both written out below)
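Reconstructions of the two results: by conjugacy the posterior is again Dirichlet, with the document counts added to the hyper-parameters, and integrating the multinomial likelihood against it gives the standard Dirichlet-multinomial predictive (stated here up to the query's document-independent multinomial coefficient):

    P(\theta_l \mid d_l) = \mathrm{Dir}(\alpha_1 + d_{l,1}, \dots, \alpha_V + d_{l,V})

    P(q \mid d_l) \propto \frac{\Gamma(n_l + n_\alpha)}{\Gamma(n_l + n_\alpha + n_q)} \prod_{i=1}^{V} \frac{\Gamma(d_{l,i} + \alpha_i + q_i)}{\Gamma(d_{l,i} + \alpha_i)}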

12 Bayesian Language Model (4/4)
New document scoring function (reconstructed below), where the last two terms can be dropped as they are document independent
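The scoring function itself was an image; taking the logarithm of the predictive distribution above gives the following plausible form, in which the final two terms (the query's multinomial coefficient) are document independent and can be dropped for ranking:

    \log P(q \mid d_l) = \sum_{i=1}^{V} \log \frac{\Gamma(d_{l,i} + \alpha_i + q_i)}{\Gamma(d_{l,i} + \alpha_i)} + \log \frac{\Gamma(n_l + n_\alpha)}{\Gamma(n_l + n_\alpha + n_q)} + \log \Gamma(n_q + 1) - \sum_{i=1}^{V} \log \Gamma(q_i + 1)

A minimal Python sketch of the resulting score, assuming the hyper-parameter choice $\alpha_i = \mu P(v_i \mid C)$ introduced on the next slide (function and variable names are hypothetical, not from the paper):

    import math

    def bayes_predictive_score(q_counts, d_counts, n_l, mu, p_coll):
        """Log-score of query q against document d_l, document-independent terms dropped.

        q_counts: {term: count} for the query
        d_counts: {term: count} for the document; n_l is the document length
        p_coll:   {term: P(v_i | C)} collection unigram model; alpha_i = mu * P(v_i | C)
        """
        n_q = sum(q_counts.values())
        score = 0.0
        for term, q_i in q_counts.items():
            a = d_counts.get(term, 0) + mu * p_coll[term]  # d_{l,i} + alpha_i
            score += math.lgamma(a + q_i) - math.lgamma(a)
        # length normalization: log Gamma(n_l + n_alpha) - log Gamma(n_l + n_alpha + n_q)
        score += math.lgamma(n_l + mu) - math.lgamma(n_l + mu + n_q)
        return score

Terms with $q_i = 0$ contribute $\log 1 = 0$ to the first sum, so only query terms need to be visited.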

13 Setting the hyper-parameter values ($\alpha_i$)
One option is to set all the $\alpha_i$ to some constant
A better option is to fit the prior distribution to the collection statistics
–The average count of term $t_i$ per document is proportional to $P(v_i \mid C)$
–The mean of the prior Dirichlet distribution $P(\theta_l)$ is known in closed form (see below)
–Set this mean equal to the average term count
–Therefore, set $\alpha_i = \mu P(v_i \mid C)$ and $n_\alpha = \mu$, where $\mu$ is a free parameter of the model
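The Dirichlet mean and the resulting identification, reconstructed:

    E[\theta_i] = \frac{\alpha_i}{n_\alpha}; \qquad \text{setting } \frac{\alpha_i}{n_\alpha} = P(v_i \mid C) \;\Rightarrow\; \alpha_i = \mu P(v_i \mid C), \; n_\alpha = \mu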

14 Relationship to Other Smoothing Models (1/3)
Maximum Posterior (MP) estimator
–A standard approximation to the Bayesian predictive distribution (see the reconstruction below)
For a Dirichlet prior
–$\alpha_i = 1$ obtains the maximum likelihood estimator
–$\alpha_i = 2$ (or $\alpha_i = \lambda + 1$) obtains the Laplace smoothing estimator
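The MP estimator (the mode of the Dirichlet posterior) was an equation image; its standard form is:

    \hat{\theta}^{MP}_{l,i} = \frac{d_{l,i} + \alpha_i - 1}{n_l + n_\alpha - V}

from which the special cases on this slide and the next follow by substituting the stated values of $\alpha_i$.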

15 Relationship to Other Smoothing Models (2/3)
–$\alpha_i = \mu P(v_i \mid C)$ obtains the Bayes-smoothing estimator
–Linear interpolation (LI) smoothing is the other common choice
The scoring functions resulting from these estimators (BS, LI) rewrite the unigram query model in a common form (see below)
–$\beta_l$: a document-dependent constant
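The two estimators and the rewritten scoring function were equation images; standard forms (reconstructions, in the style of Zhai and Lafferty's general smoothing formulation) are:

    \hat{\theta}^{BS}_{l,i} = \frac{d_{l,i} + \mu P(v_i \mid C)}{n_l + \mu}, \qquad \hat{\theta}^{LI}_{l,i} = \lambda \frac{d_{l,i}}{n_l} + (1 - \lambda) P(v_i \mid C)

Both can be folded into one ranking formula in which only matching terms contribute to the first sum:

    \log P(q \mid \theta_l) = \sum_{i:\, d_{l,i} > 0} q_i \log \frac{\hat{\theta}_{l,i}}{\beta_l\, P(v_i \mid C)} + n_q \log \beta_l + \sum_{i} q_i \log P(v_i \mid C)

with $\beta_l = \mu / (n_l + \mu)$ for BS and $\beta_l = 1 - \lambda$ for LI; the last term is document independent and can be dropped.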

16 Relationship to Other Smoothing Models (3/3)
General formulation of the unigram query model
–A fast inverted index can be used to retrieve the weights needed to compute the first term
–The number of operations needed to compute the first term depends only on the number of term-indices matching the query
–The cost of computing the second term is negligible
Bayesian predictive model proposed in this paper
–The operations needed to compute the first term are different
–The last term cannot be pre-computed
–Slightly more expensive, but it too can be implemented in a real-scale IR system
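One detail implied but not written out on this slide: in the predictive score the first sum still receives contributions only from terms matching the query, since for $q_i = 0$

    \frac{\Gamma(d_{l,i} + \alpha_i + q_i)}{\Gamma(d_{l,i} + \alpha_i)} = 1

though each contribution now involves Gamma-function evaluations rather than a single pre-computed log weight, and the length-dependent term $\log \Gamma(n_l + n_\alpha) - \log \Gamma(n_l + n_\alpha + n_q)$ must be evaluated per candidate document.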

17 Empirical Evaluation (1/3)
Data
–TREC-8 document collection
–TREC-6 and TREC-8 queries and query-relevance sets
Data pre-processing is standard
–Terms were stemmed with the Porter stemmer
–Stop words and words occurring fewer than 3 times are removed
–Queries are constructed from the title and description

18 Empirical Evaluation (2/3)
Results
–For the Bayes predictive model, the optimal setting of the μ parameter is roughly the same as for Bayes-smoothing
–Linear interpolation smoothing yields better results than Bayes-smoothing and the Bayes predictive model

19 Empirical Evaluation (3/3)
–Combinations of the Bayes predictive model with linear interpolation smoothing, and of Bayes-smoothing with linear interpolation smoothing, are evaluated
–Per the abstract, the combination of the predictive model with linear interpolation smoothing outperforms all other estimators

20 Conclusion
Presents a first Bayesian analysis of the unigram query Language Model for ad hoc retrieval, and proposes a new scoring function derived from the Bayesian predictive distribution
Work remains to be done
–Combine the two approaches (the predictive distribution and linear interpolation smoothing)
–Automatically adapt the μ scaling parameter
–The Bayesian inference framework could be applied to other Language Models and extended to other tasks such as relevance feedback, query expansion and adaptive filtering

