Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge, U.K. / University of Twente, The Netherlands ACM SIGIR 2003, Session: Retrieval Models

Abstract Many smoothed estimators used in the Language Model approach to IR (including Laplace and Bayes-smoothing) are approximations to the Bayesian predictive distribution. The paper derives the full predictive distribution in a form amenable to implementation by classical IR models, and compares it to other estimators. The proposed model outperforms Bayes-smoothing, and its combination with linear interpolation smoothing outperforms all other estimators.

Introduction (1/2) Language Model –Computes the relevance of a document d with respect to a query q by estimating a factorized form of the distribution P(q, d) Bayesian statistics –Provides useful concepts and tools for estimation –A powerful mathematical framework for data modeling when the data is scarce and/or uncertain

Introduction (2/2) Bayes-smoothing or Dirichlet smoothing –One of the best smoothing techniques used today in the Language Model approach –An approximation to the full Bayesian inference model: in fact, it is the maximum-posterior approximation to the predictive distribution In this paper –We derive analytically the predictive distribution of the most commonly used query Language Model

The Unigram Query Model (1/4) Unigram Query Model –Consider a query q and a collection of N documents $C := \{d_l\}_{l=1,\dots,N}$ –$q_i$: number of times term i appears in the query –$V$: the size of the vocabulary

The Unigram Query Model (2/4) –Consider a multinomial generation model for each document, parameterized by the vector $\theta_l = (\theta_{l,1},\dots,\theta_{l,V})$ –The length of a query ($n_q$) and of a document ($n_l$) is the sum of their term counts (e.g. $n_q = \sum_{i=1}^{V} q_i$, $n_l = \sum_{i=1}^{V} d_{l,i}$) –The probability of generating a particular query q with counts $q_i$ from document $d_l$ is then multinomial: $P(q \mid \theta_l) \propto \prod_{i=1}^{V} \theta_{l,i}^{q_i}$, and similarly for the document counts $d_{l,i}$

The Unigram Query Model (3/4) –The unigram query model postulates that the relevance of a document to a query can be measured by the probability that the query is generated by the document –By this it is meant the likelihood of the query $P(q \mid \theta_l)$ when the parameters $\theta_l$ are estimated using $d_l$ as a sample of the underlying distribution –The central problem of this model is then the estimation of the parameters $\theta_{l,i}$ from the document counts $d_{l,i}$, the collection counts $\{cf_i := \sum_l d_{l,i}\}_{i=1,\dots,V}$ and the size of the collection N

The Unigram Query Model (4/4) Given an infinite amount of data –Empirical (maximum likelihood) estimates would suffice: $\hat\theta_{l,i} = d_{l,i}/n_l$ With little data for the estimation of these parameters, the empirical estimator is poor –In particular, it assigns zero probability to unseen words –Two smoothing techniques are commonly used (sketched below): the maximum-posterior estimator and the linearly-interpolated maximum likelihood estimator
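As a concrete illustration of these estimators, here is a minimal Python sketch. The names (d_counts, p_coll, mu, lam) are illustrative rather than taken from the paper, and term counts are assumed to be plain dictionaries:

```python
# Minimal sketch of the three point estimators of theta_{l,i}.
# d_counts: term -> count in document d_l; n_l: document length;
# p_coll: collection unigram distribution P(v_i | C).

def ml_estimate(d_counts, n_l, term):
    """Maximum likelihood: d_{l,i} / n_l (zero for unseen terms)."""
    return d_counts.get(term, 0) / n_l

def bayes_smoothing(d_counts, n_l, term, p_coll, mu=1000.0):
    """Bayes-smoothing: (d_{l,i} + mu * P(v_i|C)) / (n_l + mu)."""
    return (d_counts.get(term, 0) + mu * p_coll.get(term, 0.0)) / (n_l + mu)

def linear_interpolation(d_counts, n_l, term, p_coll, lam=0.5):
    """Linearly-interpolated ML: lam * d_{l,i}/n_l + (1 - lam) * P(v_i|C)."""
    return lam * d_counts.get(term, 0) / n_l + (1 - lam) * p_coll.get(term, 0.0)
```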

Bayesian Language Model (1/4) Bayesian techniques –Rather than finding a single point estimate for the parameter vector $\theta_l$, a distribution over $\theta_l$ (the posterior) is obtained by Bayes' rule: $P(\theta_l \mid d_l) \propto P(d_l \mid \theta_l)\,P(\theta_l)$ –Predictive distribution (where the document and the query are assumed to be generated by the same distribution): $P(q \mid d_l) = \int P(q \mid \theta)\,P(\theta \mid d_l)\,d\theta$

Bayesian Language Model (2/4) Prior probability, P(θ) –It is central to Bayesian inference, especially for small data samples –In most cases, the only available choice for a prior is the natural conjugate of the generating distribution: the natural conjugate of a multinomial distribution is the Dirichlet distribution, $P(\theta \mid \alpha) \propto \prod_{i=1}^{V} \theta_i^{\alpha_i - 1}$

Bayesian Language Model (3/4) Under this prior –The posterior distribution is Dirichlet as well, with parameters $\alpha_i + d_{l,i}$ –The predictive distribution is the Dirichlet-multinomial: $P(q \mid d_l) \propto \frac{\Gamma(n_l + n_\alpha)}{\Gamma(n_q + n_l + n_\alpha)} \prod_{i=1}^{V} \frac{\Gamma(q_i + d_{l,i} + \alpha_i)}{\Gamma(d_{l,i} + \alpha_i)}$ (derivation sketched below)
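For completeness, here is the standard conjugacy derivation behind this result, reconstructed from the definitions above (the transcript itself does not spell it out):

```latex
% Dirichlet-multinomial predictive distribution (standard derivation).
\begin{align*}
P(q \mid d_l)
  &= \int P(q \mid \theta)\, P(\theta \mid d_l)\, d\theta
   \propto \int \prod_{i=1}^{V} \theta_i^{q_i}
     \prod_{i=1}^{V} \theta_i^{d_{l,i} + \alpha_i - 1}\, d\theta \\
  &= \frac{\prod_{i=1}^{V} \Gamma(q_i + d_{l,i} + \alpha_i)}
          {\Gamma(n_q + n_l + n_\alpha)}
     \cdot
     \frac{\Gamma(n_l + n_\alpha)}
          {\prod_{i=1}^{V} \Gamma(d_{l,i} + \alpha_i)}
\end{align*}
% The first factor is the Dirichlet normalizer of (q + d_l + alpha);
% the second divides out the normalizer of the posterior Dirichlet(d_l + alpha).
```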

Bayesian Language Model (4/4) New document scoring function, obtained by taking the log of the predictive distribution: $\log P(q \mid d_l) = \sum_{i: q_i > 0} \log\frac{\Gamma(q_i + d_{l,i} + \alpha_i)}{\Gamma(d_{l,i} + \alpha_i)} + \log\Gamma(n_l + n_\alpha) - \log\Gamma(n_q + n_l + n_\alpha) + \mathrm{const}$, where the remaining terms (the multinomial coefficient of q and the prior normalization) can be dropped as they are document independent
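A hedged Python sketch of this scoring function in log space, assuming the Dirichlet-multinomial form above; gammaln from scipy computes log Γ, and alpha / n_alpha come from the prior as set on the next slide:

```python
# Sketch: log P(q | d_l) up to document-independent terms.
from scipy.special import gammaln

def predictive_log_score(q_counts, d_counts, n_l, alpha, n_alpha):
    n_q = sum(q_counts.values())
    score = 0.0
    for term, q_i in q_counts.items():          # only query terms contribute
        a = d_counts.get(term, 0) + alpha[term]  # alpha_i > 0 assumed for all vocabulary terms
        score += gammaln(a + q_i) - gammaln(a)
    # document-dependent length normalization (cannot be pre-computed)
    score += gammaln(n_l + n_alpha) - gammaln(n_q + n_l + n_alpha)
    return score
```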

Setting the hyper-parameter values (α_i) One option is to set all the α_i to some constant A better option is to fit the prior distribution to the collection statistics –The average count of a term $t_i$ across the collection is proportional to $P(v_i \mid C) = cf_i / \sum_j cf_j$ –The mean of the Dirichlet distribution is known to be $\alpha_i / n_\alpha$ –Setting this mean to be equal to the (normalized) average term count gives $\alpha_i / n_\alpha = P(v_i \mid C)$ –Therefore, set $\alpha_i = \mu P(v_i \mid C)$ and $n_\alpha = \mu$, where μ is a free parameter of this model
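A small sketch of this prior-fitting step; cf is assumed to be a dictionary of the collection term frequencies defined earlier, and the function name is illustrative:

```python
# Fit the Dirichlet prior to collection statistics: alpha_i = mu * P(v_i | C).
def fit_dirichlet_prior(cf, mu=1000.0):
    total = sum(cf.values())                        # total term occurrences in C
    alpha = {t: mu * c / total for t, c in cf.items()}
    n_alpha = mu                                    # sum of alpha_i, by construction
    return alpha, n_alpha
```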

Relationship to Other Smoothing Models (1/3) Maximum posterior (MP) estimation –A standard approximation to the Bayesian predictive distribution: $\hat\theta_{l,i} = \frac{d_{l,i} + \alpha_i - 1}{n_l + n_\alpha - V}$ For a Dirichlet prior –α_i = 1 yields the maximum likelihood estimator –α_i = 2 (or α_i = λ+1) yields the Laplace smoothing estimator

Relationship to Other Smoothing Models (2/3) –α_i = μP(v_i|C) yields the Bayes-smoothing estimator –Linear interpolation (LI) smoothing: $\hat\theta_{l,i} = \lambda\, d_{l,i}/n_l + (1-\lambda)\, P(v_i \mid C)$ The scoring functions resulting from these estimators (BS, LI) let the unigram query model be rewritten in a general form $\sum_{i: q_i > 0} q_i\, w_{l,i} + n_q\, \beta_l$, with precomputable term weights $w_{l,i}$ and a document-dependent constant $\beta_l$

Relationship to Other Smoothing Models (3/3) General formulation of the unigram query model –A fast inverted index can be used to retrieve the weights needed to compute the first term –The number of operations needed to compute the first term depends only on the number of term indices matching the query –The cost of computing the second term is negligible For the Bayesian predictive model proposed in this paper –The number of operations needed to compute the first term is different, and the last term cannot be pre-computed (it depends on both the query length and the document length) –The model is therefore slightly more expensive, but it can still be implemented in a real-scale IR system (see the sketch below)
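To make the efficiency argument concrete, here is a hedged sketch of term-at-a-time scoring over an inverted index for the general formulation above; the index layout and names are illustrative, not the authors' implementation:

```python
# General unigram scoring form:
#   score(d_l) = sum_{i: q_i > 0} q_i * w_{l,i}  +  n_q * beta_l.
# index: term -> list of (doc_id, precomputed weight w_{l,i}) postings.
# beta: doc_id -> document-dependent constant (precomputable for BS/LI;
# for the Bayesian predictive model it must be computed at query time).

def score_all(q_counts, index, beta):
    n_q = sum(q_counts.values())
    scores = {doc_id: n_q * b for doc_id, b in beta.items()}  # second term
    for term, q_i in q_counts.items():
        for doc_id, w in index.get(term, []):                 # matching postings only
            scores[doc_id] += q_i * w                         # first term
    return scores
```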

Empirical Evaluation (1/3) Data –TREC-8 document collection –TREC-6 and TREC-8 queries and query-relevance sets Data pre-processing is standard –Terms were stemmed with the Porter stemmer –Stop words and words occurring fewer than 3 times were removed –Queries were constructed from the title and description fields

Empirical Evaluation (2/3) Results –For the Bayes predictive model and Bayes-smoothing, the optimal parameter setting is roughly the same –Linear interpolation smoothing yields better results than both Bayes-smoothing and the Bayes predictive model

Empirical Evaluation (3/3) Combinations with linear interpolation smoothing –The Bayes predictive model and Bayes-smoothing are each combined with linear interpolation smoothing –The combination of the Bayes predictive model with linear interpolation smoothing outperforms all other estimators

Conclusion Presents a first Bayesian analysis of the unigram query Language Model for ad hoc retrieval, and proposes a new scoring function derived from the Bayesian predictive distribution Work remains to be done –Combine these two approaches (the Bayesian predictive model and interpolation smoothing) –Automatically adapt the μ scaling parameter –The Bayesian inference framework could be applied to other Language Models and extended to other tasks such as relevance feedback, query expansion and adaptive filtering