A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval
ChengXiang Zhai, John Lafferty
School of Computer Science, Carnegie Mellon University

Research Questions
General: What role does smoothing play in the language modeling approach?
Specific:
– Is the good performance due to smoothing?
– How sensitive is retrieval performance to smoothing?
– Which smoothing method is best?
– How do we set smoothing parameters?

Outline
– A General Smoothing Scheme and TF-IDF Weighting
– Three Smoothing Methods
– Experiments and Results

Retrieval as Language Model Estimation
Document ranking based on query likelihood (Ponte & Croft 98, Miller et al. 99, Berger & Lafferty 99, Hiemstra 2000, etc.)
Retrieval problem ⇒ estimation of p(w_i|d), the document language model
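To make the ranking rule concrete, here is a minimal Python sketch (function names are hypothetical, not from the paper or any toolkit), assuming a smoothed estimator for p(w|d) is supplied:

```python
import math

def query_likelihood_score(query_terms, doc, p_w_given_d):
    """Rank score: log p(q|d) = sum of log p(q_i|d) over query terms.

    p_w_given_d(w, doc) must return a smoothed, strictly positive
    probability for every query term; otherwise log() fails -- which
    is exactly the zero-probability problem smoothing addresses.
    """
    return sum(math.log(p_w_given_d(w, doc)) for w in query_terms)
```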

Why Smoothing?
Zero probability
– If w does not occur in d, then p(w|d) = 0, and any query containing w gets a zero probability.
Estimation inaccuracy
– A document is a very small sample of words, so the maximum likelihood estimate is inaccurate.

Language Model Smoothing (Illustration)
[Figure: P(w) plotted over words w, comparing the maximum likelihood estimate with a smoothed LM (linear interpolation).]

A General Smoothing Scheme
All smoothing methods try to
– discount the probability of words seen in a document
– re-allocate the extra probability mass so that unseen words get a non-zero probability
Most use a reference model (the collection language model) to discriminate unseen words:
    p(w|d) = p_s(w|d)      if w is seen in d    (discounted ML estimate)
             α_d·p(w|C)    otherwise            (collection language model)
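Since the two branches must sum to one over the vocabulary, α_d is fixed by normalization. A small sketch of that computation, assuming p_s and p_wC are callables (both names are illustrative):

```python
def alpha_d(doc_counts, p_s, p_wC):
    """Normalization constant for the general scheme:
    sum_seen p_s(w|d) + alpha_d * sum_unseen p(w|C) = 1
    => alpha_d = (1 - sum_seen p_s(w|d)) / (1 - sum_seen p(w|C))."""
    seen = doc_counts.keys()
    return (1 - sum(p_s(w) for w in seen)) / (1 - sum(p_wC(w) for w in seen))
```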

Smoothing & TF-IDF Weighting
Plugging the general smoothing scheme into the query likelihood retrieval formula, we obtain:
    log p(q|d) = Σ_{w∈q, c(w;d)>0} log[ p_s(w|d) / (α_d·p(w|C)) ] + n·log α_d + Σ_{w∈q} log p(w|C)
– First sum: TF weighting via p_s(w|d), IDF weighting via the 1/p(w|C) factor
– n·log α_d: doc length normalization (a long doc is expected to have a smaller α_d)
– Last sum: independent of the document, so it can be ignored for ranking
Smoothing with p(w|C) ⇒ TF-IDF + length normalization
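Spelling out the algebra behind this decomposition (n = |q|; the last step adds and subtracts the log α_d·p(w|C) terms for the seen words):

```latex
\begin{align*}
\log p(q|d) &= \sum_{w \in q} \log p(w|d)\\
  &= \sum_{w \in q,\; c(w;d)>0} \log p_s(w|d)
   \;+\; \sum_{w \in q,\; c(w;d)=0} \log\big(\alpha_d\, p(w|C)\big)\\
  &= \sum_{w \in q,\; c(w;d)>0} \log \frac{p_s(w|d)}{\alpha_d\, p(w|C)}
   \;+\; n \log \alpha_d \;+\; \sum_{w \in q} \log p(w|C)
\end{align*}
```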

Three Smoothing Methods
Simplified Jelinek-Mercer: shrink uniformly toward p(w|C)
    p_λ(w|d) = (1 − λ)·p_ml(w|d) + λ·p(w|C)
Dirichlet prior (Bayesian): assume μ pseudo counts distributed as p(w|C)
    p_μ(w|d) = (c(w;d) + μ·p(w|C)) / (|d| + μ)
Absolute discounting: subtract a constant δ from each seen count
    p_δ(w|d) = max(c(w;d) − δ, 0)/|d| + (δ·|d|_u/|d|)·p(w|C), where |d|_u is the number of unique terms in d
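A minimal Python sketch of the three estimators (parameter defaults are illustrative, not the paper's tuned values; p_wC is an assumed callable returning the collection probability of a term):

```python
from collections import Counter

def jelinek_mercer(w, doc_counts, doc_len, p_wC, lam=0.7):
    """(1 - lam) * p_ml(w|d) + lam * p(w|C): uniform shrinkage toward the collection."""
    p_ml = doc_counts.get(w, 0) / doc_len
    return (1 - lam) * p_ml + lam * p_wC(w)

def dirichlet_prior(w, doc_counts, doc_len, p_wC, mu=2000):
    """Add mu pseudo counts distributed according to p(w|C)."""
    return (doc_counts.get(w, 0) + mu * p_wC(w)) / (doc_len + mu)

def absolute_discounting(w, doc_counts, doc_len, p_wC, delta=0.7):
    """Subtract delta from every seen count; give the freed mass to p(w|C)."""
    sigma = delta * len(doc_counts) / doc_len   # len(doc_counts) = |d|_u, unique terms
    return max(doc_counts.get(w, 0) - delta, 0) / doc_len + sigma * p_wC(w)

# Toy usage with a uniform stand-in collection model:
doc = "the quick brown fox jumps over the lazy dog".split()
counts, dlen = Counter(doc), len(doc)
uniform_C = lambda w: 1.0 / 10000
print(dirichlet_prior("fox", counts, dlen, uniform_C))   # seen word
print(dirichlet_prior("cat", counts, dlen, uniform_C))   # unseen word, still > 0
```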

Experiments
Collections: FBIS, LA, FT, TREC8 (Disk 4 & 5 minus CR, ~2GB), Small WEB (~2GB)
Queries: TREC topics 351–400 and 401–450, each in Title and Long versions
18 collection/query combinations in total

Results
Performance is sensitive to smoothing.
The type of query makes a difference:
– More smoothing is needed for long queries than for title queries
– Precision is more sensitive to smoothing for long queries
– Dirichlet prior is the best for title queries
– Jelinek-Mercer is the most sensitive to the length/type of queries

Figure Explanation
[Figure: avg. precision plotted against the smoothing parameter (e.g., λ, μ, or δ); annotations mark the optimal parameter settings and the optimal range, with larger values meaning more smoothing.]

Title queries vs. Long queries (Jelinek-Mercer on FBIS, FT, and LA)
[Figure: sensitivity curves with the optimal λ marked for title queries and for long queries.]

Per-query Optimal Range of λ (JM on TREC8)
– Title queries: wide optimal range, flat curve, less sensitive
– Long queries: narrow optimal range, peaked curve, more sensitive

More on Precision Sensitivity
[Figure: Dirichlet prior and absolute discounting on a small DB vs. a large DB; annotations mark the optimal setting, more smoothing, and the flatter curve.]

Comparison of Three Methods

A Possible Explanation of Observations
The dual role of smoothing:
– Estimation role: accurate estimation of p(w|d)
– Query modeling role: generation of the common/non-informative words in a query
Title queries have no (or few) non-informative words, so:
– Performance is affected primarily by the estimation role of smoothing
– They need less smoothing

A Possible Explanation (cont.)
Long queries have more non-informative words, so:
– Performance is affected by both roles of smoothing
– They need more smoothing (the extra smoothing serves query modeling)
Dirichlet is best for title queries because it is good at playing the estimation role.
JM does not perform as well on title queries but does much better on long queries, because it is good at the query modeling role but not as good at the estimation role.

The Lemur Toolkit
A language modeling and information retrieval toolkit, under development at CMU and UMass.
All experiments reported here were run using Lemur.
Contact us if you are interested in using it.

Conclusions and Future Work
Smoothing ⇒ TF-IDF + doc length normalization
Retrieval performance is sensitive to smoothing, and the sensitivity depends on query type:
– More sensitive for long queries than for title queries
– More smoothing is needed for long queries
All three methods can perform well when optimized:
– Dirichlet prior is especially good for title queries
– Both Dirichlet prior and JM are good for long queries
– Absolute discounting has a relatively stable optimal setting

Conclusions and Future Work (cont.)
Smoothing plays two different roles:
– Better estimation of p(w|d)
– Generation of common/non-informative words in the query
Future work:
– More evaluation (types of queries, smoothing methods)
– De-couple the dual role of smoothing (e.g., a two-stage smoothing strategy)
– Train query-specific smoothing parameters with past relevance judgments and other data (e.g., position selection, translation model)

The End
Thank you!

Dirichlet Prior is good for title queries; JM gains most from long queries
[Figure: per-method comparison, with annotations marking where Dirichlet is the best (title queries) and where JM gains most (long queries).]

Avg. Precision and λ
[Figure: average precision plotted against λ.]

JM is most sensitive to query length

Backoff Version of All Three Methods
– Adopt the same discounting of p_ml(w|d)
– Let the collection model p(w|C) affect ONLY the unseen words, NOT the seen words
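A sketch of what a backoff variant can look like for absolute discounting, consistent with the bullets above. This is a Katz-style construction written under stated assumptions, not the paper's exact formulation: the freed mass is renormalized over unseen words, and `vocab` is the full collection vocabulary.

```python
def backoff_absolute_discounting(w, doc_counts, doc_len, p_wC, vocab, delta=0.7):
    """Backoff: p(w|C) influences ONLY unseen words.

    Seen words keep the discounted ML estimate with no collection
    term mixed in; the freed mass delta * |d|_u / |d| is spread over
    unseen words in proportion to p(w|C), renormalized so the whole
    distribution still sums to one.
    """
    if doc_counts.get(w, 0) > 0:
        return (doc_counts[w] - delta) / doc_len
    freed_mass = delta * len(doc_counts) / doc_len
    unseen_norm = sum(p_wC(v) for v in vocab if v not in doc_counts)
    return freed_mass * p_wC(w) / unseen_norm
```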

Recall Sensitivity Pattern is Similar
[Figure: recall sensitivity of Jelinek-Mercer, Dirichlet prior, and absolute discounting on a small DB vs. a large DB.]

Interpolation vs. Backoff (JM on FBIS)
[Figure: sensitivity curves for the interpolation and backoff variants.]

Dirichlet is the best for title queries

Interpolation vs. Backoff (Dirichlet Prior & Absolute Discounting on FBIS)
[Figure: sensitivity curves for the interpolation and backoff variants of Dirichlet prior and absolute discounting.]

Precision Increase & Pattern Shift on Long Queries (λ = 0.3)
[Figure: sensitivity curves for title queries, long queries with single smoothing, and long queries with 2-stage smoothing; 2-stage smoothing increases precision and shifts the long-query pattern toward the title-query pattern.]

A 2-stage Smoothing Strategy
De-couple the two roles:
– Let Dirichlet prior play the estimation role, smoothing p_ml(w|d) with the doc background p(w|C)
– Let Jelinek-Mercer play the query modeling role, mixing in the query background p(w|Q)
    p(w|d) = (1 − λ)·(c(w;d) + μ·p(w|C)) / (|d| + μ) + λ·p(w|Q)
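A sketch of the combined estimator, reusing the Dirichlet step from above (p_wQ, the query background model, is an assumed callable; the defaults are illustrative):

```python
def two_stage(w, doc_counts, doc_len, p_wC, p_wQ, mu=2000, lam=0.1):
    """Stage 1: Dirichlet prior smooths p(w|d) against the doc background p(w|C).
    Stage 2: Jelinek-Mercer mixes in the query background p(w|Q)."""
    stage1 = (doc_counts.get(w, 0) + mu * p_wC(w)) / (doc_len + mu)
    return (1 - lam) * stage1 + lam * p_wQ(w)
```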

Effect of 2-stage Smoothing
– Improves performance on long queries
– Reveals a consistent sensitivity pattern of precision to the Dirichlet prior μ
– Achieves better precision with more meaningful parameter settings

Better Precision & More Meaningful Parameter Setting

Title queries vs. Long queries (Jelinek-Mercer on TREC7, TREC8, and WEB)
[Figure: sensitivity curves; annotations mark the larger optimal λ, more smoothing, and the flatter, less sensitive curve.]