Relevance-Based Language Models. Victor Lavrenko and W. Bruce Croft, Department of Computer Science, University of Massachusetts, Amherst, MA 01003. SIGIR 2001.

Presentation transcript:

Relevance-Based Language Models. Victor Lavrenko and W. Bruce Croft, Department of Computer Science, University of Massachusetts, Amherst, MA. SIGIR 2001. Presented by Yi-Ting.

Outline
- Introduction
- Related Work: Classical Probabilistic Models, Language Modeling Approaches
- Relevance Model
- Experimental Results
- Conclusions

Introduction
- The paper marks a departure from traditional models of relevance, following the emergence of language modeling frameworks for retrieval.
- It introduces a technique for constructing a relevance model from the query alone, and compares the resulting query-based relevance models to models estimated from training data.

Related Work: Classical Probabilistic Models
- Probability ranking principle: rank documents by P(D|R) / P(D|N)
- Generative language models: P(w|R) / P(w|N)
- Binary Independence Model
- Multiple-Bernoulli language models
- 2-Poisson model
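As a point of reference (a standard step, not shown on the slide): under a unigram generative model with word independence, the document-level ranking criterion factors into per-word ratios,

P(D|R) / P(D|N) = Π_{w∈D} P(w|R) / P(w|N)

so estimating the word-level probabilities P(w|R) and P(w|N) is sufficient for ranking. The Binary Independence Model uses term presence/absence instead of this multinomial factorization.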

Related Work: Language Modeling Approaches
- Documents themselves are viewed as models, and queries as strings of text randomly sampled from those models.
- Query-likelihood retrieval: proposed by Ponte and Croft; further developed by Miller et al. and by Song and Croft.
- Berger and Lafferty view the query Q as a potential translation of the document D, and use powerful estimation techniques.
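A minimal sketch of the query-likelihood approach described above (assuming unigram document models with Jelinek-Mercer smoothing; the function and parameter names are illustrative, not from the paper):

```python
import math
from collections import Counter

def query_likelihood_score(query_terms, doc_terms, collection_counts, collection_size, lam=0.6):
    """Score a document by log P(Q | M_D) under a smoothed unigram model."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for q in query_terms:
        p_doc = doc_counts[q] / doc_len if doc_len else 0.0
        p_coll = collection_counts.get(q, 0) / collection_size
        p = lam * p_doc + (1.0 - lam) * p_coll
        score += math.log(p) if p > 0 else math.log(1e-12)  # guard against unseen terms
    return score
```

Documents are sorted by this score, highest first; this is also the baseline against which the relevance models are later compared.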

Relevance Model
- The term refers to a mechanism that determines the probability P(w|R) of observing a word w in the documents relevant to a given information need.
- The paper gives a formally justified way of estimating the relevance model when no training data is available.
- Ranking with a relevance model: once P(w|R) is estimated, documents can be ranked with the classical probabilistic criterion, i.e. by the per-word ratios P(w|R) / P(w|N).

Relevance Model
- Unlike the language modeling approach, we do not assume that the query is a sample from any one specific document model.
- Instead, we assume that both the query and the relevant documents are samples from an unknown relevance model R.

Relevance Model
- Let Q = q1…qk. We approximate the probability of w under the relevance model by the conditional probability of observing w given that we just observed q1…qk:
  P(w|R) ≈ P(w | q1…qk) = P(w, q1…qk) / P(q1…qk)

Relevance Model: Method 1 (i.i.d. sampling)
- Assume that the query words q1…qk and the word w in relevant documents are sampled identically and independently from a unigram distribution M_R.
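Reconstructed from the original paper (the equation itself is not in the transcript), the Method 1 estimate sums over the universe of unigram models M — in practice, the models of the top-retrieved documents — with a prior P(M):

P(w, q1…qk) = Σ_{M∈M} P(M) P(w|M) Π_{i=1..k} P(qi|M)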

Relevance Model: Method 2 (conditional sampling)
- Fix a value of w according to some prior P(w). Then pick a distribution M_i according to P(M_i|w), and sample the query word q_i from M_i with probability P(q_i|M_i).
- Assume the query words q1…qk are independent of each other, but keep their dependence on w:
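The corresponding estimate, again reconstructed from the original paper:

P(w, q1…qk) = P(w) Π_{i=1..k} Σ_{Mi∈M} P(qi|Mi) P(Mi|w)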

Relevance Model
- Method 2 performs slightly better than Method 1, both in retrieval effectiveness and in tracking error.

Relevance Model: Estimation Details
- The query prior P(q1…qk) in equation (6) is obtained by marginalizing the joint: P(q1…qk) = Σ_w P(w, q1…qk).
- P(w) is set to Σ_{Mi} P(w|Mi) P(Mi), summing over the universe of document models.
- P(w|M_D) is a smoothed maximum-likelihood estimate from document D, interpolated with collection statistics.
- P(M_i|w) follows from Bayes' rule: P(M_i|w) = P(w|M_i) P(M_i) / P(w).
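A minimal sketch of this estimation procedure, following the Method 1 formulation over a set of top-retrieved documents with Jelinek-Mercer smoothing (function and parameter names are illustrative assumptions, not from the paper):

```python
import math
from collections import Counter

def estimate_relevance_model(query_terms, top_docs, collection_counts, collection_size, lam=0.6):
    """Method-1 style estimate of P(w|R): weight each document model M by how well
    it explains the query (i.i.d. sampling), assuming a uniform prior P(M)."""
    models = []
    for doc in top_docs:                         # each doc is a list of tokens
        counts, length = Counter(doc), len(doc)
        def p_w_given_m(w, c=counts, n=length):
            # smoothed unigram model: mix document and collection statistics
            p_doc = c[w] / n if n else 0.0
            p_coll = collection_counts.get(w, 0) / collection_size
            return lam * p_doc + (1.0 - lam) * p_coll
        models.append(p_w_given_m)

    # P(q1..qk | M) for every document model, under the i.i.d. assumption
    query_lik = [math.exp(sum(math.log(max(p(q), 1e-12)) for q in query_terms))
                 for p in models]

    vocab = {w for doc in top_docs for w in doc}
    joint = {w: sum(lik * p(w) for lik, p in zip(query_lik, models)) / len(models)
             for w in vocab}                     # P(w, q1..qk) with uniform P(M)
    norm = sum(joint.values()) or 1.0
    return {w: v / norm for w, v in joint.items()}   # P(w|R) = P(w,Q) / P(Q), over the feedback vocabulary
```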

Experimental Results: Model Cross-Entropy
- Cross-entropy is an information-theoretic measure of the distance between two distributions, measured in bits.
- Both methods exhibit their lowest cross-entropy around 50 documents; however, the model estimated with Method 2 achieves lower absolute cross-entropy.
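For reference, the cross-entropy between a reference (true topic) distribution and an estimated relevance model can be computed as follows (a sketch; the dictionaries map words to probabilities and are not data from the paper):

```python
import math

def cross_entropy_bits(p_true, p_estimated, eps=1e-12):
    """H(p_true, p_estimated) = -sum_w p_true(w) * log2 p_estimated(w), in bits."""
    return -sum(p * math.log2(p_estimated.get(w, 0.0) + eps)
                for w, p in p_true.items() if p > 0)
```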


Experimental Results: TREC Ad-Hoc Retrieval
- Two query sets of TREC title queries.
- Baseline: the language modeling (query-likelihood) approach, where documents are ranked by their probability of generating the query.
- Evaluation measures: average precision, R-precision, and the Wilcoxon test for significance.
- Relevance models are an attractive choice in high-precision applications.
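A sketch of how the estimated relevance model can be used to rank documents with the per-word likelihood ratio described earlier, approximating P(w|N) by collection statistics (names and smoothing choices are illustrative, not code from the paper):

```python
import math
from collections import Counter

def rank_by_relevance_model(relevance_model, docs, collection_counts, collection_size, eps=1e-12):
    """Score each document by sum over its tokens of log [ P(w|R) / P(w|N) ]."""
    scores = []
    for doc_id, tokens in docs.items():
        score = 0.0
        for w, tf in Counter(tokens).items():
            p_rel = relevance_model.get(w, eps)
            p_nonrel = max(collection_counts.get(w, 0) / collection_size, eps)
            score += tf * math.log(p_rel / p_nonrel)
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```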


Experimental Results: TDT Topic Tracking
- The tracking task in TDT is evaluated using the Detection Error Tradeoff (DET) curve: the tradeoff between miss errors and false-alarm errors.
- The tracking system was run on a modified version of the task.
- The relevance model is compared against TDT tracking systems given 1 to 4 training examples.


Conclusions
- The technique is relatively insensitive to the number of documents used in "expansion".
- The model is simple to implement and does not require any training data.
- The method produces accurate models of relevance, which leads to significantly improved retrieval performance.
- Unsupervised relevance models can be competitive with supervised topic models in TDT.
- There are a number of interesting directions for further investigation of relevance models.