Term-Specific Smoothing
On a paper by D. Hiemstra
Alexandru A. Chitea, Universität des Saarlandes
March 10, 2006
Seminar CS 555 – Language Model based Information Retrieval

Introduction
Experimental approach to Information Retrieval
– A formal model specifies an exact formula, which is tried empirically
– Formulae are tried empirically because they seem plausible
Modeling approach to Information Retrieval
– A formal model specifies an exact formula that is used to prove some simple mathematical properties of the model

Information Retrieval – Overview
A query to the system returns a ranked result list
– Statistical ranking on term frequencies is still standard practice
Search engines provide means to override the default ranking mechanism
– Users can specify mandatory query terms (e.g. +term or "term" in Google)

Information Retrieval – Practice (1)
Query: Star Wars Episode I
(I is not treated as a mandatory term)

Information Retrieval – Practice (2)
Query: Star Wars Episode +I
(I is treated as a mandatory term)

Motivation
– Performance limitations in statistical ranking
– Statistics-based IR models do not capture term importance specification
– The user or the system should be able to override the default ranking mechanism
Objective
– A mathematical model that supports the concept of query term importance

Language Models
A statistical model for generating text
– A probability distribution over strings in a given language M
Consider the unigram language model (LM):
P(t_1, …, t_n | M) = ∏_{i=1}^{n} P(t_i | M)
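A minimal sketch of such a unigram model in Python (not from the original slides; whitespace tokenization and the name unigram_lm are illustrative assumptions):

```python
from collections import Counter

def unigram_lm(text):
    """Maximum-likelihood unigram model: P(t | M) = count(t) / total tokens."""
    tokens = text.lower().split()
    total = len(tokens)
    return {t: c / total for t, c in Counter(tokens).items()}
```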

Example – Language Models
IR sample >> build model: text 0.2, search 0.1, mining 0.1, food …
Health sample >> build model: food 0.25, nutrition 0.1, healthy 0.05, diet 0.02, …

Language Models in IR
Estimate a LM for each document D
Estimate the probability of generating a query Q with terms (t_1, …, t_n) using a given model:
P(Q | D) = ∏_{i=1}^{n} P(t_i | D)
Rank documents by their probability of generating Q
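A sketch of query-likelihood ranking built on the unigram_lm helper above (log probabilities avoid underflow; both function names are assumptions):

```python
import math

def query_log_likelihood(query, doc_lm):
    """Log probability of generating the query from a document's unigram LM."""
    score = 0.0
    for t in query.lower().split():
        p = doc_lm.get(t, 0.0)
        if p == 0.0:
            return float("-inf")  # unseen term: the query cannot be generated
        score += math.log(p)
    return score

def rank(query, doc_lms):
    """Order document ids by descending query likelihood."""
    return sorted(doc_lms, key=lambda d: query_log_likelihood(query, doc_lms[d]),
                  reverse=True)
```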

Insufficient Data
If a term is not in the document, the query cannot be generated:
P(t_i | D) = 0  ⇒  P(t_1, …, t_n | D) = 0
Smooth the probabilities
– Probabilities of observed events are decreased by a certain amount, which is credited to unobserved events

Smoothing
Roles
– Estimation >> re-evaluation of probabilities
– Query modeling >> explain the common and non-informative terms in a query
Linear interpolation smoothing
– Defines a smoothing parameter λ necessary for query modeling
– Can be defined as a two-state Hidden Markov Model

Smoothing Models
Mixture-model smoothing
– Defines one hidden event for all query terms
Term-specific smoothing
– Defines a hidden event for each query term

Smoothing – Mixture Model
Mixes the probability from the document with the general collection probability of the term; the smoothed term probability becomes:
λ P(t_i | D) + (1 − λ) P(t_i | C)
λ can be tuned to adjust performance:
– High value >> conjunctive-like search, i.e. suitable for short queries
– Low value >> suitable for long queries
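A sketch of this mixture (Jelinek-Mercer style interpolation; the helper name and the default λ = 0.5 are assumptions):

```python
def mixture_prob(term, doc_lm, coll_lm, lam=0.5):
    """Smoothed term probability: lambda * P(t|D) + (1 - lambda) * P(t|C)."""
    return lam * doc_lm.get(term, 0.0) + (1 - lam) * coll_lm.get(term, 0.0)
```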

Bayesian Networks (1)
A Bayesian Network (BN) is a directed, acyclic graph G(V, E) where:
– Nodes >> random variables (RVs)
– Edges >> dependencies
Properties: each node is conditionally independent of its non-descendants, given its parents

Bayesian Networks (2)
From the properties it holds that the joint distribution factorizes over the graph.
By the chain rule:
P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | x_1, …, x_{i−1})
By conditional independence:
P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | parents(x_i))

LM as a Bayesian Network
Nodes >> random variables
Edges >> model conditional dependencies
Clear nodes >> hidden random variables
Shaded nodes >> observed random variables
Figure 1: The language modeling approach as a Bayesian network (hidden document node D with observed query-term children t_1, …, t_n)
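For the network in Figure 1, the chain rule and the conditional-independence property from the previous slide give the factorization (a worked instance added here, not on the original slide):

$$P(D, t_1, \ldots, t_n) = P(D) \prod_{i=1}^{n} P(t_i \mid D)$$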

Example – Mixture Model (1)
Collection (2 documents)
– d_1: IBM reports a profit but revenue is down
– d_2: Siemens narrows quarter loss but revenue decreases further
Model: MLE unigram from documents
Query: revenue down
Ranking: d_1 > d_2
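A quick check of this ranking with the illustrative helpers defined earlier (λ = 0.5 is an arbitrary choice; the original slide does not give one):

```python
docs = {
    "d1": "IBM reports a profit but revenue is down",
    "d2": "Siemens narrows quarter loss but revenue decreases further",
}
doc_lms = {d: unigram_lm(text) for d, text in docs.items()}
coll_lm = unigram_lm(" ".join(docs.values()))  # collection model over both documents

def mixture_score(query, d, lam=0.5):
    """Product of smoothed term probabilities under document d's model."""
    s = 1.0
    for t in query.lower().split():
        s *= mixture_prob(t, doc_lms[d], coll_lm, lam)
    return s

print(mixture_score("revenue down", "d1"))  # ~0.0117
print(mixture_score("revenue down", "d2"))  # ~0.0039: d2 lacks "down", so d1 > d2
```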

Example – Mixture Model (2)
Figure 2: Bayesian network for the C = (d_1, d_2) language model (hidden nodes D and C with observed query-term children t_1: revenue and t_3: down)

Term-Specific Smoothing
[Figure: two Bayesian networks over document D and query terms t_1, t_2, t_3, contrasting the plain language modeling approach with term-specific smoothing]

Term-Specific Smoothing – Derivation
Step 1: Assume query term independence:
P(t_1, …, t_n | D) = ∏_{i=1}^{n} P(t_i | D)
Step 2: For each t_i introduce a binary RV I_i (the importance of a query term):
P(t_i | D) = Σ_{i_i ∈ {0,1}} P(t_i, I_i = i_i | D)

Term-Specific Smoothing – Derivation
Step 3: Assume query term importance does not depend on D:
P(t_i, I_i | D) = P(I_i) P(t_i | I_i, D)
Step 4: Writing out the full sum over the importance values yields:
P(t_i | D) = P(I_i = 0) P(t_i | I_i = 0, D) + P(I_i = 1) P(t_i | I_i = 1, D)

Term-Specific Smoothing – Derivation
Step 4 (contd.):
– Let λ_i = P(I_i = 1)
– Assume an unimportant term is drawn from the collection model and an important term from the document model: P(t_i | I_i = 0, D) = P(t_i | C) and P(t_i | I_i = 1, D) = P(t_i | D)
This yields:
P(t_1, …, t_n | D) = ∏_{i=1}^{n} ((1 − λ_i) P(t_i | C) + λ_i P(t_i | D))
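A sketch of scoring with per-term importance weights, following the final formula above (the helper name is an assumption; it builds on the earlier illustrative LMs):

```python
def ts_score(query_terms, lambdas, doc_lm, coll_lm):
    """Term-specific smoothing: each term t_i carries its own importance lambda_i."""
    s = 1.0
    for t, lam in zip(query_terms, lambdas):
        s *= (1 - lam) * coll_lm.get(t, 0.0) + lam * doc_lm.get(t, 0.0)
    return s
```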

Term-Specific Smoothing – Properties
Case 1: Stop Words (–)
– λ_i = 0 >> query term is not important
– The term's factor reduces to P(t_i | C), constant across documents >> ignore query term t_i
Case 2: Mandatory Terms (+)
– λ_i = 1 >> relevant documents contain the query term
– No smoothing by the collection model is performed; P(t_i | D) = 0 excludes documents missing t_i
Case 3: Coordination level ranking
– All λ_i close to 1 >> a document matching more query terms always ranks higher
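The stop-word and mandatory-term cases can be seen directly with the ts_score sketch above (again using the illustrative d1/d2 collection):

```python
q = ["revenue", "down"]
# "down" mandatory (lambda = 1): d2 scores 0.0 because it does not contain "down"
print(ts_score(q, [0.5, 1.0], doc_lms["d2"], coll_lm))
# "down" stopped (lambda = 0): its factor is P(down|C), identical for every document
print(ts_score(q, [0.5, 0.0], doc_lms["d1"], coll_lm))
print(ts_score(q, [0.5, 0.0], doc_lms["d2"], coll_lm))
```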

Stop Words
Query terms that are ignored during the search
Reasons:
– Frequent words (e.g. the, it, a, …) might not contribute significantly to the final document score, but they do require processing power
– Words are stopped if they carry little meaning (e.g. hereupon, whereafter)

Mandatory Terms
A query term that should occur in every retrieved document
– The collection model can be dropped from the calculation of the document score
– Documents that do not match the query term are assigned zero probability
– Users specify mandatory terms (e.g. by prefixing them with +)

Coordination Level Ranking
A document containing n query terms will always rank higher than one containing n − 1 query terms
Most tf.idf ranking methods do not behave like coordination level ranking

Term-Specific Smoothing – Review
The term importance probability makes it possible to:
– Ignore query terms where statistics alone cannot account for them (stop words)
– Restrict the retrieved list to documents that match specific terms, regardless of their frequency distributions
– Enforce a coordination level ranking of the documents, regardless of the terms' frequency distributions

Relevance Feedback
Predict optimal values for λ
– Train on relevant documents and, for each term, predict the probability of term importance that maximizes retrieval performance
Use the Expectation Maximization (EM) algorithm
– Maximizes the probability of the observed data given some training data

EM Algorithm
The algorithm iteratively maximizes the probability of the query t_1, …, t_n given r relevant documents D_1, …, D_r
E-step:
m_i = Σ_{j=1}^{r} (λ_i P(t_i | D_j)) / ((1 − λ_i) P(t_i | C) + λ_i P(t_i | D_j))
M-step:
λ_i(new) = m_i / r
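A runnable sketch of this EM loop (the function and argument names are assumptions; rel_doc_lms is a list of unigram models for the r relevant documents):

```python
def em_lambdas(query_terms, rel_doc_lms, coll_lm, iters=20, init=0.5):
    """Iterate the E-/M-steps above to re-estimate each term's importance."""
    lams = [init] * len(query_terms)
    r = len(rel_doc_lms)
    for _ in range(iters):
        new_lams = []
        for t, lam in zip(query_terms, lams):
            m = 0.0  # E-step: expected importance of t over the relevant documents
            for dlm in rel_doc_lms:
                num = lam * dlm.get(t, 0.0)
                den = (1 - lam) * coll_lm.get(t, 0.0) + num
                if den > 0:
                    m += num / den
            new_lams.append(m / r)  # M-step: average the expected importance
        lams = new_lams
    return lams
```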

Generalization of Term Importance
Allow the RV I_i to have more than 2 realizations:
– e.g. combine the unigram document model with the bigram document model
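A sketch of one such three-outcome mixture, where I_i selects among the bigram document model, the unigram document model, and the collection model (the weights and helper names are assumptions, not the paper's exact formulation):

```python
def tri_mixture(t, prev, doc_bigrams, doc_lm, coll_lm, l_bi=0.3, l_uni=0.4):
    """Three realizations of I_i; the weights must satisfy l_bi + l_uni <= 1,
    with the remaining mass going to the collection model."""
    p_bi = doc_bigrams.get((prev, t), 0.0)  # bigram document model P(t | prev, D)
    return (l_bi * p_bi
            + l_uni * doc_lm.get(t, 0.0)
            + (1 - l_bi - l_uni) * coll_lm.get(t, 0.0))
```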

Example – General Model
Queries: last will of Alfred Nobel vs. +last will of Alfred Nobel
Figure 3: Graphical model of dependence relations between query terms (nodes t_1, t_2, t_3 and D)

Future Research
Define a unigram LM for a topic-specific space
Extend beyond term matching
– Use syntax (bag of words vs. structured text) and semantics (exact terms vs. equivalent terms)

Conclusions
Extension to the LM approach to IR: model the importance of a query term
– Stop words/phrases: trade-off between search quality and search speed
– Mandatory terms: the user overrides the default ranking algorithm
Statistical ranking algorithms motivated by the LM approach perform well in an empirical setting

Discussion
– Is this a valid approach? How does it differ from term weighting?
– Why do we want coordination level ranking?
– Is the bigram generalization valid and/or useful?

References
D. Hiemstra. Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term. In Proceedings of SIGIR '02, August 11–15, 2002.
G. Weikum. Information Retrieval and Data Mining. Course slides, Universität des Saarlandes. (Retrieved: February 15, 2006)