IRDM WS 2005
Chapter 4: Advanced IR Models
4.1 Probabilistic IR
4.2 Statistical Language Models (LMs)
    4.2.1 Principles and Basic LMs
    4.2.2 Smoothing Methods

Chapter 4: Advanced IR Models (Outline)
4.1 Probabilistic IR
4.2 Statistical Language Models (LMs)
    4.2.1 Principles and Basic LMs
    4.2.2 Smoothing Methods
    4.2.3 Extended LMs
4.3 Latent-Concept Models

What is a Statistical Language Model?

A generative model for word sequences: it defines a probability distribution over word sequences (or bags of words, sets of words, structured documents, ...). For example, a good LM assigns a much higher probability to P["Today is Tuesday"] than to the scrambled P["Today Wednesday is"]; the domain-specific P["The Eigenvalue is positive"] typically falls in between.

The LM itself is highly context- and application-dependent. Examples:
- speech recognition: given that we heard "Julia" and "feels", how likely will we next hear "happy" or "habit"?
- text classification: given that we saw "soccer" 3 times and "game" 2 times, how likely is it that the news is about sports?
- information retrieval: given that the user is interested in math, how likely would the user use "distribution" in a query?

Source-Channel Framework [Shannon 1948]

    Source → Transmitter (Encoder) → Noisy Channel → Receiver (Decoder) → Destination

The source emits X with prior P[X]; the noisy channel turns X into Y with P[Y|X]; the receiver must recover X' from Y, i.e., compute P[X|Y] = ? When X is text, P[X] is the language model.

Applications:

    Application             X                  Y
    speech recognition      word sequence      speech signal
    machine translation     English sentence   German sentence
    OCR error correction    correct word       erroneous word
    summarization           summary            document
    information retrieval   document           query
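
A short derivation, not spelled out on the slide, of what the decoder actually computes: by Bayes' rule,

    X' = argmax_X P[X | Y] = argmax_X P[Y | X] · P[X] / P[Y] = argmax_X P[Y | X] · P[X]

so the channel model P[Y | X] and the language model P[X] together determine the decoded message. In the IR instantiation (X = document, Y = query), P[X] acts as a document prior and P[Y | X] as the query likelihood.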

Text Generation with a (Unigram) LM

A unigram LM θ defines P[word | θ]; a document d is generated by repeatedly sampling words from θ, with a different θ_d for each d.

    LM for topic 1 (IR&DM):      LM for topic 2 (Health):
      text     0.2                 food       0.25
      mining   0.1                 nutrition  0.1
      n-gram   0.01                healthy    0.05
      cluster  ...                 diet       ...
      food     ...                 ...

Sampling from topic 1 yields documents like "text mining paper"; sampling from topic 2 yields documents like "food nutrition paper".
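
A minimal sketch of this generation view in Python (the toy probabilities and the function name sample_doc are my own, not from the slide):

    import random

    def sample_doc(theta, length, seed=42):
        """Draw `length` words i.i.d. from the unigram distribution theta."""
        rng = random.Random(seed)
        words = list(theta)
        weights = [theta[w] for w in words]
        return rng.choices(words, weights=weights, k=length)

    theta_irdm = {"text": 0.2, "mining": 0.1, "paper": 0.05, "n-gram": 0.01}
    print(sample_doc(theta_irdm, 3))   # e.g. ['text', 'mining', 'paper']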

Basic LM for IR

Now invert the generation view: the parameters of each document LM are unknown (P[text | θ_d] = ?, P[mining | θ_d] = ?, ..., P[food | θ_d] = ?, P[nutrition | θ_d] = ?) and are obtained by parameter estimation from the documents themselves ("text mining paper", "food nutrition paper"). Given a query q = "data mining algorithms", the question becomes: which document LM is more likely to generate q, i.e., which one better explains q?

IR as LM Estimation

Probabilistic IR asks for P[R | d, q], the probability that the user likes a document (R) given that it has features d and the user poses query q. Statistical LMs replace this by the query likelihood

    P[q | θ_d] = ∏_{j ∈ q} P[j | θ_d]

and the top-k query result consists of the k documents whose LMs give q the highest likelihood. The MLE of P[j | θ_d] would be the relative term frequency tf(j,d) / |d|.
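
A minimal sketch of query-likelihood ranking with pure MLE estimates (the toy documents are my own); note how a single unseen query term forces the score to -infinity, which is exactly what the smoothing methods below repair:

    import math
    from collections import Counter

    def query_log_likelihood(query, doc):
        tf, n = Counter(doc), len(doc)
        score = 0.0
        for j in query:
            if tf[j] == 0:
                return float("-inf")   # MLE gives P[j | theta_d] = 0
            score += math.log(tf[j] / n)
        return score

    d1 = "text mining paper text text".split()
    d2 = "food nutrition paper".split()
    q = "text mining".split()
    print(query_log_likelihood(q, d1))   # log(3/5) + log(1/5)
    print(query_log_likelihood(q, d2))   # -inf: 'text' and 'mining' unseen in d2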

Multi-Bernoulli vs. Multinomial LM

Multi-Bernoulli:

    P[q | θ_d] = ∏_{j=1..m} P[j | θ_d]^{X_j(q)} · (1 − P[j | θ_d])^{1 − X_j(q)},   with X_j(q) = 1 if j ∈ q, 0 otherwise

Multinomial:

    P[q | θ_d] ∝ ∏_{j ∈ q} P[j | θ_d]^{f_j(q)},   with f_j(q) = frequency of term j in q

The multinomial LM is more expressive (it accounts for repeated query terms) and is usually preferred.
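
A toy comparison (my own example) for the query q = "data mining data": the multi-Bernoulli model only records that "data" occurs, while the multinomial model rewards the repetition:

    Multi-Bernoulli:  P[q | θ_d] has factor P[data | θ_d]^1    (X_data(q) = 1)
    Multinomial:      P[q | θ_d] has factor P[data | θ_d]^2    (f_data(q) = 2)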

LM Scoring by Kullback-Leibler Divergence

Scoring by query log-likelihood is rank-equivalent to scoring by the negative cross-entropy between the query LM θ_q and the document LM θ_d,

    −H(θ_q, θ_d) = Σ_j P[j | θ_q] log P[j | θ_d],

which in turn differs only by a query-dependent constant (the entropy of θ_q) from the negative KL divergence of θ_q and θ_d:

    −KL(θ_q || θ_d) = −Σ_j P[j | θ_q] log ( P[j | θ_q] / P[j | θ_d] )
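
A minimal sketch of KL-based scoring (toy LMs are my own); it assumes θ_d was smoothed, so that θ_d[j] > 0 wherever θ_q[j] > 0:

    import math

    def neg_kl(theta_q, theta_d):
        """-KL(theta_q || theta_d); larger is better."""
        return -sum(pq * math.log(pq / theta_d[j])
                    for j, pq in theta_q.items() if pq > 0)

    theta_q = {"data": 0.5, "mining": 0.5}
    theta_d = {"data": 0.3, "mining": 0.2, "text": 0.5}
    print(neg_kl(theta_q, theta_d))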

Smoothing Methods

Possible methods, most with their own parameters:
- Laplace smoothing
- absolute discounting
- Jelinek-Mercer smoothing
- Dirichlet-prior smoothing
- ...

Smoothing is absolutely crucial to avoid overfitting and to make LMs useful at all (one LM per document, one LM per query!). The choice of method and its parameter setting are still pretty much black art (or determined empirically).

Laplace Smoothing and Absolute Discounting

Estimation of θ_d: the MLE would yield p_j(d) = tf(j,d) / |d|, which assigns probability 0 to every term not seen in d.

Additive Laplace smoothing (with vocabulary size m):

    p_j(d) = ( tf(j,d) + 1 ) / ( |d| + m )

Absolute discounting (with corpus C, discount δ ∈ [0,1], and d_u = set of distinct terms in d):

    p_j(d) = max( tf(j,d) − δ, 0 ) / |d|  +  δ · |d_u| / |d| · P[j | C]
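
A minimal sketch of both estimators (the function names and toy data are my own):

    from collections import Counter

    def p_laplace(j, tf, doc_len, m):
        """Additive Laplace: (tf(j,d) + 1) / (|d| + m), m = vocabulary size."""
        return (tf[j] + 1) / (doc_len + m)

    def p_abs_discount(j, tf, doc_len, p_corpus, delta=0.5):
        """Absolute discounting: subtract delta from each seen term's count and
        give the freed mass delta * |d_u| / |d| to the corpus LM P[j | C]."""
        return max(tf[j] - delta, 0) / doc_len + delta * len(tf) / doc_len * p_corpus[j]

    d = Counter("text mining paper text text".split())   # |d| = 5, |d_u| = 3
    print(p_laplace("mining", d, 5, m=10_000))
    print(p_abs_discount("mining", d, 5, p_corpus={"mining": 0.001}))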

Jelinek-Mercer Smoothing

Idea: use a linear combination of the document LM with a background LM (corpus, common language):

    p_j(d) = λ · tf(j,d) / |d| + (1 − λ) · P[j | C],   λ ∈ (0,1)

One could also use a query log as the background LM for the query.
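
A minimal sketch (the default λ = 0.5 is my own assumption, not from the slide):

    def p_jm(j, tf, doc_len, p_corpus, lam=0.5):
        """Jelinek-Mercer: lam * MLE document estimate + (1 - lam) * corpus estimate."""
        return lam * tf.get(j, 0) / doc_len + (1 - lam) * p_corpus.get(j, 0.0)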

Dirichlet-Prior Smoothing

    p_j(d) = ( tf(j,d) + s · P[j | C] ) / ( |d| + s )

with MLEs P[j | d] and P[j | C] from the document and the corpus, where α_1 = s·P[1|C], ..., α_m = s·P[m|C] are the parameters of the underlying Dirichlet distribution and the constant s > 1 is typically set to a multiple of the document length.

The estimator is the MAP estimate obtained with a Dirichlet distribution as prior for the parameters θ of the multinomial distribution:

    Dirichlet(θ | α_1, ..., α_m) = (1 / B(α)) · ∏_j θ_j^{α_j − 1}

with the multinomial Beta function B(α) = ∏_j Γ(α_j) / Γ(Σ_j α_j) as normalizer. The Dirichlet is the conjugate prior for the parameters of the multinomial distribution: a Dirichlet prior implies a Dirichlet posterior, only with different parameters (the observed term frequencies tf_j are added to the prior parameters α_j derived from the corpus).
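
A minimal sketch (s = 2000 is a common choice in the literature, not a value from the slide):

    def p_dirichlet(j, tf, doc_len, p_corpus, s=2000):
        """(tf(j,d) + s * P[j|C]) / (|d| + s); equals Jelinek-Mercer with the
        document-dependent coefficient lambda = |d| / (|d| + s)."""
        return (tf.get(j, 0) + s * p_corpus.get(j, 0.0)) / (doc_len + s)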

Extended LMs

A large variety of extensions exists:
- term-specific smoothing (JM with term-specific λ_j, e.g., based on idf values)
- parsimonious LMs (JM-style smoothing with a smaller feature space)
- n-gram (sequence) models (e.g., HMMs)
- (semantic) translation models
- cross-lingual models
- query-log- and click-stream-based LMs

(Semantic) Translation Model

    P[q | θ_d] = ∏_{j ∈ q} Σ_w P[j | w] · P[w | θ_d]

with a word-word translation model P[j | w].

Opportunities and difficulties: synonymy, hypernymy/hyponymy, polysemy; efficiency; training. P[j | w] can be estimated from overlap statistics on a background corpus (Dice coefficients, Jaccard coefficients, etc.).
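
A minimal sketch (the toy translation table is my own assumption); trans[j][w] holds P[j | w]:

    import math

    def translation_log_likelihood(query, trans, theta_d):
        """log P[q | theta_d] under the word-word translation model above."""
        score = 0.0
        for j in query:
            p_j = sum(trans.get(j, {}).get(w, 0.0) * p_w for w, p_w in theta_d.items())
            if p_j == 0.0:
                return float("-inf")
            score += math.log(p_j)
        return score

    theta_d = {"automobile": 0.6, "engine": 0.4}
    trans = {"car": {"automobile": 0.7, "engine": 0.1}}
    print(translation_log_likelihood(["car"], trans, theta_d))   # log(0.46)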

Query-Log-Based LM (User LM)

Idea: for the current query q_k, leverage the prior query history H_q = q_1 ... q_{k−1} and the prior click stream H_c = d_1 ... d_{k−1} as background LMs. Example: q_k = "Java library" benefits from q_{k−1} = "cgi programming" (the user most likely means the programming language).

A simple mixture model interpolates the LM of the current query with LMs estimated from H_q and H_c using fixed coefficients; more advanced models with Dirichlet priors appear in the literature.
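
A minimal sketch of such a fixed-coefficient interpolation (the coefficient values and the exact three-way mixture form are my own assumptions, not the slide's model):

    def p_user(j, theta_qk, theta_hist_q, theta_hist_c, alpha=0.2, beta=0.3):
        """Mix the current-query LM with query-history and click-history LMs."""
        return ((1 - alpha - beta) * theta_qk.get(j, 0.0)
                + alpha * theta_hist_q.get(j, 0.0)
                + beta * theta_hist_c.get(j, 0.0))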

Additional Literature for Chapter 4

Statistical Language Models:
- Grossman/Frieder, Section 2.3
- W.B. Croft, J. Lafferty (Editors): Language Modeling for Information Retrieval, Kluwer, 2003
- C. Zhai: Statistical Language Models for Information Retrieval, Tutorial Slides, SIGIR 2005
- X. Liu, W.B. Croft: Statistical Language Modeling for Information Retrieval, Annual Review of Information Science and Technology 39, 2004
- J. Ponte, W.B. Croft: A Language Modeling Approach to Information Retrieval, SIGIR 1998
- C. Zhai, J. Lafferty: A Study of Smoothing Methods for Language Models Applied to Information Retrieval, TOIS 22(2), 2004
- C. Zhai, J. Lafferty: A Risk Minimization Framework for Information Retrieval, Information Processing and Management 42, 2006
- X. Shen, B. Tan, C. Zhai: Context-Sensitive Information Retrieval Using Implicit Feedback, SIGIR 2005
- M.E. Maron, J.L. Kuhns: On Relevance, Probabilistic Indexing, and Information Retrieval, Journal of the ACM 7, 1960