Language Modeling: Putting a curve to the bag of words (courtesy of Chris Jordan)

Models we have covered in class so far
– Boolean
– Extended Boolean
– Vector Space
  – TF*IDF
– Probabilistic Modeling
  – log P(D|R) / P(D|N)

Probability Ranking Principle
"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data." – Robertson

Bag of words? What bag?
– A document is a vector of term occurrences
– Assumption of exchangeability: term order does not matter
– What is this really?
  – A hyperspace where each dimension is represented by a term
  – Values are term occurrences
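
As a quick illustration (a hypothetical toy document, not from the slides): a bag of words keeps only the counts, so any reordering of the tokens yields the same bag; that is the exchangeability assumption in action.

```python
from collections import Counter

# A toy document: order is discarded, only term counts survive
doc = "the cat sat on the mat".split()
bag = Counter(doc)
print(bag)                            # Counter({'the': 2, 'cat': 1, ...})
print(bag == Counter(reversed(doc)))  # True: reordering gives the same bag
```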

Can we model this bag?
– Binomial Distribution
  – Bernoulli / success-fail trials
  – e.g. flipping a coin: chance of getting a head
– Multinomial Distribution
  – Probability of each of several events occurring
  – e.g. flipping a coin: chance of head, chance of tail
  – e.g. die roll: chance of 1, 2, …, 6
  – e.g. a document: chance of each term occurring
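
Under the multinomial view, "generating" a document is just repeated independent draws from a distribution over terms. A minimal sketch with made-up probabilities:

```python
import random

# A toy multinomial over a three-term vocabulary (probabilities sum to 1)
probs = {"the": 0.4, "cat": 0.3, "mat": 0.3}
doc = random.choices(list(probs), weights=list(probs.values()), k=5)
print(doc)  # e.g. ['the', 'mat', 'the', 'cat', 'the']
```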

Review
– What is the Probability Ranking Principle?
– What is the bag of words model?
– What is exchangeability?
– What is a binomial?
– What is a multinomial?

Some Terminology
– Term: t
– Vocabulary: V = {t_1, t_2, …, t_n}
– Document: d_x = t_{x,1} … t_{x,m}, each term drawn from V
– Corpus: C = {d_1, d_2, …, d_k}
– Query: Q = q_1 q_2 … q_i, each term drawn from V

Language Modeling
– A document is represented by a multinomial distribution over terms
– Unigram model
  – A piece of text is generated by drawing each term independently
  – p(t_1 t_2 … t_n) = p(t_1) p(t_2) … p(t_n)
  – p(t_1) + p(t_2) + … + p(t_n) = 1, summing over every term in the vocabulary V

Why Unigram?
– Easy to implement
  – Reasonable performance
– Word order and structure are not captured
  – How much benefit would they add? Open question
– More complex models have more parameters to tune
  – Need more data to train
  – Need more time to compute
  – Need more space to store

Enough… how do I retrieve documents?
– p(Q|d) = p(q_1|d) p(q_2|d) … p(q_n|d)
– How do we estimate p(q|d)?
  – Maximum Likelihood Estimate (MLE)
  – p_MLE(q|d) = freq(q, d) / Σ_i freq(i, d), i.e. the term's count divided by the document length
– Rank documents by p(Q|d), per the Probability Ranking Principle
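
A minimal sketch of unsmoothed query-likelihood retrieval (toy data and my own function names; a real system would work with log probabilities):

```python
from collections import Counter

def mle(term, doc_tokens):
    """Maximum likelihood estimate: freq(term, d) / total terms in d."""
    return Counter(doc_tokens)[term] / len(doc_tokens)

def query_likelihood(query_tokens, doc_tokens):
    """p(Q|d) under the unigram model: the product of per-term MLEs."""
    score = 1.0
    for q in query_tokens:
        score *= mle(q, doc_tokens)
    return score

# Rank documents by p(Q|d), as the Probability Ranking Principle suggests
docs = {"d1": "the cat sat on the mat".split(),
        "d2": "the dog chased the cat".split()}
query = "cat mat".split()
print(sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True))
# ['d1', 'd2']: only d1 contains both query terms
```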

Review
– What is the unigram model?
– Is the language model a binomial or a multinomial?
– Why use the unigram model?
– Given a query, how do we use a language model to retrieve documents?

What is wrong with MLE?
– It assigns 0 probability to terms that do not occur in the document
– A single 0 probability zeroes out the whole similarity score
– Is a 0 probability sensible?
  – Can a word never, ever occur?
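
Continuing the toy sketch above: one unseen query term zeroes the entire product, no matter how well the other terms match.

```python
# 'zebra' never occurs in d1, so p(Q|d1) collapses to 0
print(query_likelihood("cat zebra".split(), docs["d1"]))  # 0.0
```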

How can we fix this?
– How do we get around the zero probabilities?
  – A new similarity function?
  – Remove the zero probabilities?
  – Build a different model?

Smoothing Approaches
– Laplace / additive smoothing
– Mixture models
  – Interpolation: Jelinek-Mercer, Dirichlet
– Absolute discounting
  – Backoff

Laplace
– Just raise all term frequencies by 1
– Where have you seen this before?
– Is this a good idea?
  – Strengths? Weaknesses?
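
A sketch of the add-one idea (my own formulation): pretend every vocabulary term occurred once more than it did, which means dividing by the document length plus the vocabulary size.

```python
from collections import Counter

def laplace(term, doc_tokens, vocab):
    """Add-one (Laplace) smoothing: no vocabulary term gets probability 0."""
    return (Counter(doc_tokens)[term] + 1) / (len(doc_tokens) + len(vocab))
```

Note that with a large vocabulary the added counts swamp the real ones, which is the usual objection to this scheme.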

Interpolation
– Mixture model approach
  – Combine probability models
– Traditionally combines the document model with the corpus model
– Is this a good idea?
  – What else is the corpus model used for?
  – Strengths? Weaknesses?
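
Sketches of the two interpolated smoothers named on the earlier slide, in one common parameterization (the lambda and mu values are illustrative defaults, not prescriptions):

```python
from collections import Counter

def p_ml(term, tokens):
    """Maximum likelihood estimate over a token sequence."""
    return Counter(tokens)[term] / len(tokens)

def jelinek_mercer(term, doc_tokens, corpus_tokens, lam=0.5):
    """p(t|d) = (1 - lam) * p_ML(t|d) + lam * p_ML(t|C)."""
    return (1 - lam) * p_ml(term, doc_tokens) + lam * p_ml(term, corpus_tokens)

def dirichlet(term, doc_tokens, corpus_tokens, mu=2000):
    """p(t|d) = (freq(t,d) + mu * p(t|C)) / (|d| + mu)."""
    return ((Counter(doc_tokens)[term] + mu * p_ml(term, corpus_tokens))
            / (len(doc_tokens) + mu))
```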

Backoff
– Only adds probability mass to terms that were not seen in the document
– What does this do to the probability model?
  – Flatter?
– Is this a good idea?
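
One concrete instantiation (an assumption on my part, not necessarily the variant the slide intends) is absolute-discounting backoff: shave a constant delta off each seen term's count and hand the freed mass only to unseen terms, in proportion to the corpus model.

```python
from collections import Counter

def backoff(term, doc_tokens, corpus_probs, delta=0.7):
    """Seen terms keep their discounted MLE; unseen terms split the freed
    mass in proportion to their corpus probability p(t|C)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    if counts[term] > 0:
        return (counts[term] - delta) / n
    freed = delta * len(counts) / n                    # mass shaved off seen terms
    unseen_mass = 1 - sum(corpus_probs[t] for t in counts)
    return freed * corpus_probs[term] / unseen_mass    # renormalized corpus share
```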

Are there other sources for probability mass?
– Document clusters
– Document classes
– User profiles
– Topic models

Review
– What is wrong with 0 probabilities?
– How does smoothing fix it?
– What is smoothing really doing?
– What is interpolation?
  – What is that mixture model really representing?
– What can we use to mix with the document model?

Bored yet? Let's do something complicated
– Entropy (Information Theory)
  – H(X) = -Σ_x p(x) log p(x)
  – Good for data compression
– Relative Entropy (KL divergence)
  – D(p||q) = Σ_x p(x) log (p(x)/q(x))
  – Not a true distance measure (it is asymmetric)
  – Used to find differences between probability models
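
Both quantities in a few lines (distributions as dicts mapping events to probabilities; base-2 logs, so the results are in bits):

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log p(x)."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)). Asymmetric, so not a true distance."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)
```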

Ok… that's nice
– What does relative entropy give us?
  – Why not just subtract probabilities?
– On your calculators, calculate p(x) log (p(x)/q(x)) for
  – p(x) = 0.8, q(x) = 0.6
  – p(x) = 0.6, q(x) = 0.4
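
Working those two terms out (natural log here; any base works if used consistently): both pairs differ by the same 0.2 in absolute terms, yet they contribute differently because relative entropy weighs the ratio p(x)/q(x), not the difference. That is exactly why we do not just subtract probabilities.

```python
import math

for p, q in [(0.8, 0.6), (0.6, 0.4)]:
    print(round(p * math.log(p / q), 3))  # 0.23, then the larger 0.243
```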

Clarity Score
– Calculate the relative entropy between a language model of the result set and the corpus model
– Positive correlation between a high clarity score (high relative entropy) and query performance
– So what is that actually saying?
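
In code this is just the KL divergence sketched above applied to two unigram models; `result_model` here is a hypothetical dict estimated from the retrieved documents.

```python
def clarity_score(result_model, corpus_model):
    """Relative entropy between the result-set language model and the corpus
    model; higher values suggest a more focused (clearer) result set."""
    return kl_divergence(result_model, corpus_model)
```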

Relative Entropy Query Expansion Relevance Feedback Blind Relevance Feedback Expand query with terms that contribute the most to relative entropy What are we doing to the query when we do this?

Controlled Query Generation
– Some of my research
– p(x) log (p(x)/q(x)) is a good term discrimination function
– Regulates the construction of queries for evaluating retrieval algorithms
  – First real controlled reaction experiments with retrieval algorithms

Review
– Who is the father of Information Theory?
– What is entropy?
– What is relative entropy?
– What is the clarity score?
– What are the terms that contribute the most to relative entropy?
  – Are they useful?

You have been a good class
– Introduced to the language model for information retrieval
– Documents represented as multinomial distributions
  – Generative model
  – Queries are generated from the document model
– Smoothing
– Applications in IR

Questions for me?

Questions for you
– What is the Maximum Likelihood Estimate?
– Why is smoothing important?
– What is interpolation?
– What is entropy?
– What is relative entropy?
– Does language modeling make sense?