Language Models
Naama Kraus (modified by Amit Gross)
Slides are based on the book Introduction to Information Retrieval by Manning, Raghavan and Schütze.

IR approaches
- Boolean retrieval: Boolean constraints on term occurrences in documents; no ranking.
- Vector space model: queries and documents are represented as vectors in a high-dimensional space; a notion of similarity (e.g., cosine similarity) implies a ranking.
- Probabilistic model: rank documents by the probability P(R|d,q), estimated using relevance feedback techniques.
- Language models: today's class.

Intuition
Users who try to think of a good query think of words that are likely to appear in relevant documents.
Language model approach: a document is a good match for a query if the document's model is likely to generate the query, which happens when the document contains the query words often.

Illustration: a language model derived from the document generates the query.

Traditional language model
A finite automaton used as a generative model, e.g. one that generates: I wish I wish I wish I wish I wish ...
The language of the automaton is the full set of strings that it can generate.

Probabilistic language model
Each node has a probability distribution over the terms it generates. A language model is a function that puts a probability measure over strings drawn from some vocabulary. Such a model is called a finite state transducer.

Language model example
A partial unigram language model: a single state s with the following emission probabilities:
the 0.2
a 0.1
frog 0.01
toad 0.01
said 0.03
likes 0.02
that 0.04
...
STOP 0.2
Probability that some text (e.g., a query) was generated by the model:
P(frog said that toad likes frog) = 0.01 × 0.03 × 0.04 × 0.01 × 0.02 × 0.01
(We ignore continue/STOP probabilities, assuming they are fixed for all queries.)
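A minimal sketch of this computation in Python (the probability table is copied from the slide; the function name is my own):

```python
# Unigram emission probabilities from the example above.
M = {"the": 0.2, "a": 0.1, "frog": 0.01, "toad": 0.01,
     "said": 0.03, "likes": 0.02, "that": 0.04}

def text_probability(text, model):
    """Multiply the model's emission probability of each term in the text."""
    p = 1.0
    for term in text.split():
        p *= model.get(term, 0.0)  # unseen terms get probability 0 (see smoothing later)
    return p

print(text_probability("frog said that toad likes frog", M))
# 0.01 * 0.03 * 0.04 * 0.01 * 0.02 * 0.01 = 2.4e-11
```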

Query likelihood
Compare two document models M1 and M2 over the vocabulary frog, said, that, toad, likes, dog, given the query q = frog likes toad:
P(q | M1) = 0.01 × 0.02 × 0.01
M2 assigns likes a higher probability (0.04) but far lower probabilities to frog and toad, so
P(q | M1) > P(q | M2) ⇒ M1 is more likely to generate query q.

Types of language models
How do we build probabilities over a sequence of terms? By the chain rule:
P(t1 t2 t3 t4) = P(t1) × P(t2|t1) × P(t3|t1 t2) × P(t4|t1 t2 t3)
- Unigram language model: the simplest; no conditioning context.
  P(t1 t2 t3 t4) = P(t1) × P(t2) × P(t3) × P(t4)
- Bigram language model: condition on the previous term.
  P(t1 t2 t3 t4) = P(t1) × P(t2|t1) × P(t3|t2) × P(t4|t3)
- Trigram language model: condition on the previous two terms, and so on.
The unigram model is the most common in IR (see the sketch below):
- It is often sufficient to judge the topic of a document.
- Richer models run into data sparseness issues.
- It has a simple and efficient implementation.
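To contrast the two factorizations, a toy sketch (the probability tables here are hypothetical, not taken from the slides):

```python
def unigram_prob(terms, p):
    """Unigram model: P(t1..tn) = P(t1) * P(t2) * ... * P(tn)."""
    prob = 1.0
    for t in terms:
        prob *= p.get(t, 0.0)
    return prob

def bigram_prob(terms, p, p_cond):
    """Bigram model: P(t1..tn) = P(t1) * product over i of P(t_i | t_{i-1})."""
    prob = p.get(terms[0], 0.0)
    for prev, cur in zip(terms, terms[1:]):
        prob *= p_cond.get((prev, cur), 0.0)
    return prob

# Hypothetical tables for the query "frog likes toad":
p = {"frog": 0.01, "likes": 0.02, "toad": 0.01}
p_cond = {("frog", "likes"): 0.1, ("likes", "toad"): 0.05}
print(unigram_prob(["frog", "likes", "toad"], p))         # 0.01 * 0.02 * 0.01
print(bigram_prob(["frog", "likes", "toad"], p, p_cond))  # 0.01 * 0.1 * 0.05
```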

The query likelihood model
Goal: rank documents by P(d|q), the probability that a user querying q had document d in mind.
By Bayes' rule: P(d|q) = P(q|d) P(d) / P(q)
- P(q) is the same for all documents, so it can be ignored.
- P(d) is often treated as uniform across documents, so it is ignored too. (It could instead be a non-uniform prior based on criteria like authority, length, genre, or recency.)
⇒ Rank by P(q|d).

The query likelihood model (2)
P(q|d) is the probability that query q was generated by a language model derived from document d, i.e., the probability that the query would be observed as a random sample from the document's model.
Algorithm:
1. Infer a language model Md for each document d.
2. Estimate P(q|Md).
3. Rank the documents according to these probabilities.

Illustration: each document di induces a model Mdi, and each model assigns the query a likelihood P(q|Mdi). E.g., if P(q|Md3) > P(q|Md1) > P(q|Md2), then d3 is ranked first, d1 second, and d2 third.

Estimating P(q|Md)
Use maximum likelihood estimation (MLE), assuming a unigram language model (terms occur independently):
P(q|Md) = ∏_{t ∈ q} P_mle(t|Md) = ∏_{t ∈ q} tf(t,d) / L(d)
where tf(t,d) is the term frequency of t in d and L(d) is the length of document d.
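A minimal sketch of this estimate in Python (function and variable names are my own):

```python
from collections import Counter

def mle_query_likelihood(query, doc):
    """P(q|Md) under a unigram MLE model: the product of tf(t,d) / L(d)."""
    tf = Counter(doc.split())          # term frequencies in the document
    length = sum(tf.values())          # document length L(d)
    p = 1.0
    for t in query.split():
        p *= tf[t] / length            # 0 if t is absent from d -- the sparse data problem
    return p

print(mle_query_likelihood("frog likes toad", "frog said that toad likes frog"))
# (2/6) * (1/6) * (1/6)
```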

Sparse data problem
Documents are sparse:
- Some words do not appear in the document; in particular, some of the query terms may be missing, so P(q|d) = 0. This zero probability problem amounts to conjunctive semantics.
- Words that do occur are poorly estimated: a single document is a small training set, so occurring words are overestimated (their occurrence was partly by chance).

Solution: smoothing
Smooth the probabilities in the language model to overcome zero probabilities, giving some probability mass to unseen words. The probability of a non-occurring term should be close to its probability of occurring in the collection:
P(t|Mc) = cf(t) / T
where cf(t) is the number of occurrences of term t in the collection, and T is the length of the collection (the sum of all document lengths).

Smoothing methods
Linear interpolation:
P(t|d) = λ P_mle(t|Md) + (1 - λ) P(t|Mc)
Bayesian smoothing (with a Dirichlet prior and parameter μ):
P(t|d) = (tf(t,d) + μ P(t|Mc)) / (L(d) + μ)
In summary, with linear interpolation:
P(q|d) = ∏_{t ∈ q} ( λ P_mle(t|Md) + (1 - λ) P(t|Mc) )
In practice, the log is taken of both sides of the equation to avoid multiplying many small numbers.
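A minimal sketch of query-likelihood scoring with linear interpolation smoothing (Python; names and the default λ are my own):

```python
import math
from collections import Counter

def lm_score(query, doc, collection, lam=0.5):
    """log P(q|d) with linear interpolation smoothing against the collection model."""
    tf = Counter(doc.split())
    doc_len = sum(tf.values())
    cf = Counter(" ".join(collection).split())     # collection term frequencies
    coll_len = sum(cf.values())                    # collection length T
    score = 0.0
    for t in query.split():
        p_doc = tf[t] / doc_len                    # MLE document model
        p_coll = cf[t] / coll_len                  # collection model P(t|Mc)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score
```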

Exercise
Given a collection of two documents:
D1: Xyzzy reports a profit but revenue is down
D2: Quorus narrows quarter loss but revenue decreases further
A user submits the query "revenue down". Rank D1 and D2, using an MLE unigram model with linear interpolation smoothing and λ = 0.5.
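Applying the scoring sketch above to the exercise, as a way to check a hand computation:

```python
docs = {
    "D1": "Xyzzy reports a profit but revenue is down",
    "D2": "Quorus narrows quarter loss but revenue decreases further",
}
collection = list(docs.values())

for name in sorted(docs, key=lambda d: lm_score("revenue down", docs[d], collection),
                   reverse=True):
    print(name, lm_score("revenue down", docs[name], collection))

# By hand: P(q|D1) = [(1/8 + 2/16)/2] * [(1/8 + 1/16)/2] = (1/8)(3/32) = 3/256,
#          P(q|D2) = [(1/8 + 2/16)/2] * [(0/8 + 1/16)/2] = (1/8)(1/32) = 1/256,
# so D1 ranks above D2.
```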

Extended LM approaches
Three ways to relate a query model P(t|query) and a document model P(t|document):
- Query likelihood: P(q|d), the probability of the document LM generating the query (the previous slides).
- Document likelihood: P(d|q), the probability of the query LM generating the document (the next slides).
- Model comparison: R(d;q), comparing the document and query models to each other (the next slides).

Document likelihood model
P(d|q): the probability of the query LM generating the document.
Problem: queries are short, which makes for bad model estimation. [Zhai and Lafferty 2001]: expand the query with terms taken from relevant documents in the usual way, and update the language model accordingly.
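A hedged sketch of this expansion idea (not Zhai and Lafferty's exact method; the helper names and the mixing weight alpha are my own): mix the query's MLE model with the MLE model of the concatenated feedback documents.

```python
from collections import Counter

def mle_model(text):
    """MLE unigram distribution over the tokens of a text."""
    tf = Counter(text.split())
    n = sum(tf.values())
    return {t: c / n for t, c in tf.items()}

def expanded_query_model(query, feedback_docs, alpha=0.5):
    """Interpolate the query model with a model of the relevant (feedback) documents."""
    q_model = mle_model(query)
    f_model = mle_model(" ".join(feedback_docs))
    vocab = set(q_model) | set(f_model)
    return {t: alpha * q_model.get(t, 0.0) + (1 - alpha) * f_model.get(t, 0.0)
            for t in vocab}
```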

KL divergence
Kullback-Leibler (KL) divergence: an asymmetric divergence measure from information theory. It measures the difference between two probability distributions P and Q, where typically Q is an estimate of P:
D(P||Q) = Σ_x P(x) log( P(x) / Q(x) )
Properties:
- Non-negative; equals 0 iff P equals Q.
- May have an infinite value.
- Asymmetric, thus not a metric.
Jensen-Shannon (JS) divergence: based on KL divergence (D); always finite; 0 <= JSD <= 1; symmetric.
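A minimal sketch of both measures (Python; distributions are assumed to be dicts over a shared vocabulary, and log base 2 is used so that 0 <= JSD <= 1 as stated above):

```python
import math

def kl_divergence(p, q):
    """D(P||Q) = sum over x of P(x) * log2(P(x)/Q(x)); infinite if Q(x)=0 where P(x)>0."""
    d = 0.0
    for x, px in p.items():
        if px > 0:
            qx = q.get(x, 0.0)
            if qx == 0:
                return math.inf
            d += px * math.log2(px / qx)
    return d

def js_divergence(p, q):
    """JSD(P,Q): the average KL divergence of P and Q from their mixture M = (P+Q)/2."""
    vocab = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```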

Model comparison
Build an LM from both the query and the document, and measure how different the two LMs are from each other using KL divergence. Rank by KLD: the closer to 0, the higher the rank. A sketch follows.
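Putting the pieces together, a sketch of KLD ranking that reuses mle_model and kl_divergence from the earlier sketches (the document model is smoothed with the linear interpolation from before, so the divergence stays finite for terms the document lacks):

```python
def kld_rank(query, docs, collection, lam=0.5):
    """Rank documents by KL(query model || smoothed document model); lower is better."""
    q_model = mle_model(query)
    c_model = mle_model(" ".join(collection))

    def divergence(doc):
        d_model = mle_model(doc)
        smoothed = {t: lam * d_model.get(t, 0.0) + (1 - lam) * c_model.get(t, 0.0)
                    for t in c_model}
        return kl_divergence(q_model, smoothed)

    return sorted(docs, key=divergence)  # closest to 0 first
```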

Language models - summary
- A probabilistic model: mathematically precise, yet an intuitive, simple concept.
- Achieves very good retrieval results; still, there is no evidence that it exceeds the traditional vector space model.
- Relation to the vector space model:
  - Both use term frequency.
  - Smoothing with the collection generation probability is a little like idf: terms that are rare in the general collection but common in some documents will have a greater influence on those documents' ranking.
  - Probabilistic vs. geometric; a mathematical model vs. a heuristic model.