Language Modeling Frameworks for Information Retrieval
John Lafferty, School of Computer Science, Carnegie Mellon University



September 11, 2002, Language Modeling and Information Retrieval Workshop

Retrieval As Decision Making
Given a query,
- Which documents should be selected? (D)
- How should these docs be presented to the user? (π)
[Diagram: a query can be answered with a ranked list (1, 2, 3, 4), a clustering, or an excerpt.]

Decision Theory Framework
[Diagram: graphical model with nodes U (user), q (query), S (source), d (document), and R; the legend marks nodes as observed, partially observed, or inferred.]
A unified framework can be built on Bayesian decision theory: models, loss function, risk minimization (Zhai, 2002).
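For orientation, the generic Bayes decision rule behind "models, loss function, risk minimization" can be written as below. The notation is an assumption made for illustration and is not taken from the slide: $a$ is a candidate retrieval action (for example, a document selection together with a presentation), $\theta$ collects the unobserved model parameters, and $L$ is the loss function.

```latex
a^{*} \;=\; \arg\min_{a} \int_{\Theta} L(a, \theta)\, p(\theta \mid \text{observations})\, d\theta
```

The integral is the expected loss (Bayes risk) of action $a$ under the posterior over $\theta$; the chosen action is the one that minimizes it.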

Example: Aspect Retrieval
Query: What are current applications of robotics? Find as many different applications as possible.
Example aspects:
- A1: spot-welding robotics
- A2: controlling inventory
- A3: pipe-laying robots
- A4: talking robot
- A5: robots for loading & unloading memory tapes
- A6: robot telephone operators
- A7: robot cranes
- …
[Table: binary aspect judgments, with documents as rows and aspects A1, A2, A3, …, Ak as columns.]

Aspect Models (Hofmann, 1999; Blei, Ng and Jordan, 2001)
- Generative: each document mixes the aspects (Aspect 1, Aspect …), with mixing weights drawn from a Dirichlet (for example).
- Inference: Given aspects and a document, what is the posterior for the mixing weights?
- Learning: Given documents, what are the (ML) aspects?
Studied recently in (Minka and Lafferty, 2002).
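The generative view of aspect models can be sketched in a few lines. This is a minimal illustration, not the models from the cited papers: the toy aspects, the symmetric Dirichlet parameter, and the function names are all assumptions, and each aspect is taken to be a fixed unigram distribution over words.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(aspects, alpha, length, rng):
    """Generate one document from a mixture of aspects.

    aspects: list of dicts mapping word -> probability (hypothetical toy data).
    alpha:   Dirichlet parameters, one per aspect (assumed prior).
    """
    # Per-document mixing weights over the aspects.
    weights = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick an aspect for this word position, then a word from that aspect.
        z = rng.choices(range(len(aspects)), weights=weights)[0]
        vocab = list(aspects[z])
        probs = [aspects[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return weights, words


# Hypothetical aspects loosely inspired by the robotics example.
TOY_ASPECTS = [
    {"welding": 0.6, "robot": 0.3, "factory": 0.1},
    {"inventory": 0.5, "robot": 0.2, "warehouse": 0.3},
]

rng = random.Random(0)
mixing_weights, document = generate_document(TOY_ASPECTS, [1.0, 1.0], 8, rng)
print(mixing_weights, document)
```

Inference runs this process in reverse: given the aspects and the observed words, it asks which mixing weights are plausible. Learning estimates the aspect word distributions themselves from a collection of documents.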

Evaluation Measures
- What is the best measure?
  - Requires concrete specification of task
- Several natural measures are computationally intractable, even assuming aspects are known (e.g., aspect coverage, aspect uniqueness)
- Defining aspects is difficult
- Maximum likelihood cannot be expected to capture "true" semantic relationships in aspects

Aspect Retrieval Baselines
[Plots: Aspect Precision; Aspect Recall]
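The slides do not define these measures, so the sketch below assumes one common reading of aspect recall: the fraction of all known aspects covered by at least one of the top k ranked documents. The function name, the judgment format (document ID mapped to a set of aspect IDs), and the toy data are assumptions for illustration.

```python
def aspect_recall_at_k(ranking, judgments, num_aspects, k):
    """Fraction of the num_aspects aspects covered by the top-k documents.

    ranking:   list of document IDs, best first.
    judgments: dict mapping document ID -> set of aspect IDs it covers
               (assumed format for binary aspect judgments).
    """
    covered = set()
    for doc_id in ranking[:k]:
        # Documents without judgments contribute no aspects.
        covered |= judgments.get(doc_id, set())
    return len(covered) / num_aspects


# Hypothetical judgments: d2 repeats an aspect that d1 already covers.
toy_judgments = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
toy_ranking = ["d1", "d2", "d3"]

for k in (1, 2, 3):
    print(k, aspect_recall_at_k(toy_ranking, toy_judgments, num_aspects=4, k=k))
```

Under this definition a document that only repeats already-covered aspects (d2 here) does not raise the score, which is why ranking by independent relevance alone can do poorly on aspect retrieval.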

Challenges for IR Models
Probabilistic language models have proven to be an effective way to reason about IR systems. We now need:
- Better task specification and data
  - e.g., TREC interactive data inadequate
- More advanced models
  - Fewer independence assumptions, greater structure
- Improved inference and learning algorithms
  - Accuracy and efficiency
- To handle user preferences, background knowledge
  - Loss function and priors/constraints