Exact Maximum Likelihood Estimation for Word Mixtures. Yi Zhang & Jamie Callan, Carnegie Mellon University; Wei Xu, NEC C&C Research Lab.

Similar presentations
Information Retrieval and Organisation Chapter 12 Language Models for Information Retrieval Dell Zhang Birkbeck, University of London.

A Gentle Introduction to the EM Algorithm. Ted Pedersen, Department of Computer Science, University of Minnesota. EM Panel, EMNLP, June 2001.
Language Models Naama Kraus (Modified by Amit Gross) Slides are based on Introduction to Information Retrieval Book by Manning, Raghavan and Schütze.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Jensen’s Inequality (Special Case) EM Theorem.
Probabilistic Clustering-Projection Model for Discrete Data
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Computer Vision CS 143, Brown James Hays 02/22/11 Many slides from Derek Hoiem.
Language Model based Information Retrieval: University of Saarland 1 A Hidden Markov Model Information Retrieval System Mahboob Alam Khalid.
Mixture Language Models and EM Algorithm
Visual Recognition Tutorial
IR Challenges and Language Modeling. IR achievements: search engines, meta-search, cross-lingual search, factoid question answering, filtering, statistical…
Maximum Likelihood Estimation for Information Thresholding. Yi Zhang & Jamie Callan, Carnegie Mellon University.
First introduced in 1977, with lots of mathematical derivation. Problem: given a set of data (incomplete or with missing values). Goal: assume the…
Lecture 5: Learning models using EM
Learning Entity Specific Models. Stefan Niculescu, Carnegie Mellon University. November 2003.
Exploration & Exploitation in Adaptive Filtering Based on Bayesian Active Learning Yi Zhang, Jamie Callan Carnegie Mellon Univ. Wei Xu NEC Lab America.
Language Modeling Approaches for Information Retrieval Rong Jin.
Maximum Entropy Model LING 572 Fei Xia 02/08/07. Topics in LING 572 Easy: –kNN, Rocchio, DT, DL –Feature selection, binarization, system combination –Bagging.
The Relevance Model  A distribution over terms, given information need I, (Lavrenko and Croft 2001). For term r, P(I) can be dropped w/o affecting the.
Incomplete Graphical Models Nan Hu. Outline Motivation K-means clustering Coordinate Descending algorithm Density estimation EM on unconditional mixture.
Multi-Style Language Model for Web Scale Information Retrieval Kuansan Wang, Xiaolong Li and Jianfeng Gao SIGIR 2010 Min-Hsuan Lai Department of Computer.
Bayesian Learning for Latent Semantic Analysis. Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu. Presenter: Hsuan-Sheng Chiu.
Topic Models in Text Processing IR Group Meeting Presented by Qiaozhu Mei.
WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation. Chao Chen, Dongsheng Li
Language Models. Hongning Wang. Two-stage smoothing [Zhai & Lafferty 02], Stage 1 (Dirichlet; explains unseen words): P(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ)…
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Chin-Yu Huang Department of Computer Science National Tsing Hua University Hsinchu, Taiwan Optimal Allocation of Testing-Resource Considering Cost, Reliability,
An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators by Percy Liang and Michael Jordan (ICML 2008 ) Presented by Lihan.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Opinion Retrieval from Blogs. Wei Zhang, Clement Yu, and Weiyi Meng (CIKM 2007).
Lecture 4: Statistics Review II Date: 9/5/02  Hypothesis tests: power  Estimation: likelihood, moment estimation, least square  Statistical properties.
Modeling Long Distance Dependence in Language: Topic Mixtures Versus Dynamic Cache Models. Rukmini M. Iyer, Mari Ostendorf.
Positional Relevance Model for Pseudo–Relevance Feedback Yuanhua Lv & ChengXiang Zhai Department of Computer Science, UIUC Presented by Bo Man 2014/11/18.
HMM - Part 2 The EM algorithm Continuous density HMM.
CS Statistical Machine learning Lecture 24
A Comparison of Continuous vs. Discrete Image Models for Probabilistic Image and Video Retrieval. Arjen P. de Vries. ICIP 2004, Singapore, October.
Carnegie Mellon Novelty and Redundancy Detection in Adaptive Filtering Yi Zhang, Jamie Callan, Thomas Minka Carnegie Mellon University {yiz, callan,
MindReader: Querying databases through multiple examples Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan) Ravishankar Subramanya (Pittsburgh.
Advantages of Query Biased Summaries in Information Retrieval by A. Tombros and M. Sanderson Presenters: Omer Erdil Albayrak Bilge Koroglu.
Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.
ECE 8443 – Pattern Recognition Objectives: Jensen’s Inequality (Special Case) EM Theorem Proof EM Example – Missing Data Intro to Hidden Markov Models.
Relevance-Based Language Models Victor Lavrenko and W.Bruce Croft Department of Computer Science University of Massachusetts, Amherst, MA SIGIR 2001.
Language Modeling Putting a curve to the bag of words Courtesy of Chris Jordan.
Jen-Tzung Chien, Meng-Sung Wu Minimum Rank Error Language Modeling.
5. Maximum Likelihood –II Prof. Yuille. Stat 231. Fall 2004.
CSE 517 Natural Language Processing Winter 2015
Relevance Language Modeling For Speech Recognition Kuan-Yu Chen and Berlin Chen National Taiwan Normal University, Taipei, Taiwan ICASSP /1/17.
NTNU Speech Lab Dirichlet Mixtures for Query Estimation in Information Retrieval Mark D. Smucker, David Kulp, James Allan Center for Intelligent Information.
Using Social Annotations to Improve Language Model for Information Retrieval Shengliang Xu, Shenghua Bao, Yong Yu Shanghai Jiao Tong University Yunbo Cao.
Carnegie Mellon School of Computer Science Language Technologies Institute CMU Team-1 in TDT 2004 Workshop 1 CMU TEAM-A in TDT 2004 Topic Tracking Yiming.
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
The Effect of Database Size Distribution on Resource Selection Algorithms Luo Si and Jamie Callan School of Computer Science Carnegie Mellon University.
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 02/22/11.
Modeling Annotated Data (SIGIR 2003) David M. Blei, Michael I. Jordan Univ. of California, Berkeley Presented by ChengXiang Zhai, July 10, 2003.
Two Types of Empirical Likelihood Zheng, Yan Department of Biostatistics University of California, Los Angeles.
A Study of Poisson Query Generation Model for Information Retrieval
Relevant Document Distribution Estimation Method for Resource Selection Luo Si and Jamie Callan School of Computer Science Carnegie Mellon University
Introduction to Information Retrieval. Lecture 15: Text Classification & Naive Bayes.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Federated text retrieval from uncooperative overlapped collections Milad Shokouhi, RMIT University, Melbourne, Australia Justin Zobel, RMIT University,
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
Online Multiscale Dynamic Topic Models
LECTURE 10: EXPECTATION MAXIMIZATION (EM)
J. Zhu, A. Ahmed and E.P. Xing Carnegie Mellon University ICML 2009
Language Models for Information Retrieval
John Lafferty, Chengxiang Zhai School of Computer Science
Junghoo “John” Cho UCLA
CS590I: Information Retrieval
Presentation transcript:

Exact Maximum Likelihood Estimation for Word Mixtures. Yi Zhang & Jamie Callan, Carnegie Mellon University; Wei Xu, NEC C&C Research Lab

Outline
- Introduction
  1. Why this problem? Some retrieval applications
  2. Traditional solution: the EM algorithm
- New algorithm: exact maximum likelihood estimation
- Experimental results

Example 1: Model-Based Feedback in the Language Modeling Approach to IR (based on Zhai & Lafferty's slides at CIKM 2001). The slide's retrieval-loop diagram: a query Q is run against the documents D, results are returned, and the feedback documents F = {d_1, d_2, ..., d_n} are used to update the query model.

Estimation of θ_F Based on a Generative Mixture Model (based on Zhai & Lafferty's slides at CIKM 2001). Each word w in F = {d_1, ..., d_n} is generated either by the topic model P(w|θ_F), with source probability λ (topic words), or by the background collection model P(w|C), with source probability 1−λ (background words). Given: F, P(w|C), and λ. Find: the MLE of θ_F.
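Written out, the feedback log-likelihood being maximized is, in the standard Zhai & Lafferty form (c(w, F) denotes the count of word w in F; the placement of λ follows the source-probability reading of the slide above):

\log p(F \mid \theta_F) = \sum_{w} c(w, F) \, \log \big( \lambda \, P(w \mid \theta_F) + (1 - \lambda) \, P(w \mid C) \big)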

Example 2: Model-Based Approach for Novelty Detection in Adaptive Information Filtering (based on Zhang & Callan's paper in SIGIR 2002). Three multinomial models: M_T: θ_Topic, M_E: θ_general English, and M_I: θ_new for the new information in a document. Given: θ_general English and θ_Topic. Find: the MLE of θ_new.
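One way to write the three-way mixture this slide sketches (the mixing weights λ_E, λ_T, λ_N and the count notation are my own, not in the transcript):

\log p(d) = \sum_{w} c(w, d) \, \log \big( \lambda_E P(w \mid \theta_E) + \lambda_T P(w \mid \theta_T) + \lambda_N P(w \mid \theta_{new}) \big), \qquad \lambda_E + \lambda_T + \lambda_N = 1,

with θ_E and θ_T given and only θ_new to be estimated.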

Problem Setting and the Traditional Solution Using EM
- Observe: data generated by a mixture multinomial distribution r = (r_1, r_2, r_3, ..., r_k)
- Given: interpolation weights α and β (with α + β = 1) and another multinomial distribution p = (p_1, p_2, p_3, ..., p_k), so that r_i = α q_i + β p_i
- Find: the maximum likelihood estimate (MLE) of the multinomial distribution q = (q_1, q_2, q_3, ..., q_k)
- Traditional solution: the EM algorithm (see the sketch below)
  - An iterative process that can be computationally expensive
  - Only provides an approximate solution
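For concreteness, a minimal Python sketch of that EM baseline, assuming observed word counts f and the mixture α q_i + β p_i with a fixed, smoothed p (function and variable names are mine, not from the slides):

import numpy as np

def em_mle(f, p, alpha, beta, max_iter=10000, tol=1e-10):
    # EM estimation of q in the mixture alpha*q[i] + beta*p[i], given
    # observed word counts f and a fixed multinomial p. Assumes p[i] > 0
    # for every word (e.g. a smoothed collection model), so the mixture
    # probability of any observed word is positive.
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    q = f / f.sum()                      # start from the empirical distribution
    prev_ll = -np.inf
    for _ in range(max_iter):
        mix = alpha * q + beta * p
        ll = np.sum(f * np.log(mix))     # log-likelihood of the counts
        if ll - prev_ll < tol:           # stop when the LL change is tiny
            break
        prev_ll = ll
        t = alpha * q / mix              # E-step: posterior that word i's tokens came from q
        q = f * t / np.sum(f * t)        # M-step: renormalize the expected counts
    return q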

Finding q (1). Maximize the log-likelihood of the observed counts,

L(q) = \sum_{i=1}^{k} f_i \log(\alpha q_i + \beta p_i),

under the constraints

\sum_{i=1}^{k} q_i = 1, \qquad q_i \ge 0 \;\; (i = 1, \dots, k),

where f_i is the observed frequency of word i.

Finding q (2). For all the q_i with q_i > 0, apply the Lagrange multiplier method and set the derivative with respect to q_i to zero:

\frac{\partial}{\partial q_i} \Big[ L(q) - \mu \big( \sum_j q_j - 1 \big) \Big] = \frac{\alpha f_i}{\alpha q_i + \beta p_i} - \mu = 0 \quad \Longrightarrow \quad q_i = \frac{f_i}{\mu} - \frac{\beta}{\alpha} p_i .

This is a closed-form solution for q_i once we know which q_i are positive; μ is then fixed by normalization, since summing the closed form over the positive set S forces \mu = \alpha \sum_{i \in S} f_i / (\alpha + \beta \sum_{i \in S} p_i). Theorem: the q_i greater than zero are exactly those with the smallest values of p_i / f_i (rearranging the closed form, q_i > 0 iff f_i / p_i > \mu \beta / \alpha, a single threshold on the ratio). See the detailed proof in our paper.
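To make the theorem concrete, a small worked example with numbers of my own choosing: let k = 3, f = (6, 3, 1), p = (0.1, 0.3, 0.6), and α = β = 1/2, so the ratios p_i/f_i are (0.017, 0.1, 0.6). Including all three words gives μ = (1/2 · 10) / (1/2 + 1/2 · 1) = 5 and q_3 = 1/5 − 0.6 < 0, so word 3 must be dropped. Keeping the two smallest-ratio words gives μ = (1/2 · 9) / (1/2 + 1/2 · 0.4) = 45/7 ≈ 6.43 and q = (0.833, 0.167, 0), which is nonnegative and sums to one.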

Algorithm for Finding the Exact MLE of q. (The slide's pseudocode was a figure that did not survive the transcript; a sketch follows below.)
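A minimal Python reconstruction of that algorithm from the closed form and the theorem above (my sketch under those slides' assumptions, not the authors' code): sort words by p_i/f_i, extend the positive set while the closed-form q_i stays positive, then read off q.

import numpy as np

def exact_mle(f, p, alpha, beta):
    # Exact MLE of q for the mixture alpha*q + beta*p, given counts f.
    # By the theorem, the words with q_i > 0 are exactly those with the
    # smallest p_i/f_i, so scan words in that order and keep extending
    # the positive set while the closed-form q_i stays positive.
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(f > 0, p / f, np.inf)   # unseen words get q_i = 0
    order = np.argsort(ratio)                    # smallest p_i/f_i first
    q = np.zeros_like(f)
    F = P = 0.0
    mu, m = 0.0, 0
    for i in order:
        F += f[i]                                # cumulative counts in the candidate set
        P += p[i]                                # cumulative background mass
        mu_try = alpha * F / (alpha + beta * P)  # multiplier if word i is included
        if alpha * f[i] <= mu_try * beta * p[i]: # closed form would give q_i <= 0: stop
            break
        mu, m = mu_try, m + 1
    for i in order[:m]:
        q[i] = f[i] / mu - (beta / alpha) * p[i] # closed-form solution on the positive set
    return q                                     # sums to 1 by the choice of mu

On the same f, p, alpha, beta, exact_mle should agree with em_mle to numerical precision (the convergence result shown below), but it needs only one O(k log k) sort and a single pass instead of repeated EM iterations.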

Experiment Setting: Model-Based Feedback in IR
- 20 relevant documents for a topic (sampled from the AP Wire News and Wall Street Journal datasets) are the observed training data sequence; p is calculated directly from documents as described in Zhai & Lafferty.
- There are 2352 unique words in these 20 relevant documents, which means at most 2352 of the q_i are nonzero, while far more of the p_i are nonzero.

The EM result converges to the result calculated directly by our algorithm.

Comparing the Speed of Our Algorithm with EM
- EM stops when the change in log-likelihood falls below a 10^-x threshold (the exponent did not survive the transcript)
- Running times measured on a Pentium III 500 MHz PC

Conclusion
- We developed a new training algorithm that provides the exact MLE for word mixtures
- It works well both theoretically and empirically
- It can be used in several language-model-based IR applications